You are viewing a plain text version of this content. The canonical link for it is here.
Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2020/03/02 14:48:23 UTC

[GitHub] [hive] kgyrtkirk opened a new pull request #930: HIVE-22940 datasketches functions

kgyrtkirk opened a new pull request #930: HIVE-22940 datasketches functions
URL: https://github.com/apache/hive/pull/930
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] jcamachor commented on a change in pull request #930: HIVE-22940 datasketches functions

Posted by GitBox <gi...@apache.org>.
jcamachor commented on a change in pull request #930: HIVE-22940 datasketches functions
URL: https://github.com/apache/hive/pull/930#discussion_r391880374
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DataSketchesFunctions.java
 ##########
 @@ -0,0 +1,218 @@
+package org.apache.hadoop.hive.ql.exec;
+
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFResolver2;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
+
+public class DataSketchesFunctions {
+
+  private static final String DATA_TO_SKETCH = "datatosketch";
+  private static final String SKETCH_TO_ESTIMATE_WITH_ERROR_BOUNDS = "sketchToEstimateWithErrorBounds";
 
 Review comment:
   +1

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] jcamachor commented on a change in pull request #930: HIVE-22940 datasketches functions

Posted by GitBox <gi...@apache.org>.
jcamachor commented on a change in pull request #930: HIVE-22940 datasketches functions
URL: https://github.com/apache/hive/pull/930#discussion_r386675311
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DataSketchesFunctions.java
 ##########
 @@ -0,0 +1,218 @@
+package org.apache.hadoop.hive.ql.exec;
+
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFResolver2;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
+
+public class DataSketchesFunctions {
+
+  private static final String DATA_TO_SKETCH = "datatosketch";
+  private static final String SKETCH_TO_ESTIMATE_WITH_ERROR_BOUNDS = "sketchToEstimateWithErrorBounds";
+  // FIXME: consider to rename it to simply "estimate" or "evaluate" - in case of the counting sketches the "sketchto..." doesnt add value
+  private static final String SKETCH_TO_ESTIMATE = "sketchToEstimate";
+  private static final String SKETCH_TO_STRING = "sketchToString";
 
 Review comment:
   `stringify` ?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] jcamachor commented on a change in pull request #930: HIVE-22940 datasketches functions

Posted by GitBox <gi...@apache.org>.
jcamachor commented on a change in pull request #930: HIVE-22940 datasketches functions
URL: https://github.com/apache/hive/pull/930#discussion_r386675181
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DataSketchesFunctions.java
 ##########
 @@ -0,0 +1,218 @@
+package org.apache.hadoop.hive.ql.exec;
+
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFResolver2;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
+
+public class DataSketchesFunctions {
+
+  private static final String DATA_TO_SKETCH = "datatosketch";
+  private static final String SKETCH_TO_ESTIMATE_WITH_ERROR_BOUNDS = "sketchToEstimateWithErrorBounds";
+  // FIXME: consider to rename it to simply "estimate" or "evaluate" - in case of the counting sketches the "sketchto..." doesnt add value
+  private static final String SKETCH_TO_ESTIMATE = "sketchToEstimate";
 
 Review comment:
   `get_estimate` ?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] jcamachor commented on a change in pull request #930: HIVE-22940 datasketches functions

Posted by GitBox <gi...@apache.org>.
jcamachor commented on a change in pull request #930: HIVE-22940 datasketches functions
URL: https://github.com/apache/hive/pull/930#discussion_r391880118
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DataSketchesFunctions.java
 ##########
 @@ -0,0 +1,218 @@
+package org.apache.hadoop.hive.ql.exec;
+
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFResolver2;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
+
+public class DataSketchesFunctions {
+
+  private static final String DATA_TO_SKETCH = "datatosketch";
+  private static final String SKETCH_TO_ESTIMATE_WITH_ERROR_BOUNDS = "sketchToEstimateWithErrorBounds";
+  // FIXME: consider to rename it to simply "estimate" or "evaluate" - in case of the counting sketches the "sketchto..." doesnt add value
+  private static final String SKETCH_TO_ESTIMATE = "sketchToEstimate";
 
 Review comment:
   makes sense, +1

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kgyrtkirk commented on a change in pull request #930: HIVE-22940 datasketches functions

Posted by GitBox <gi...@apache.org>.
kgyrtkirk commented on a change in pull request #930: HIVE-22940 datasketches functions
URL: https://github.com/apache/hive/pull/930#discussion_r390409296
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DataSketchesFunctions.java
 ##########
 @@ -0,0 +1,218 @@
+package org.apache.hadoop.hive.ql.exec;
+
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFResolver2;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
+
+public class DataSketchesFunctions {
+
+  private static final String DATA_TO_SKETCH = "datatosketch";
+  private static final String SKETCH_TO_ESTIMATE_WITH_ERROR_BOUNDS = "sketchToEstimateWithErrorBounds";
 
 Review comment:
   `get_estimate_bounds` ? 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kgyrtkirk commented on a change in pull request #930: HIVE-22940 datasketches functions

Posted by GitBox <gi...@apache.org>.
kgyrtkirk commented on a change in pull request #930: HIVE-22940 datasketches functions
URL: https://github.com/apache/hive/pull/930#discussion_r390434617
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DataSketchesFunctions.java
 ##########
 @@ -0,0 +1,218 @@
+package org.apache.hadoop.hive.ql.exec;
+
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFResolver2;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
+
+public class DataSketchesFunctions {
+
+  private static final String DATA_TO_SKETCH = "datatosketch";
 
 Review comment:
   the final name of the method is:
   * `ds_theta_gen_sketch`
   * `ds_theta_to_sketch`
   * `ds_theta_build_sketch`
   * `ds_theta_build`
   * `ds_theta_sketch`
   I right now feel these last 2 the best; they are easy to remember and the naming of these methods will be structured anyway - so the `ds_theta_` prefix somewhat adds a context for the function we may not need to repeat "sketch"

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] jcamachor commented on a change in pull request #930: HIVE-22940 datasketches functions

Posted by GitBox <gi...@apache.org>.
jcamachor commented on a change in pull request #930: HIVE-22940 datasketches functions
URL: https://github.com/apache/hive/pull/930#discussion_r391880266
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DataSketchesFunctions.java
 ##########
 @@ -0,0 +1,218 @@
+package org.apache.hadoop.hive.ql.exec;
+
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFResolver2;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
+
+public class DataSketchesFunctions {
+
+  private static final String DATA_TO_SKETCH = "datatosketch";
 
 Review comment:
   LGTM, I think they should easy enough to identify, and it is good we follow same pattern.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] jcamachor commented on a change in pull request #930: HIVE-22940 datasketches functions

Posted by GitBox <gi...@apache.org>.
jcamachor commented on a change in pull request #930: HIVE-22940 datasketches functions
URL: https://github.com/apache/hive/pull/930#discussion_r386691018
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DataSketchesFunctions.java
 ##########
 @@ -0,0 +1,218 @@
+package org.apache.hadoop.hive.ql.exec;
+
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFResolver2;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
+
+public class DataSketchesFunctions {
+
+  private static final String DATA_TO_SKETCH = "datatosketch";
+  private static final String SKETCH_TO_ESTIMATE_WITH_ERROR_BOUNDS = "sketchToEstimateWithErrorBounds";
+  // FIXME: consider to rename it to simply "estimate" or "evaluate" - in case of the counting sketches the "sketchto..." doesnt add value
+  private static final String SKETCH_TO_ESTIMATE = "sketchToEstimate";
+  private static final String SKETCH_TO_STRING = "sketchToString";
+  // FIXME: probably use simply "union" instead unionSketch?
+  private static final String UNION_SKETCH = "unionSketch";
+  private static final String GET_N = "getN";
+  private static final String GET_CDF = "getCdf";
+  private static final String GET_PMF = "getPmf";
+  private static final String GET_QUANTILES = "GetQuantiles";
+  private static final String GET_QUANTILE = "GetQuantile";
+  private static final String GET_RANK = "getRank";
+  private static final String INTERSECT_SKETCH = "intersection";
+  private static final String EXCLUDE_SKETCH = "exclude";
+  private static final String GET_K = "getK";
+  private static final String GET_FREQUENT_ITEMS = "getFrequentItems";
+  private static final String T_TEST = "TTest";
+  private static final String SKETCH_TO_MEANS = "sketchtomeans";
+  private static final String SKETCH_TO_NUMBER_OF_RETAINED_ENTRIES = "sketchtonumberofretainedentries";
 
 Review comment:
   `get_number_retained_entries` ?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] jcamachor commented on a change in pull request #930: HIVE-22940 datasketches functions

Posted by GitBox <gi...@apache.org>.
jcamachor commented on a change in pull request #930: HIVE-22940 datasketches functions
URL: https://github.com/apache/hive/pull/930#discussion_r386684058
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DataSketchesFunctions.java
 ##########
 @@ -0,0 +1,218 @@
+package org.apache.hadoop.hive.ql.exec;
+
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFResolver2;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
+
+public class DataSketchesFunctions {
+
+  private static final String DATA_TO_SKETCH = "datatosketch";
+  private static final String SKETCH_TO_ESTIMATE_WITH_ERROR_BOUNDS = "sketchToEstimateWithErrorBounds";
+  // FIXME: consider to rename it to simply "estimate" or "evaluate" - in case of the counting sketches the "sketchto..." doesnt add value
+  private static final String SKETCH_TO_ESTIMATE = "sketchToEstimate";
+  private static final String SKETCH_TO_STRING = "sketchToString";
+  // FIXME: probably use simply "union" instead unionSketch?
+  private static final String UNION_SKETCH = "unionSketch";
+  private static final String GET_N = "getN";
 
 Review comment:
   Can we use `_` for this function and those below instead of lower-upper case?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kgyrtkirk commented on a change in pull request #930: HIVE-22940 datasketches functions

Posted by GitBox <gi...@apache.org>.
kgyrtkirk commented on a change in pull request #930: HIVE-22940 datasketches functions
URL: https://github.com/apache/hive/pull/930#discussion_r390419219
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DataSketchesFunctions.java
 ##########
 @@ -0,0 +1,218 @@
+package org.apache.hadoop.hive.ql.exec;
+
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFResolver2;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
+
+public class DataSketchesFunctions {
+
+  private static final String DATA_TO_SKETCH = "datatosketch";
+  private static final String SKETCH_TO_ESTIMATE_WITH_ERROR_BOUNDS = "sketchToEstimateWithErrorBounds";
+  // FIXME: consider to rename it to simply "estimate" or "evaluate" - in case of the counting sketches the "sketchto..." doesnt add value
+  private static final String SKETCH_TO_ESTIMATE = "sketchToEstimate";
+  private static final String SKETCH_TO_STRING = "sketchToString";
+  // FIXME: probably use simply "union" instead unionSketch?
+  private static final String UNION_SKETCH = "unionSketch";
+  private static final String GET_N = "getN";
+  private static final String GET_CDF = "getCdf";
+  private static final String GET_PMF = "getPmf";
+  private static final String GET_QUANTILES = "GetQuantiles";
+  private static final String GET_QUANTILE = "GetQuantile";
+  private static final String GET_RANK = "getRank";
+  private static final String INTERSECT_SKETCH = "intersection";
+  private static final String EXCLUDE_SKETCH = "exclude";
+  private static final String GET_K = "getK";
+  private static final String GET_FREQUENT_ITEMS = "getFrequentItems";
+  private static final String T_TEST = "TTest";
+  private static final String SKETCH_TO_MEANS = "sketchtomeans";
+  private static final String SKETCH_TO_NUMBER_OF_RETAINED_ENTRIES = "sketchtonumberofretainedentries";
+  private static final String SKETCH_TO_QUANTILES_SKETCH = "sketchToQuantilesSketch";
+  private static final String SKETCH_TO_VALUES = "sketchToValues";
+  private static final String SKETCH_TO_VARIANCES = "sketchToVariances";
+  private static final String SKETCH_TO_PERCENTILE = "sketchToPercentile";
+  private static final String UNION_SKETCH1 = "unionSketch1";
+  private static final String INTERSECT_SKETCH1 = "intersect";
+
+  private final Registry system;
+
+  public DataSketchesFunctions(Registry system) {
+    this.system = system;
+  }
+
+  public static void register(Registry system) {
+    DataSketchesFunctions dsf = new DataSketchesFunctions(system);
+    // FIXME: what this should be approx, ds ... other?
+    String prefix = "ds";
 
 Review comment:
   @jcamachor  what do you think about this "ds" prefix ?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] jcamachor commented on a change in pull request #930: HIVE-22940 datasketches functions

Posted by GitBox <gi...@apache.org>.
jcamachor commented on a change in pull request #930: HIVE-22940 datasketches functions
URL: https://github.com/apache/hive/pull/930#discussion_r386684979
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DataSketchesFunctions.java
 ##########
 @@ -0,0 +1,218 @@
+package org.apache.hadoop.hive.ql.exec;
+
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFResolver2;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
+
+public class DataSketchesFunctions {
+
+  private static final String DATA_TO_SKETCH = "datatosketch";
+  private static final String SKETCH_TO_ESTIMATE_WITH_ERROR_BOUNDS = "sketchToEstimateWithErrorBounds";
+  // FIXME: consider to rename it to simply "estimate" or "evaluate" - in case of the counting sketches the "sketchto..." doesnt add value
+  private static final String SKETCH_TO_ESTIMATE = "sketchToEstimate";
+  private static final String SKETCH_TO_STRING = "sketchToString";
+  // FIXME: probably use simply "union" instead unionSketch?
+  private static final String UNION_SKETCH = "unionSketch";
+  private static final String GET_N = "getN";
+  private static final String GET_CDF = "getCdf";
+  private static final String GET_PMF = "getPmf";
+  private static final String GET_QUANTILES = "GetQuantiles";
+  private static final String GET_QUANTILE = "GetQuantile";
+  private static final String GET_RANK = "getRank";
+  private static final String INTERSECT_SKETCH = "intersection";
+  private static final String EXCLUDE_SKETCH = "exclude";
+  private static final String GET_K = "getK";
+  private static final String GET_FREQUENT_ITEMS = "getFrequentItems";
+  private static final String T_TEST = "TTest";
+  private static final String SKETCH_TO_MEANS = "sketchtomeans";
 
 Review comment:
   As you mentioned above, probably `sketchto` can be removed from these function names. Additionally, we can use `_` when they consist of multiple words.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] jcamachor commented on a change in pull request #930: HIVE-22940 datasketches functions

Posted by GitBox <gi...@apache.org>.
jcamachor commented on a change in pull request #930: HIVE-22940 datasketches functions
URL: https://github.com/apache/hive/pull/930#discussion_r386674517
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DataSketchesFunctions.java
 ##########
 @@ -0,0 +1,218 @@
+package org.apache.hadoop.hive.ql.exec;
+
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFResolver2;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
+
+public class DataSketchesFunctions {
+
+  private static final String DATA_TO_SKETCH = "datatosketch";
 
 Review comment:
   `gen_sketch` ?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] jcamachor commented on a change in pull request #930: HIVE-22940 datasketches functions

Posted by GitBox <gi...@apache.org>.
jcamachor commented on a change in pull request #930: HIVE-22940 datasketches functions
URL: https://github.com/apache/hive/pull/930#discussion_r391880759
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DataSketchesFunctions.java
 ##########
 @@ -0,0 +1,218 @@
+package org.apache.hadoop.hive.ql.exec;
+
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFResolver2;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
+
+public class DataSketchesFunctions {
+
+  private static final String DATA_TO_SKETCH = "datatosketch";
+  private static final String SKETCH_TO_ESTIMATE_WITH_ERROR_BOUNDS = "sketchToEstimateWithErrorBounds";
+  // FIXME: consider to rename it to simply "estimate" or "evaluate" - in case of the counting sketches the "sketchto..." doesnt add value
+  private static final String SKETCH_TO_ESTIMATE = "sketchToEstimate";
+  private static final String SKETCH_TO_STRING = "sketchToString";
+  // FIXME: probably use simply "union" instead unionSketch?
+  private static final String UNION_SKETCH = "unionSketch";
+  private static final String GET_N = "getN";
+  private static final String GET_CDF = "getCdf";
+  private static final String GET_PMF = "getPmf";
+  private static final String GET_QUANTILES = "GetQuantiles";
+  private static final String GET_QUANTILE = "GetQuantile";
+  private static final String GET_RANK = "getRank";
+  private static final String INTERSECT_SKETCH = "intersection";
+  private static final String EXCLUDE_SKETCH = "exclude";
+  private static final String GET_K = "getK";
+  private static final String GET_FREQUENT_ITEMS = "getFrequentItems";
+  private static final String T_TEST = "TTest";
+  private static final String SKETCH_TO_MEANS = "sketchtomeans";
+  private static final String SKETCH_TO_NUMBER_OF_RETAINED_ENTRIES = "sketchtonumberofretainedentries";
 
 Review comment:
   LGTM, if you remove `get` in all of them, use `n_retained` 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kgyrtkirk commented on a change in pull request #930: HIVE-22940 datasketches functions

Posted by GitBox <gi...@apache.org>.
kgyrtkirk commented on a change in pull request #930: HIVE-22940 datasketches functions
URL: https://github.com/apache/hive/pull/930#discussion_r390410296
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DataSketchesFunctions.java
 ##########
 @@ -0,0 +1,218 @@
+package org.apache.hadoop.hive.ql.exec;
+
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFResolver2;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
+
+public class DataSketchesFunctions {
+
+  private static final String DATA_TO_SKETCH = "datatosketch";
 
 Review comment:
   this was something which was really consistent; but I think:
   * `to_sketch`
   * `gen_sketch`
   * `build_sketch`
   would be probably better - I changed it to `gen_sketch` for now...

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] jcamachor commented on a change in pull request #930: HIVE-22940 datasketches functions

Posted by GitBox <gi...@apache.org>.
jcamachor commented on a change in pull request #930: HIVE-22940 datasketches functions
URL: https://github.com/apache/hive/pull/930#discussion_r386684215
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DataSketchesFunctions.java
 ##########
 @@ -0,0 +1,218 @@
+package org.apache.hadoop.hive.ql.exec;
+
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFResolver2;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
+
+public class DataSketchesFunctions {
+
+  private static final String DATA_TO_SKETCH = "datatosketch";
+  private static final String SKETCH_TO_ESTIMATE_WITH_ERROR_BOUNDS = "sketchToEstimateWithErrorBounds";
+  // FIXME: consider to rename it to simply "estimate" or "evaluate" - in case of the counting sketches the "sketchto..." doesnt add value
+  private static final String SKETCH_TO_ESTIMATE = "sketchToEstimate";
+  private static final String SKETCH_TO_STRING = "sketchToString";
+  // FIXME: probably use simply "union" instead unionSketch?
+  private static final String UNION_SKETCH = "unionSketch";
+  private static final String GET_N = "getN";
+  private static final String GET_CDF = "getCdf";
+  private static final String GET_PMF = "getPmf";
+  private static final String GET_QUANTILES = "GetQuantiles";
+  private static final String GET_QUANTILE = "GetQuantile";
+  private static final String GET_RANK = "getRank";
+  private static final String INTERSECT_SKETCH = "intersection";
 
 Review comment:
   `intersection` ->  `intersect`
   
   Below:
   `intersect` ->  `intersect_f` ?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] jcamachor commented on a change in pull request #930: HIVE-22940 datasketches functions

Posted by GitBox <gi...@apache.org>.
jcamachor commented on a change in pull request #930: HIVE-22940 datasketches functions
URL: https://github.com/apache/hive/pull/930#discussion_r386683403
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DataSketchesFunctions.java
 ##########
 @@ -0,0 +1,218 @@
+package org.apache.hadoop.hive.ql.exec;
+
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFResolver2;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
+
+public class DataSketchesFunctions {
+
+  private static final String DATA_TO_SKETCH = "datatosketch";
+  private static final String SKETCH_TO_ESTIMATE_WITH_ERROR_BOUNDS = "sketchToEstimateWithErrorBounds";
 
 Review comment:
   `get_bounded_estimate` ?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] jcamachor commented on a change in pull request #930: HIVE-22940 datasketches functions

Posted by GitBox <gi...@apache.org>.
jcamachor commented on a change in pull request #930: HIVE-22940 datasketches functions
URL: https://github.com/apache/hive/pull/930#discussion_r391881328
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DataSketchesFunctions.java
 ##########
 @@ -0,0 +1,218 @@
+package org.apache.hadoop.hive.ql.exec;
+
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFResolver2;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
+
+public class DataSketchesFunctions {
+
+  private static final String DATA_TO_SKETCH = "datatosketch";
+  private static final String SKETCH_TO_ESTIMATE_WITH_ERROR_BOUNDS = "sketchToEstimateWithErrorBounds";
+  // FIXME: consider to rename it to simply "estimate" or "evaluate" - in case of the counting sketches the "sketchto..." doesnt add value
+  private static final String SKETCH_TO_ESTIMATE = "sketchToEstimate";
+  private static final String SKETCH_TO_STRING = "sketchToString";
+  // FIXME: probably use simply "union" instead unionSketch?
+  private static final String UNION_SKETCH = "unionSketch";
+  private static final String GET_N = "getN";
+  private static final String GET_CDF = "getCdf";
+  private static final String GET_PMF = "getPmf";
+  private static final String GET_QUANTILES = "GetQuantiles";
+  private static final String GET_QUANTILE = "GetQuantile";
+  private static final String GET_RANK = "getRank";
+  private static final String INTERSECT_SKETCH = "intersection";
+  private static final String EXCLUDE_SKETCH = "exclude";
+  private static final String GET_K = "getK";
+  private static final String GET_FREQUENT_ITEMS = "getFrequentItems";
+  private static final String T_TEST = "TTest";
+  private static final String SKETCH_TO_MEANS = "sketchtomeans";
+  private static final String SKETCH_TO_NUMBER_OF_RETAINED_ENTRIES = "sketchtonumberofretainedentries";
+  private static final String SKETCH_TO_QUANTILES_SKETCH = "sketchToQuantilesSketch";
+  private static final String SKETCH_TO_VALUES = "sketchToValues";
+  private static final String SKETCH_TO_VARIANCES = "sketchToVariances";
+  private static final String SKETCH_TO_PERCENTILE = "sketchToPercentile";
+  private static final String UNION_SKETCH1 = "unionSketch1";
+  private static final String INTERSECT_SKETCH1 = "intersect";
+
+  private final Registry system;
+
+  public DataSketchesFunctions(Registry system) {
+    this.system = system;
+  }
+
+  public static void register(Registry system) {
+    DataSketchesFunctions dsf = new DataSketchesFunctions(system);
+    // FIXME: what this should be approx, ds ... other?
+    String prefix = "ds";
 
 Review comment:
   I like `ds` 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] jcamachor commented on a change in pull request #930: HIVE-22940 datasketches functions

Posted by GitBox <gi...@apache.org>.
jcamachor commented on a change in pull request #930: HIVE-22940 datasketches functions
URL: https://github.com/apache/hive/pull/930#discussion_r386676030
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DataSketchesFunctions.java
 ##########
 @@ -0,0 +1,218 @@
+package org.apache.hadoop.hive.ql.exec;
+
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFResolver2;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
+
+public class DataSketchesFunctions {
+
+  private static final String DATA_TO_SKETCH = "datatosketch";
+  private static final String SKETCH_TO_ESTIMATE_WITH_ERROR_BOUNDS = "sketchToEstimateWithErrorBounds";
+  // FIXME: consider to rename it to simply "estimate" or "evaluate" - in case of the counting sketches the "sketchto..." doesnt add value
+  private static final String SKETCH_TO_ESTIMATE = "sketchToEstimate";
+  private static final String SKETCH_TO_STRING = "sketchToString";
+  // FIXME: probably use simply "union" instead unionSketch?
+  private static final String UNION_SKETCH = "unionSketch";
 
 Review comment:
   Makes sense to change to `union`. Maybe the UDF (`UNION_SKETCH1`), which I expect will be less used, can be changed to `union_f`.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kgyrtkirk commented on a change in pull request #930: HIVE-22940 datasketches functions

Posted by GitBox <gi...@apache.org>.
kgyrtkirk commented on a change in pull request #930: HIVE-22940 datasketches functions
URL: https://github.com/apache/hive/pull/930#discussion_r390411646
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DataSketchesFunctions.java
 ##########
 @@ -0,0 +1,218 @@
+package org.apache.hadoop.hive.ql.exec;
+
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFResolver2;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
+
+public class DataSketchesFunctions {
+
+  private static final String DATA_TO_SKETCH = "datatosketch";
+  private static final String SKETCH_TO_ESTIMATE_WITH_ERROR_BOUNDS = "sketchToEstimateWithErrorBounds";
+  // FIXME: consider to rename it to simply "estimate" or "evaluate" - in case of the counting sketches the "sketchto..." doesnt add value
+  private static final String SKETCH_TO_ESTIMATE = "sketchToEstimate";
 
 Review comment:
   I think it might make sense to remove the `get_` prefix - but in that case remove it from all of the functions...
   what do you think?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kgyrtkirk commented on a change in pull request #930: HIVE-22940 datasketches functions

Posted by GitBox <gi...@apache.org>.
kgyrtkirk commented on a change in pull request #930: HIVE-22940 datasketches functions
URL: https://github.com/apache/hive/pull/930#discussion_r390417433
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DataSketchesFunctions.java
 ##########
 @@ -0,0 +1,218 @@
+package org.apache.hadoop.hive.ql.exec;
+
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFResolver2;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
+
+public class DataSketchesFunctions {
+
+  private static final String DATA_TO_SKETCH = "datatosketch";
+  private static final String SKETCH_TO_ESTIMATE_WITH_ERROR_BOUNDS = "sketchToEstimateWithErrorBounds";
+  // FIXME: consider to rename it to simply "estimate" or "evaluate" - in case of the counting sketches the "sketchto..." doesnt add value
+  private static final String SKETCH_TO_ESTIMATE = "sketchToEstimate";
+  private static final String SKETCH_TO_STRING = "sketchToString";
+  // FIXME: probably use simply "union" instead unionSketch?
+  private static final String UNION_SKETCH = "unionSketch";
+  private static final String GET_N = "getN";
+  private static final String GET_CDF = "getCdf";
+  private static final String GET_PMF = "getPmf";
+  private static final String GET_QUANTILES = "GetQuantiles";
+  private static final String GET_QUANTILE = "GetQuantile";
+  private static final String GET_RANK = "getRank";
+  private static final String INTERSECT_SKETCH = "intersection";
+  private static final String EXCLUDE_SKETCH = "exclude";
+  private static final String GET_K = "getK";
+  private static final String GET_FREQUENT_ITEMS = "getFrequentItems";
+  private static final String T_TEST = "TTest";
+  private static final String SKETCH_TO_MEANS = "sketchtomeans";
+  private static final String SKETCH_TO_NUMBER_OF_RETAINED_ENTRIES = "sketchtonumberofretainedentries";
 
 Review comment:
   I right now think that we could change this to:
   * `get_n_retained`
   * `n_retained`
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org