You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by GitBox <gi...@apache.org> on 2020/03/26 13:48:20 UTC

[GitHub] [drill] cgivre opened a new pull request #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps

cgivre opened a new pull request #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps
URL: https://github.com/apache/drill/pull/2040
 
 
   # [DRILL-7668](https://issues.apache.org/jira/browse/DRILL-7668): Allow Time Bucket Function to Accept Floats and Timestamps
   
   ## Description
   
   Drill has a function `time_bucket()` which facilitates time series analysis. This PR extends this function to accept `FLOAT8` and `TIMESTAMPS` as input.  Floats are typically not used for timestamps, however in the event that the data is coming from imperfect files, the numbers may be read as floats and hence require casting in queries.  This PR makes this easier. 
   
   ## Documentation
   `time_bucket()` function now will accept a `timestamp` or `float8` as an argument for the time.
   
   ## Testing
   Added two unit tests for the new data types.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [drill] cgivre commented on a change in pull request #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps

Posted by GitBox <gi...@apache.org>.
cgivre commented on a change in pull request #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps
URL: https://github.com/apache/drill/pull/2040#discussion_r398790952
 
 

 ##########
 File path: contrib/udfs/src/main/java/org/apache/drill/exec/udfs/TimeBucketFunctions.java
 ##########
 @@ -102,4 +104,80 @@ public void eval() {
       out.value = timestamp - (timestamp % intervalToAdd);
     }
   }
+
+  /**
+   * This function is used for facilitating time series analysis by creating buckets of time intervals.  See
+   * https://blog.timescale.com/blog/simplified-time-series-analytics-using-the-time_bucket-function/ for usage. The function takes two arguments:
+   * 1. The timestamp (as a Drill timestamp)
+   * 2. The desired bucket interval IN milliseconds
+   *
+   * The function returns a BIGINT of the nearest time bucket.
+   */
+  @FunctionTemplate(name = "time_bucket",
+    scope = FunctionTemplate.FunctionScope.SIMPLE,
+    nulls = FunctionTemplate.NullHandling.NULL_IF_NULL)
+  public static class TimestampTimeBucketFunction implements DrillSimpleFunc {
+
+    @Param
+    TimeStampHolder inputDate;
+
+    @Param
+    BigIntHolder interval;
+
+    @Output
+    BigIntHolder out;
+
+    @Override
+    public void setup() {
+    }
+
+    @Override
+    public void eval() {
+      // Get the timestamp in milliseconds
+      long timestamp = inputDate.value;
+
+      // Get the interval in milliseconds
+      long intervalToAdd = interval.value;
+
+      out.value = timestamp - (timestamp % intervalToAdd);
+    }
+  }
+
+  /**
+   * This function is used for facilitating time series analysis by creating buckets of time intervals.  See
+   * https://blog.timescale.com/blog/simplified-time-series-analytics-using-the-time_bucket-function/ for usage. The function takes two arguments:
+   * 1. The timestamp (as a Drill timestamp)
+   * 2. The desired bucket interval IN milliseconds
+   *
+   * The function returns a BIGINT of the nearest time bucket.
+   */
+  @FunctionTemplate(name = "time_bucket",
+    scope = FunctionTemplate.FunctionScope.SIMPLE,
+    nulls = FunctionTemplate.NullHandling.NULL_IF_NULL)
+  public static class DoubleTimeBucketFunction implements DrillSimpleFunc {
+
+    @Param
+    Float8Holder inputDate;
+
+    @Param
+    BigIntHolder interval;
+
+    @Output
+    BigIntHolder out;
+
+    @Override
+    public void setup() {
+    }
+
+    @Override
+    public void eval() {
+      // Get the timestamp in milliseconds
+      long timestamp = (long)inputDate.value;
 
 Review comment:
   Fixed

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [drill] vvysotskyi commented on a change in pull request #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps

Posted by GitBox <gi...@apache.org>.
vvysotskyi commented on a change in pull request #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps
URL: https://github.com/apache/drill/pull/2040#discussion_r403849368
 
 

 ##########
 File path: contrib/udfs/src/main/java/org/apache/drill/exec/udfs/TimeBucketFunctions.java
 ##########
 @@ -97,9 +99,86 @@ public void eval() {
       long timestamp = inputDate.value;
 
       // Get the interval in milliseconds
-      long intervalToAdd = interval.value;
+      long groupByInterval = interval.value;
 
-      out.value = timestamp - (timestamp % intervalToAdd);
+      out.value = timestamp - (timestamp % groupByInterval);
+    }
+  }
+
+  /**
+   * This function is used for facilitating time series analysis by creating buckets of time intervals.  See
+   * https://blog.timescale.com/blog/simplified-time-series-analytics-using-the-time_bucket-function/ for usage. The function takes two arguments:
+   * 1. The timestamp (as a Drill timestamp)
+   * 2. The desired bucket interval IN milliseconds
+   *
+   * The function returns a BIGINT of the nearest time bucket.
+   */
+  @FunctionTemplate(name = "time_bucket",
+    scope = FunctionTemplate.FunctionScope.SIMPLE,
+    nulls = FunctionTemplate.NullHandling.NULL_IF_NULL)
+  public static class TimestampTimeBucketFunction implements DrillSimpleFunc {
+
+    @Param
+    TimeStampHolder inputDate;
+
+    @Param
+    BigIntHolder interval;
+
+    @Output
+    TimeStampHolder out;
+
+    @Override
+    public void setup() {
+    }
+
+    @Override
+    public void eval() {
+      // Get the timestamp in milliseconds
+      long timestamp = inputDate.value;
+
+      // Get the interval in milliseconds
+      long groupByInterval = interval.value;
+
+      java.time.Instant instant = java.time.Instant.ofEpochMilli(timestamp - (timestamp % groupByInterval));
 
 Review comment:
   Looks like creating `Instant` here may be also omitted.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [drill] cgivre commented on a change in pull request #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps

Posted by GitBox <gi...@apache.org>.
cgivre commented on a change in pull request #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps
URL: https://github.com/apache/drill/pull/2040#discussion_r398791038
 
 

 ##########
 File path: contrib/udfs/src/main/java/org/apache/drill/exec/udfs/TimeBucketFunctions.java
 ##########
 @@ -102,4 +104,80 @@ public void eval() {
       out.value = timestamp - (timestamp % intervalToAdd);
     }
   }
+
+  /**
+   * This function is used for facilitating time series analysis by creating buckets of time intervals.  See
+   * https://blog.timescale.com/blog/simplified-time-series-analytics-using-the-time_bucket-function/ for usage. The function takes two arguments:
+   * 1. The timestamp (as a Drill timestamp)
+   * 2. The desired bucket interval IN milliseconds
+   *
+   * The function returns a BIGINT of the nearest time bucket.
+   */
+  @FunctionTemplate(name = "time_bucket",
+    scope = FunctionTemplate.FunctionScope.SIMPLE,
+    nulls = FunctionTemplate.NullHandling.NULL_IF_NULL)
+  public static class TimestampTimeBucketFunction implements DrillSimpleFunc {
+
+    @Param
+    TimeStampHolder inputDate;
+
+    @Param
+    BigIntHolder interval;
+
+    @Output
+    BigIntHolder out;
+
+    @Override
+    public void setup() {
+    }
+
+    @Override
+    public void eval() {
+      // Get the timestamp in milliseconds
+      long timestamp = inputDate.value;
+
+      // Get the interval in milliseconds
+      long intervalToAdd = interval.value;
 
 Review comment:
   Fixed

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [drill] cgivre commented on a change in pull request #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps

Posted by GitBox <gi...@apache.org>.
cgivre commented on a change in pull request #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps
URL: https://github.com/apache/drill/pull/2040#discussion_r404339305
 
 

 ##########
 File path: contrib/udfs/src/main/java/org/apache/drill/exec/udfs/TimeBucketFunctions.java
 ##########
 @@ -97,9 +99,86 @@ public void eval() {
       long timestamp = inputDate.value;
 
       // Get the interval in milliseconds
-      long intervalToAdd = interval.value;
+      long groupByInterval = interval.value;
 
-      out.value = timestamp - (timestamp % intervalToAdd);
+      out.value = timestamp - (timestamp % groupByInterval);
+    }
+  }
+
+  /**
+   * This function is used for facilitating time series analysis by creating buckets of time intervals.  See
+   * https://blog.timescale.com/blog/simplified-time-series-analytics-using-the-time_bucket-function/ for usage. The function takes two arguments:
+   * 1. The timestamp (as a Drill timestamp)
+   * 2. The desired bucket interval IN milliseconds
+   *
+   * The function returns a BIGINT of the nearest time bucket.
+   */
+  @FunctionTemplate(name = "time_bucket",
+    scope = FunctionTemplate.FunctionScope.SIMPLE,
+    nulls = FunctionTemplate.NullHandling.NULL_IF_NULL)
+  public static class TimestampTimeBucketFunction implements DrillSimpleFunc {
+
+    @Param
+    TimeStampHolder inputDate;
+
+    @Param
+    BigIntHolder interval;
+
+    @Output
+    TimeStampHolder out;
+
+    @Override
+    public void setup() {
+    }
+
+    @Override
+    public void eval() {
+      // Get the timestamp in milliseconds
+      long timestamp = inputDate.value;
+
+      // Get the interval in milliseconds
+      long groupByInterval = interval.value;
+
+      java.time.Instant instant = java.time.Instant.ofEpochMilli(timestamp - (timestamp % groupByInterval));
 
 Review comment:
   Done.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [drill] cgivre commented on a change in pull request #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps

Posted by GitBox <gi...@apache.org>.
cgivre commented on a change in pull request #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps
URL: https://github.com/apache/drill/pull/2040#discussion_r403792155
 
 

 ##########
 File path: contrib/udfs/src/test/java/org/apache/drill/exec/udfs/TestTimeBucketFunction.java
 ##########
 @@ -92,6 +94,28 @@ public void testTimeBucket() throws Exception {
       .go();
   }
 
+  @Test
+  public void testDoubleTimeBucket() throws Exception {
+    String query = "SELECT time_bucket(CAST(1451606760 AS DOUBLE), 300000) AS high FROM (values(1))";
+    testBuilder()
+      .sqlQuery(query)
+      .ordered()
+      .baselineColumns("high")
+      .baselineValues(1451400000L)
+      .go();
+  }
+
+  @Test
+  public void testTimeBucketTimestamp() throws Exception {
+    String query = "SELECT time_bucket(CAST(1585272833845 AS TIMESTAMP), 300000) AS high FROM (values(1))";
 
 Review comment:
   Removed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [drill] asfgit closed pull request #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps
URL: https://github.com/apache/drill/pull/2040
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [drill] vvysotskyi commented on a change in pull request #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps

Posted by GitBox <gi...@apache.org>.
vvysotskyi commented on a change in pull request #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps
URL: https://github.com/apache/drill/pull/2040#discussion_r403725422
 
 

 ##########
 File path: contrib/udfs/src/main/java/org/apache/drill/exec/udfs/TimeBucketFunctions.java
 ##########
 @@ -97,9 +99,88 @@ public void eval() {
       long timestamp = inputDate.value;
 
       // Get the interval in milliseconds
-      long intervalToAdd = interval.value;
+      long groupByInterval = interval.value;
 
-      out.value = timestamp - (timestamp % intervalToAdd);
+      out.value = timestamp - (timestamp % groupByInterval);
+    }
+  }
+
+  /**
+   * This function is used for facilitating time series analysis by creating buckets of time intervals.  See
+   * https://blog.timescale.com/blog/simplified-time-series-analytics-using-the-time_bucket-function/ for usage. The function takes two arguments:
+   * 1. The timestamp (as a Drill timestamp)
+   * 2. The desired bucket interval IN milliseconds
+   *
+   * The function returns a BIGINT of the nearest time bucket.
+   */
+  @FunctionTemplate(name = "time_bucket",
+    scope = FunctionTemplate.FunctionScope.SIMPLE,
+    nulls = FunctionTemplate.NullHandling.NULL_IF_NULL)
+  public static class TimestampTimeBucketFunction implements DrillSimpleFunc {
+
+    @Param
+    TimeStampHolder inputDate;
+
+    @Param
+    BigIntHolder interval;
+
+    @Output
+    TimeStampHolder out;
+
+    @Override
+    public void setup() {
+    }
+
+    @Override
+    public void eval() {
+      // Get the timestamp in milliseconds
+      long timestamp = inputDate.value;
+
+      // Get the interval in milliseconds
+      long groupByInterval = interval.value;
+
+      java.time.Instant instant = java.time.Instant.ofEpochMilli(timestamp - (timestamp % groupByInterval));
+      java.time.LocalDateTime localDate = instant.atZone(java.time.ZoneId.of("UTC")).toLocalDateTime();
+
+      out.value = localDate.atZone(java.time.ZoneId.of("UTC")).toInstant().toEpochMilli();
 
 Review comment:
   @cgivre, could you please explain, what happens here? Initially, you calculate the required milliseconds, after that creates `Instant` instance based on that, converts it to `LocalDateTime` at `UTC` timezone, converts it to `ZonedDateTime`, converts it to `Instant` and after that converts back to milliseconds.
   
   Are all these transformations required? Usually, UDF shouldn't apply timezone to the values they handle.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [drill] cgivre commented on issue #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps

Posted by GitBox <gi...@apache.org>.
cgivre commented on issue #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps
URL: https://github.com/apache/drill/pull/2040#issuecomment-604990458
 
 
   @paul-rogers Thank you very much for the review. 
   
   @arina-ielchiieva  Commits are squashed.  We should be ready to commit. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [drill] cgivre commented on a change in pull request #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps

Posted by GitBox <gi...@apache.org>.
cgivre commented on a change in pull request #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps
URL: https://github.com/apache/drill/pull/2040#discussion_r398987931
 
 

 ##########
 File path: contrib/udfs/src/main/java/org/apache/drill/exec/udfs/TimeBucketFunctions.java
 ##########
 @@ -97,9 +99,85 @@ public void eval() {
       long timestamp = inputDate.value;
 
       // Get the interval in milliseconds
-      long intervalToAdd = interval.value;
+      long groupByInterval = interval.value;
 
-      out.value = timestamp - (timestamp % intervalToAdd);
+      out.value = timestamp - (timestamp % groupByInterval);
+    }
+  }
+
+  /**
+   * This function is used for facilitating time series analysis by creating buckets of time intervals.  See
+   * https://blog.timescale.com/blog/simplified-time-series-analytics-using-the-time_bucket-function/ for usage. The function takes two arguments:
+   * 1. The timestamp (as a Drill timestamp)
+   * 2. The desired bucket interval IN milliseconds
+   *
+   * The function returns a BIGINT of the nearest time bucket.
+   */
+  @FunctionTemplate(name = "time_bucket",
+    scope = FunctionTemplate.FunctionScope.SIMPLE,
+    nulls = FunctionTemplate.NullHandling.NULL_IF_NULL)
+  public static class TimestampTimeBucketFunction implements DrillSimpleFunc {
+
+    @Param
+    TimeStampHolder inputDate;
+
+    @Param
+    BigIntHolder interval;
+
+    @Output
+    BigIntHolder out;
 
 Review comment:
   Fixed

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [drill] vvysotskyi commented on issue #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps

Posted by GitBox <gi...@apache.org>.
vvysotskyi commented on issue #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps
URL: https://github.com/apache/drill/pull/2040#issuecomment-609444297
 
 
   @cgivre, @arina-ielchiieva, sorry for the delay, I have missed a letter about this PR earlier.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [drill] vvysotskyi commented on issue #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps

Posted by GitBox <gi...@apache.org>.
vvysotskyi commented on issue #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps
URL: https://github.com/apache/drill/pull/2040#issuecomment-606196720
 
 
   @cgivre, this pull request introduces unit tests failure:
   ```
   [ERROR] org.apache.drill.exec.udfs.TestTimeBucketFunction.testTimeBucketTimestamp  Time elapsed: 2.584 s  <<< ERROR!
   java.lang.Exception: 
   at position 0 column '`high`' mismatched values, expected: 2020-03-27T01:30(LocalDateTime) but received 2020-03-26T18:30(LocalDateTime)
   
   Expected Records near verification failure:
   Record Number: 0 { `high` : 2020-03-27T01:30, }
   
   
   Actual Records near verification failure:
   Record Number: 0 { `high` : 2020-03-26T18:30, }
   
   For query: SELECT time_bucket(CAST(1585272833845 AS TIMESTAMP), 300000) AS high FROM (values(1))
   	at org.apache.drill.exec.udfs.TestTimeBucketFunction.testTimeBucketTimestamp(TestTimeBucketFunction.java:116)
   Caused by: java.lang.Exception: 
   at position 0 column '`high`' mismatched values, expected: 2020-03-27T01:30(LocalDateTime) but received 2020-03-26T18:30(LocalDateTime)
   
   Expected Records near verification failure:
   Record Number: 0 { `high` : 2020-03-27T01:30, }
   
   
   Actual Records near verification failure:
   Record Number: 0 { `high` : 2020-03-26T18:30, }
   
   	at org.apache.drill.exec.udfs.TestTimeBucketFunction.testTimeBucketTimestamp(TestTimeBucketFunction.java:116)
   Caused by: java.lang.Exception: at position 0 column '`high`' mismatched values, expected: 2020-03-27T01:30(LocalDateTime) but received 2020-03-26T18:30(LocalDateTime)
   	at org.apache.drill.exec.udfs.TestTimeBucketFunction.testTimeBucketTimestamp(TestTimeBucketFunction.java:116)
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [drill] cgivre commented on issue #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps

Posted by GitBox <gi...@apache.org>.
cgivre commented on issue #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps
URL: https://github.com/apache/drill/pull/2040#issuecomment-609528521
 
 
   @vvysotskyi 
   Thanks for the review.  I made the requested changes to the PR.  If this is ok, I'll squash commits. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [drill] cgivre commented on a change in pull request #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps

Posted by GitBox <gi...@apache.org>.
cgivre commented on a change in pull request #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps
URL: https://github.com/apache/drill/pull/2040#discussion_r403792240
 
 

 ##########
 File path: contrib/udfs/src/main/java/org/apache/drill/exec/udfs/TimeBucketFunctions.java
 ##########
 @@ -97,9 +99,88 @@ public void eval() {
       long timestamp = inputDate.value;
 
       // Get the interval in milliseconds
-      long intervalToAdd = interval.value;
+      long groupByInterval = interval.value;
 
-      out.value = timestamp - (timestamp % intervalToAdd);
+      out.value = timestamp - (timestamp % groupByInterval);
+    }
+  }
+
+  /**
+   * This function is used for facilitating time series analysis by creating buckets of time intervals.  See
+   * https://blog.timescale.com/blog/simplified-time-series-analytics-using-the-time_bucket-function/ for usage. The function takes two arguments:
+   * 1. The timestamp (as a Drill timestamp)
+   * 2. The desired bucket interval IN milliseconds
+   *
+   * The function returns a BIGINT of the nearest time bucket.
+   */
+  @FunctionTemplate(name = "time_bucket",
+    scope = FunctionTemplate.FunctionScope.SIMPLE,
+    nulls = FunctionTemplate.NullHandling.NULL_IF_NULL)
+  public static class TimestampTimeBucketFunction implements DrillSimpleFunc {
+
+    @Param
+    TimeStampHolder inputDate;
+
+    @Param
+    BigIntHolder interval;
+
+    @Output
+    TimeStampHolder out;
+
+    @Override
+    public void setup() {
+    }
+
+    @Override
+    public void eval() {
+      // Get the timestamp in milliseconds
+      long timestamp = inputDate.value;
+
+      // Get the interval in milliseconds
+      long groupByInterval = interval.value;
+
+      java.time.Instant instant = java.time.Instant.ofEpochMilli(timestamp - (timestamp % groupByInterval));
+      java.time.LocalDateTime localDate = instant.atZone(java.time.ZoneId.of("UTC")).toLocalDateTime();
+
+      out.value = localDate.atZone(java.time.ZoneId.of("UTC")).toInstant().toEpochMilli();
 
 Review comment:
   Nope.  Not sure why I did that, but I simplified and tested this with a few different timezones on my computer. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [drill] vvysotskyi edited a comment on issue #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps

Posted by GitBox <gi...@apache.org>.
vvysotskyi edited a comment on issue #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps
URL: https://github.com/apache/drill/pull/2040#issuecomment-606196720
 
 
   @cgivre, this pull request introduces unit tests failure:
   ```
   [ERROR] org.apache.drill.exec.udfs.TestTimeBucketFunction.testTimeBucketTimestamp  Time elapsed: 2.584 s  <<< ERROR!
   java.lang.Exception: 
   at position 0 column '`high`' mismatched values, expected: 2020-03-27T01:30(LocalDateTime) but received 2020-03-26T18:30(LocalDateTime)
   
   Expected Records near verification failure:
   Record Number: 0 { `high` : 2020-03-27T01:30, }
   
   
   Actual Records near verification failure:
   Record Number: 0 { `high` : 2020-03-26T18:30, }
   
   For query: SELECT time_bucket(CAST(1585272833845 AS TIMESTAMP), 300000) AS high FROM (values(1))
   	at org.apache.drill.exec.udfs.TestTimeBucketFunction.testTimeBucketTimestamp(TestTimeBucketFunction.java:116)
   Caused by: java.lang.Exception: 
   at position 0 column '`high`' mismatched values, expected: 2020-03-27T01:30(LocalDateTime) but received 2020-03-26T18:30(LocalDateTime)
   
   Expected Records near verification failure:
   Record Number: 0 { `high` : 2020-03-27T01:30, }
   
   
   Actual Records near verification failure:
   Record Number: 0 { `high` : 2020-03-26T18:30, }
   
   	at org.apache.drill.exec.udfs.TestTimeBucketFunction.testTimeBucketTimestamp(TestTimeBucketFunction.java:116)
   Caused by: java.lang.Exception: at position 0 column '`high`' mismatched values, expected: 2020-03-27T01:30(LocalDateTime) but received 2020-03-26T18:30(LocalDateTime)
   	at org.apache.drill.exec.udfs.TestTimeBucketFunction.testTimeBucketTimestamp(TestTimeBucketFunction.java:116)
   ```
   Looks like it may be timezone-dependent failure.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [drill] arina-ielchiieva commented on issue #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps

Posted by GitBox <gi...@apache.org>.
arina-ielchiieva commented on issue #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps
URL: https://github.com/apache/drill/pull/2040#issuecomment-609407971
 
 
   @vvysotskyi is this PR ready to be merged?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [drill] paul-rogers commented on a change in pull request #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps

Posted by GitBox <gi...@apache.org>.
paul-rogers commented on a change in pull request #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps
URL: https://github.com/apache/drill/pull/2040#discussion_r398732279
 
 

 ##########
 File path: contrib/udfs/src/main/java/org/apache/drill/exec/udfs/TimeBucketFunctions.java
 ##########
 @@ -102,4 +104,80 @@ public void eval() {
       out.value = timestamp - (timestamp % intervalToAdd);
     }
   }
+
+  /**
+   * This function is used for facilitating time series analysis by creating buckets of time intervals.  See
+   * https://blog.timescale.com/blog/simplified-time-series-analytics-using-the-time_bucket-function/ for usage. The function takes two arguments:
+   * 1. The timestamp (as a Drill timestamp)
+   * 2. The desired bucket interval IN milliseconds
+   *
+   * The function returns a BIGINT of the nearest time bucket.
+   */
+  @FunctionTemplate(name = "time_bucket",
+    scope = FunctionTemplate.FunctionScope.SIMPLE,
+    nulls = FunctionTemplate.NullHandling.NULL_IF_NULL)
+  public static class TimestampTimeBucketFunction implements DrillSimpleFunc {
+
+    @Param
+    TimeStampHolder inputDate;
+
+    @Param
+    BigIntHolder interval;
+
+    @Output
+    BigIntHolder out;
+
+    @Override
+    public void setup() {
+    }
+
+    @Override
+    public void eval() {
+      // Get the timestamp in milliseconds
+      long timestamp = inputDate.value;
+
+      // Get the interval in milliseconds
+      long intervalToAdd = interval.value;
+
+      out.value = timestamp - (timestamp % intervalToAdd);
+    }
+  }
+
+  /**
+   * This function is used for facilitating time series analysis by creating buckets of time intervals.  See
+   * https://blog.timescale.com/blog/simplified-time-series-analytics-using-the-time_bucket-function/ for usage. The function takes two arguments:
+   * 1. The timestamp (as a Drill timestamp)
+   * 2. The desired bucket interval IN milliseconds
+   *
+   * The function returns a BIGINT of the nearest time bucket.
+   */
+  @FunctionTemplate(name = "time_bucket",
+    scope = FunctionTemplate.FunctionScope.SIMPLE,
+    nulls = FunctionTemplate.NullHandling.NULL_IF_NULL)
+  public static class DoubleTimeBucketFunction implements DrillSimpleFunc {
+
+    @Param
+    Float8Holder inputDate;
+
+    @Param
+    BigIntHolder interval;
+
+    @Output
+    BigIntHolder out;
+
+    @Override
+    public void setup() {
+    }
+
+    @Override
+    public void eval() {
+      // Get the timestamp in milliseconds
+      long timestamp = (long)inputDate.value;
 
 Review comment:
   `Math.round(inputDate.value)`. We would want 4.99999999.... to be treated as 5.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [drill] cgivre commented on issue #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps

Posted by GitBox <gi...@apache.org>.
cgivre commented on issue #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps
URL: https://github.com/apache/drill/pull/2040#issuecomment-606700977
 
 
   @vvysotskyi 
   Thanks for checking this.  I made a small fix and ran the unit tests on my local machine with the system set to different timezones, and they passed.  Please let me know if this solves the issue and I'll squash commits. 
   Thanks

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [drill] paul-rogers commented on a change in pull request #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps

Posted by GitBox <gi...@apache.org>.
paul-rogers commented on a change in pull request #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps
URL: https://github.com/apache/drill/pull/2040#discussion_r398954740
 
 

 ##########
 File path: contrib/udfs/src/main/java/org/apache/drill/exec/udfs/TimeBucketFunctions.java
 ##########
 @@ -97,9 +99,85 @@ public void eval() {
       long timestamp = inputDate.value;
 
       // Get the interval in milliseconds
-      long intervalToAdd = interval.value;
+      long groupByInterval = interval.value;
 
-      out.value = timestamp - (timestamp % intervalToAdd);
+      out.value = timestamp - (timestamp % groupByInterval);
+    }
+  }
+
+  /**
+   * This function is used for facilitating time series analysis by creating buckets of time intervals.  See
+   * https://blog.timescale.com/blog/simplified-time-series-analytics-using-the-time_bucket-function/ for usage. The function takes two arguments:
+   * 1. The timestamp (as a Drill timestamp)
+   * 2. The desired bucket interval IN milliseconds
+   *
+   * The function returns a BIGINT of the nearest time bucket.
+   */
+  @FunctionTemplate(name = "time_bucket",
+    scope = FunctionTemplate.FunctionScope.SIMPLE,
+    nulls = FunctionTemplate.NullHandling.NULL_IF_NULL)
+  public static class TimestampTimeBucketFunction implements DrillSimpleFunc {
+
+    @Param
+    TimeStampHolder inputDate;
+
+    @Param
+    BigIntHolder interval;
+
+    @Output
+    BigIntHolder out;
 
 Review comment:
   Sorry; I wonder if the `TimeStamp` version should emit a timestamp. The interval is a valid ms-since-the-epoch number; seems to make sense to truncate a TS to another TS.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [drill] vvysotskyi commented on a change in pull request #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps

Posted by GitBox <gi...@apache.org>.
vvysotskyi commented on a change in pull request #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps
URL: https://github.com/apache/drill/pull/2040#discussion_r403733469
 
 

 ##########
 File path: contrib/udfs/src/test/java/org/apache/drill/exec/udfs/TestTimeBucketFunction.java
 ##########
 @@ -92,6 +94,28 @@ public void testTimeBucket() throws Exception {
       .go();
   }
 
+  @Test
+  public void testDoubleTimeBucket() throws Exception {
+    String query = "SELECT time_bucket(CAST(1451606760 AS DOUBLE), 300000) AS high FROM (values(1))";
+    testBuilder()
+      .sqlQuery(query)
+      .ordered()
+      .baselineColumns("high")
+      .baselineValues(1451400000L)
+      .go();
+  }
+
+  @Test
+  public void testTimeBucketTimestamp() throws Exception {
+    String query = "SELECT time_bucket(CAST(1585272833845 AS TIMESTAMP), 300000) AS high FROM (values(1))";
 
 Review comment:
   Could you please replace it with string representation, it looks more friendly and helps to understand the source value:
   ```suggestion
       String query = "SELECT time_bucket(timestamp '2020-03-27 01:33:53.845', 300000) AS high";
   ```
   
   Also, there is no need to specify `FROM (values(1))` if it is not used.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [drill] cgivre commented on a change in pull request #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps

Posted by GitBox <gi...@apache.org>.
cgivre commented on a change in pull request #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps
URL: https://github.com/apache/drill/pull/2040#discussion_r398982654
 
 

 ##########
 File path: contrib/udfs/src/main/java/org/apache/drill/exec/udfs/TimeBucketFunctions.java
 ##########
 @@ -97,9 +99,85 @@ public void eval() {
       long timestamp = inputDate.value;
 
       // Get the interval in milliseconds
-      long intervalToAdd = interval.value;
+      long groupByInterval = interval.value;
 
-      out.value = timestamp - (timestamp % intervalToAdd);
+      out.value = timestamp - (timestamp % groupByInterval);
+    }
+  }
+
+  /**
+   * This function is used for facilitating time series analysis by creating buckets of time intervals.  See
+   * https://blog.timescale.com/blog/simplified-time-series-analytics-using-the-time_bucket-function/ for usage. The function takes two arguments:
+   * 1. The timestamp (as a Drill timestamp)
+   * 2. The desired bucket interval IN milliseconds
+   *
+   * The function returns a BIGINT of the nearest time bucket.
+   */
+  @FunctionTemplate(name = "time_bucket",
+    scope = FunctionTemplate.FunctionScope.SIMPLE,
+    nulls = FunctionTemplate.NullHandling.NULL_IF_NULL)
+  public static class TimestampTimeBucketFunction implements DrillSimpleFunc {
+
+    @Param
+    TimeStampHolder inputDate;
+
+    @Param
+    BigIntHolder interval;
+
+    @Output
+    BigIntHolder out;
 
 Review comment:
   I was thinking the same thing actually... of course right after I submitted.  I'll convert to TS.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [drill] paul-rogers commented on a change in pull request #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps

Posted by GitBox <gi...@apache.org>.
paul-rogers commented on a change in pull request #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps
URL: https://github.com/apache/drill/pull/2040#discussion_r398731150
 
 

 ##########
 File path: contrib/udfs/src/main/java/org/apache/drill/exec/udfs/TimeBucketFunctions.java
 ##########
 @@ -102,4 +104,80 @@ public void eval() {
       out.value = timestamp - (timestamp % intervalToAdd);
     }
   }
+
+  /**
+   * This function is used for facilitating time series analysis by creating buckets of time intervals.  See
+   * https://blog.timescale.com/blog/simplified-time-series-analytics-using-the-time_bucket-function/ for usage. The function takes two arguments:
+   * 1. The timestamp (as a Drill timestamp)
+   * 2. The desired bucket interval IN milliseconds
+   *
+   * The function returns a BIGINT of the nearest time bucket.
+   */
+  @FunctionTemplate(name = "time_bucket",
+    scope = FunctionTemplate.FunctionScope.SIMPLE,
+    nulls = FunctionTemplate.NullHandling.NULL_IF_NULL)
+  public static class TimestampTimeBucketFunction implements DrillSimpleFunc {
+
+    @Param
+    TimeStampHolder inputDate;
+
+    @Param
+    BigIntHolder interval;
+
+    @Output
+    BigIntHolder out;
+
+    @Override
+    public void setup() {
+    }
+
+    @Override
+    public void eval() {
+      // Get the timestamp in milliseconds
+      long timestamp = inputDate.value;
+
+      // Get the interval in milliseconds
+      long intervalToAdd = interval.value;
 
 Review comment:
   This is not an interval to "add" so much as a group-by interval.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [drill] cgivre commented on issue #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps

Posted by GitBox <gi...@apache.org>.
cgivre commented on issue #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps
URL: https://github.com/apache/drill/pull/2040#issuecomment-610011770
 
 
   Thanks @vvysotskyi for the review.
   Commits squashed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [drill] arina-ielchiieva commented on issue #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps

Posted by GitBox <gi...@apache.org>.
arina-ielchiieva commented on issue #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps
URL: https://github.com/apache/drill/pull/2040#issuecomment-604981406
 
 
   @cgivre please squash the commits.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services