Posted to issues@drill.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/03/20 13:19:00 UTC

[jira] [Commented] (DRILL-7652) Add time_bucket() Function for Time Series Analysis

    [ https://issues.apache.org/jira/browse/DRILL-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063351#comment-17063351 ] 

ASF GitHub Bot commented on DRILL-7652:
---------------------------------------

cgivre commented on pull request #2033: DRILL-7652: Add time_bucket() function for time series analysis
URL: https://github.com/apache/drill/pull/2033
 
 
   # [DRILL-7652](https://issues.apache.org/jira/browse/DRILL-7652): Add time_bucket() function for Time Series Analysis
   
   ## Description
   
   This PR adds two UDFs that facilitate time series analysis. It also updates the `README.md` in the `contrib/udf` folder to document the new UDFs.
   
   ## Documentation
   These functions are useful for doing time series analysis by grouping the data into arbitrary intervals.  See: https://blog.timescale.com/blog/simplified-time-series-analytics-using-the-time_bucket-function/ for more examples.
   
   There are two versions of the function:
   * `time_bucket(<timestamp>, <interval>)`
   * `time_bucket_ns(<timestamp>, <interval>)`
   
   Both functions accept a `BIGINT` timestamp and an interval in milliseconds as arguments. The `time_bucket_ns()` function accepts timestamps in nanoseconds and `time_bucket()` accepts timestamps in milliseconds.  Each returns a timestamp in the same unit as its input.
   
   ### Example:
   The query below calculates the average of the `cpu` metric for each five-minute interval.
   
   ```sql
   SELECT time_bucket(time_stamp, 300000) AS five_min, avg(cpu)
     FROM metrics
     GROUP BY five_min
     ORDER BY five_min DESC LIMIT 12;
   ```
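
The bucketing arithmetic can be sketched in a few lines of Python. This is only an illustration, not the code in this PR: it assumes each function floors the timestamp to the nearest lower multiple of the interval (the behavior described in the Timescale post linked above), and that `time_bucket_ns()` converts its millisecond interval to nanoseconds before flooring. The Python function names simply mirror the UDF names, and `NANOS_PER_MILLI` is a name introduced here.

```python
NANOS_PER_MILLI = 1_000_000  # illustrative constant, not taken from the PR


def time_bucket(timestamp_ms: int, interval_ms: int) -> int:
    """Floor a millisecond timestamp to the start of its bucket (assumed behavior)."""
    if interval_ms <= 0:
        raise ValueError("interval must be a positive number of milliseconds")
    return timestamp_ms - (timestamp_ms % interval_ms)


def time_bucket_ns(timestamp_ns: int, interval_ms: int) -> int:
    """Same idea for nanosecond timestamps; the interval is still in milliseconds."""
    if interval_ms <= 0:
        raise ValueError("interval must be a positive number of milliseconds")
    interval_ns = interval_ms * NANOS_PER_MILLI
    return timestamp_ns - (timestamp_ns % interval_ns)


# Five minutes expressed in milliseconds.
FIVE_MINUTES_MS = 5 * 60 * 1000  # 300000

# 2020-03-20 13:19:00 UTC falls in the bucket starting at 13:15:00 UTC.
bucket_ms = time_bucket(1_584_710_340_000, FIVE_MINUTES_MS)
bucket_ns = time_bucket_ns(1_584_710_340_000_000_000, FIVE_MINUTES_MS)
```

Under these assumptions, every row whose timestamp falls in the same five-minute window maps to the same value, which is what lets `GROUP BY` aggregate per bucket.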
   
   ## Testing
   There are a series of unit tests included with this PR.
 


> Add time_bucket() Function for Time Series Analysis
> ---------------------------------------------------
>
>                 Key: DRILL-7652
>                 URL: https://issues.apache.org/jira/browse/DRILL-7652
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.17.0
>            Reporter: Charles Givre
>            Priority: Major
>             Fix For: 1.18.0
>
>
> These functions are useful for doing time series analysis by grouping the data into arbitrary intervals. See: https://blog.timescale.com/blog/simplified-time-series-analytics-using-the-time_bucket-function/ for more examples.
> There are two versions of the function:
> * `time_bucket(<timestamp>, <interval>)`
> * `time_bucket_ns(<timestamp>, <interval>)`
> Both functions accept a `BIGINT` timestamp and an interval in milliseconds as arguments. The `time_bucket_ns()` function accepts timestamps in nanoseconds and `time_bucket()` accepts timestamps in milliseconds. Each returns a timestamp in the same unit as its input.
> ### Example:
> The query below calculates the average of the `cpu` metric for each five-minute interval.
> ```sql
> SELECT time_bucket(time_stamp, 300000) AS five_min, avg(cpu)
>  FROM metrics
>  GROUP BY five_min
>  ORDER BY five_min DESC LIMIT 12;
> ```


