Posted to dev@drill.apache.org by "Charles Givre (Jira)" <ji...@apache.org> on 2020/03/20 13:14:00 UTC

[jira] [Created] (DRILL-7652) Add time_bucket() function for time series analysis.

Charles Givre created DRILL-7652:
------------------------------------

             Summary: Add time_bucket() function for time series analysis.
                 Key: DRILL-7652
                 URL: https://issues.apache.org/jira/browse/DRILL-7652
             Project: Apache Drill
          Issue Type: Improvement
    Affects Versions: 1.17.0
            Reporter: Charles Givre
             Fix For: 1.18.0


These functions are useful for doing time series analysis by grouping the data into arbitrary intervals. See: https://blog.timescale.com/blog/simplified-time-series-analytics-using-the-time_bucket-function/ for more examples.

There are two versions of the function:
* `time_bucket(<timestamp>, <interval>)`
* `time_bucket_ns(<timestamp>,<interval>)`

Both functions accept a `BIGINT` timestamp and an interval in milliseconds as arguments. The `time_bucket_ns()` function accepts timestamps in nanoseconds and `time_bucket()` accepts timestamps in milliseconds. Both return timestamps in the original format.
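
The bucketing arithmetic can be sketched outside Drill. The Python below is only an illustration of the usual floor-to-interval approach, not the Drill implementation: it assumes each timestamp is rounded down to the start of its interval, and that `time_bucket_ns()` converts the millisecond interval to nanoseconds before rounding. The constant name and the input validation are assumptions made for the example.

```python
# Illustrative sketch only; not the Drill source code.
NANOS_PER_MILLI = 1_000_000  # assumed conversion factor for the _ns variant


def time_bucket(timestamp_ms: int, interval_ms: int) -> int:
    """Round a millisecond timestamp down to the start of its interval."""
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    return timestamp_ms - (timestamp_ms % interval_ms)


def time_bucket_ns(timestamp_ns: int, interval_ms: int) -> int:
    """Round a nanosecond timestamp down, with the interval given in milliseconds."""
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    interval_ns = interval_ms * NANOS_PER_MILLI
    return timestamp_ns - (timestamp_ns % interval_ns)


# A five-minute interval is 5 * 60 * 1000 = 300000 ms.
five_minutes_ms = 5 * 60 * 1000
bucket_start = time_bucket(1584710040000, five_minutes_ms)
```

Under these assumptions, every timestamp that falls inside the same interval maps to the same bucket start, which is what makes the result usable as a `GROUP BY` key.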

### Example:
The query below calculates the average of the `cpu` metric for each five-minute interval.

```sql
SELECT time_bucket(time_stamp, 300000) AS five_min, avg(cpu)
 FROM metrics
 GROUP BY five_min
 ORDER BY five_min DESC LIMIT 12;
```



--
This message was sent by Atlassian Jira
(v8.3.4#803005)