You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/01/10 21:12:09 UTC
[GitHub] [pinot] weixiangsun commented on issue #7422: Support aggregate functions to better handle timeseries data
weixiangsun commented on issue #7422:
URL: https://github.com/apache/pinot/issues/7422#issuecomment-1009346917
Here is the document for Pre-Aggregation Gapfilling function design: https://docs.google.com/document/d/1FQbsTVywWLlmFaeUcYqDkb-XRWCpVhZe10BZyD2VzcU/edit#heading=h.brbishq9xbz1
@siddharthteotia @jackjlli @amrishlal @lakshmanan-v and I had couple of rounds of design meetings. Thanks a lot for the effort!
Here is the meeting minutes:
1. We went through the design document and discussed the query design and implementation design. We reached the consensus that the implementation looks good.
2. We debated about the query design. We came out two feasible options:
flat query option:
SELECT
PREAGGREGATEGAPFILL(DATETIMECONVERT(event_time, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '15:MINUTES'), '1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss.SSS', '2021-10-01 09:00:00+00', '2021-10-01 12:00:00+00', '15:MINUTES', TIME_SERIES_ON(eventTime, lotId), FILTERING(is_occupied = 1), FILL(is_occupied,FILL_TYPE_PREVIOUS)) AS time_col,
SUM(is_occupied) AS occupied_slots_count
FROM parking_data
WHERE event_time >= '10/01/2021 09:00:00+00' AND event_time <= '10/01/2021 12:00:00+00'
GROUP BY time_col
ORDER BY time_col
and subquery option:
SELECT time_col, SUM(is_occupied) AS occupied_slots_count
FROM (
SELECT PREAGGREGATEGAPFILL(DATETIMECONVERT(event_time, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '15:MINUTES'), '1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss.SSS', '2021-10-01 09:00:00+00', '2021-10-01 12:00:00+00', '15:MINUTES', TIME_SERIES_ON(eventTime, lotId), FILL(is_occupied,FILL_TYPE_PREVIOUS)) AS time_col, is_occupied
FROM parking_data
WHERE event_time >= '10/01/2021 09:00:00+00' AND event_time <= '10/01/2021 12:00:00+00')
WHERE isOccupied = 1
GROUP BY time_col
ORDER BY time_col
As for the first option (flat query option), the PreAggregateGapFill function contains the hint about the pre-aggregate gapfilling. Everything about pre-aggregation is contained inside this function. But its problem is that the first selection expression will generate more rows. The aggregation selection expression will reduce the number of rows. This is counterintuitive. The execution order is not explicit even though the document can provide more details. Also for first option, we need define FILTERING inside the PreAggregateGapFill function in order to filter out the gapfilled result. The second option
As for the second option (the subquery option), it separates the gapfilling step and the aggregation step. The inner query logically defines the gapfilling step. The aggregation step is defined by the outer query. Its advantage is that the execution step is more explicit than the flat query option. Its problem is that the subquery feature is not in place. It might conflict with the future implementation of subquery. The solution is to make it specific in order to prevent the conflict. When subquery is in place, pre-aggregation gapfilling can migrated to leverage the subquery feature or be compatible with subquery feature.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org