You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/01/10 21:12:09 UTC
[GitHub] [pinot] weixiangsun commented on issue #7422: Support aggregate functions to better handle timeseries data

weixiangsun commented on issue #7422:
URL: https://github.com/apache/pinot/issues/7422#issuecomment-1009346917


   Here is the document for Pre-Aggregation Gapfilling function design: https://docs.google.com/document/d/1FQbsTVywWLlmFaeUcYqDkb-XRWCpVhZe10BZyD2VzcU/edit#heading=h.brbishq9xbz1
   
   @siddharthteotia @jackjlli @amrishlal @lakshmanan-v and I had couple of rounds of design meetings. Thanks a lot for the effort!
   Here is the meeting minutes:
   1. We went through the design document and discussed the query design and implementation design. We reached the consensus that the implementation looks good. 
   2. We debated about the query design. We came out two feasible options:
   flat query option:
   SELECT 
   PREAGGREGATEGAPFILL(DATETIMECONVERT(event_time, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '15:MINUTES'), '1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss.SSS', '2021-10-01 09:00:00+00', '2021-10-01 12:00:00+00', '15:MINUTES', TIME_SERIES_ON(eventTime, lotId), FILTERING(is_occupied = 1), FILL(is_occupied,FILL_TYPE_PREVIOUS)) AS time_col,
   SUM(is_occupied) AS occupied_slots_count
   FROM parking_data
   WHERE event_time >= '10/01/2021 09:00:00+00' AND  event_time <= '10/01/2021 12:00:00+00'
   GROUP BY time_col
   ORDER BY time_col
   
   and subquery option:
   SELECT time_col, SUM(is_occupied) AS occupied_slots_count
   FROM (
     SELECT PREAGGREGATEGAPFILL(DATETIMECONVERT(event_time, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '15:MINUTES'), '1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss.SSS', '2021-10-01 09:00:00+00', '2021-10-01 12:00:00+00', '15:MINUTES', TIME_SERIES_ON(eventTime, lotId), FILL(is_occupied,FILL_TYPE_PREVIOUS)) AS time_col, is_occupied
     FROM parking_data
     WHERE event_time >= '10/01/2021 09:00:00+00' AND  event_time <= '10/01/2021 12:00:00+00')
   WHERE isOccupied = 1
   GROUP BY time_col
   ORDER BY time_col
   
   As for the first option (flat query option), the PreAggregateGapFill function contains the hint about the pre-aggregate gapfilling. Everything about pre-aggregation is contained inside this function. But its problem is that the first selection expression will generate more rows. The aggregation selection expression will reduce the number of rows. This is counterintuitive. The execution order is not explicit even though the document can provide more details. Also for first option, we need define FILTERING inside the PreAggregateGapFill function in order to filter out the gapfilled result. The second option 
   
   As for the second option (the subquery option), it separates the gapfilling step and the aggregation step. The inner query logically defines the gapfilling step. The aggregation step is defined by the outer query. Its advantage is that the execution step is more explicit than the flat query option. Its problem is that the subquery feature is not in place. It might conflict with the future implementation of subquery. The solution is to make it specific in order to prevent the conflict. When subquery is in place, pre-aggregation gapfilling can migrated to leverage the subquery feature or be compatible with subquery feature. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org