Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2020/12/18 20:15:24 UTC

[GitHub] [incubator-pinot] amitchopraait opened a new issue #6368: Smart query layer with rolled up data

amitchopraait opened a new issue #6368:
URL: https://github.com/apache/incubator-pinot/issues/6368


   Our use case is to give users the ability to slice and dice data over varying time intervals. A user may look at a certain metric over the last month and then zoom in to a specific week, day or hour to analyze the data further.
   
   To support this, we plan to store the raw segments and also run rollup jobs (using Minion) that produce data aggregated by day, month, etc. This way we only lose granularity in the time column; none of the existing dimensions are dropped.
   
   To take an example:
   -------------------------------
   
   Event Timestamp | Org | Device | Rule Id | Process Name | Process Hash | Count
   -- | -- | -- | -- | -- | -- | --
   2020/05/01 00:13:11 | Coke | Amit-01 | 111 | cmd.exe | 12345678 | 3
   2020/05/01 00:20:11 | Pepsi | Rahul-01 | 222 | java.exe | 98765432 | 1
   2020/05/01 00:30:11 | Coke | Amit-01 | 111 | cmd.exe | 12345678 | 1
   2020/05/01 00:44:11 | Coke | Amit-01 | 111 | cmd.exe | 12345678 | 1
   2020/05/01 00:55:11 | Coke | Amit-01 | 222 | java.exe | 98765432 | 1
   
   If we roll up the data in the above example from second granularity to hour granularity, the rolled-up segment will contain the following data. As you can see, no dimensions are lost, only the granularity of time:
   
   Event Timestamp | Org | Device | Rule Id | Process Name | Process Hash | Count
   -- | -- | -- | -- | -- | -- | --
   2020/05/01 0000 | Coke | Amit-01 | 111 | cmd.exe | 12345678 | 5
   2020/05/01 0000 | Pepsi | Rahul-01 | 222 | java.exe | 98765432 | 1
   2020/05/01 0000 | Coke | Amit-01 | 222 | java.exe | 98765432 | 1
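The hour-level rollup in the example amounts to truncating each timestamp to the hour and summing Count over every distinct combination of the remaining dimensions. Below is a minimal plain-Python sketch of that aggregation. It is not Pinot's Minion rollup task; the function name `rollup_to_hour` and the `RAW_ROWS` layout are hypothetical, chosen only to mirror the tables above.

```python
from collections import defaultdict
from datetime import datetime

# Raw rows from the example: (timestamp, org, device, rule_id, process_name, process_hash, count)
RAW_ROWS = [
    ("2020/05/01 00:13:11", "Coke", "Amit-01", 111, "cmd.exe", "12345678", 3),
    ("2020/05/01 00:20:11", "Pepsi", "Rahul-01", 222, "java.exe", "98765432", 1),
    ("2020/05/01 00:30:11", "Coke", "Amit-01", 111, "cmd.exe", "12345678", 1),
    ("2020/05/01 00:44:11", "Coke", "Amit-01", 111, "cmd.exe", "12345678", 1),
    ("2020/05/01 00:55:11", "Coke", "Amit-01", 222, "java.exe", "98765432", 1),
]

TS_FORMAT = "%Y/%m/%d %H:%M:%S"


def rollup_to_hour(rows):
    """Truncate timestamps to the hour and SUM the count per (hour, dimensions) key."""
    totals = defaultdict(int)
    for ts, *dims, count in rows:
        # Only the time column loses precision; all other dimensions stay in the key.
        hour = datetime.strptime(ts, TS_FORMAT).replace(minute=0, second=0)
        totals[(hour.strftime(TS_FORMAT), *dims)] += count
    # dicts keep insertion order, so output follows the first appearance of each key
    return [(*key, total) for key, total in totals.items()]


for row in rollup_to_hour(RAW_ROWS):
    print(row)
```

Running it prints the same three rows as the rolled-up table (with the hour written as `2020/05/01 00:00:00`), and the total count of 7 is unchanged.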
   
   
   Now, given these raw as well as rolled-up segments (for hour, day, week), it would be great if the broker could decide which segments to use depending on the query's time interval.
   
   I have also attached a diagram that shows the rollup and the smart query layer pictorially:
   <img width="737" alt="Screen Shot 2020-12-17 at 10 46 34 PM" src="https://user-images.githubusercontent.com/2099006/102657423-3e5b9f80-413b-11eb-871d-7be96880e65c.png">
    


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] fx19880617 edited a comment on issue #6368: Smart query layer with rolled up data

Posted by GitBox <gi...@apache.org>.
fx19880617 edited a comment on issue #6368:
URL: https://github.com/apache/incubator-pinot/issues/6368#issuecomment-748389769


   Some thoughts on this problem.
   
   I feel we need to pre-define the time granularity in the segment metadata; the broker routing strategy can then implement a hierarchy of segments with different time granularities. And of course, the time boundaries should always be aligned.
   
   Then, for each query, we can parse the time range from the query and pick the right segments to cover the entire time range.
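The routing idea above (aligned time boundaries, then picking segments that cover the query range) can be sketched as a greedy cover: at each point in the range, use the coarsest rolled-up bucket that starts there and fits entirely inside the query, and fall back to raw segments only at the ragged edges. This is a hypothetical illustration, not Pinot's broker routing API. `GRANULARITIES`, `plan_segments`, and the day/hour hierarchy are assumptions, and the sketch assumes every rolled-up bucket already exists and that buckets are aligned to UTC epoch boundaries.

```python
from datetime import datetime, timezone

# Hypothetical rollup hierarchy, coarsest first: (name, bucket size in seconds).
# Buckets are assumed aligned to UTC epoch boundaries, so every day boundary
# is also an hour boundary.
GRANULARITIES = [("day", 86_400), ("hour", 3_600)]


def _append(plan, name, lo, hi):
    """Add [lo, hi) to the plan, merging with the previous range if contiguous and same granularity."""
    if plan and plan[-1][0] == name and plan[-1][2] == lo:
        plan[-1] = (name, plan[-1][1], hi)
    else:
        plan.append((name, lo, hi))


def plan_segments(start, end):
    """Cover the query range [start, end) in epoch seconds with the coarsest aligned buckets.

    Returns (granularity, range_start, range_end) tuples; "raw" means the
    un-rolled-up segments must be scanned for that sub-range.
    """
    plan = []
    t = start
    finest = GRANULARITIES[-1][1]
    while t < end:
        for name, size in GRANULARITIES:
            # A rolled-up bucket is usable only if t is on its boundary and the whole bucket fits.
            if t % size == 0 and t + size <= end:
                _append(plan, name, t, t + size)
                t += size
                break
        else:
            # No rolled-up bucket fits: use raw data up to the next finest boundary (or the end).
            next_boundary = min(end, (t // finest + 1) * finest)
            _append(plan, "raw", t, next_boundary)
            t = next_boundary
    return plan


def utc(*args):
    """Epoch seconds for a UTC wall-clock time."""
    return int(datetime(*args, tzinfo=timezone.utc).timestamp())


for name, lo, hi in plan_segments(utc(2020, 5, 1, 22, 30), utc(2020, 5, 3, 1, 15)):
    print(name, datetime.fromtimestamp(lo, timezone.utc), "->", datetime.fromtimestamp(hi, timezone.utc))
```

For a query from 2020-05-01 22:30 to 2020-05-03 01:15 UTC the plan is raw, hour, day, hour, raw: only the first half hour and the last 15 minutes touch raw segments. Alignment is what makes the greedy choice safe; because every day boundary is also an hour boundary, the chosen ranges are contiguous and disjoint, so no row is missed or counted twice.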
   
   
   Apart from generating roll-up segments, have you tried the star-tree index for this metrics aggregation use case? Note that it only aggregates at the single-segment level.
   https://docs.pinot.apache.org/basics/indexing/star-tree-index
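For comparison, a star-tree index is declared in the table config rather than produced by a separate job. The fragment below builds the relevant part of a table config as a Python dict. The keys follow the star-tree documentation linked above, while the column names (`Org`, `Device`, `RuleId`, `ProcessName`, `ProcessHash`, `Count`) are assumed from the example tables and would need to match the real schema; the values are illustrative, not tuned.

```python
import json

# Sketch of the star-tree portion of a Pinot table config.
# Column names are assumed from the example tables, not taken from a real schema.
star_tree_fragment = {
    "tableIndexConfig": {
        "starTreeIndexConfigs": [
            {
                # Dimensions to pre-aggregate over, in split order.
                "dimensionsSplitOrder": ["Org", "Device", "RuleId", "ProcessName", "ProcessHash"],
                "skipStarNodeCreationForDimensions": [],
                # Pre-computed aggregations, written as FUNCTION__column.
                "functionColumnPairs": ["SUM__Count"],
                "maxLeafRecords": 10000,
            }
        ]
    }
}

print(json.dumps(star_tree_fragment, indent=2))
```

Because this pre-aggregation lives inside each segment, it speeds up per-segment aggregation but does not by itself reduce how many segments a long time-range query has to touch.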
   



[GitHub] [incubator-pinot] fx19880617 commented on issue #6368: Smart query layer with rolled up data

Posted by GitBox <gi...@apache.org>.
fx19880617 commented on issue #6368:
URL: https://github.com/apache/incubator-pinot/issues/6368#issuecomment-748389769


   Some thoughts on this problem.
   
   I think we need to pre-define the time granularity in the segment metadata; the broker routing strategy can then implement a hierarchy of segments with different time granularities.
   
   Then, for each query, we can parse the time range from the query and pick the right segments to cover the entire time range.
   
   
   Also, have you tried the star-tree index for this metrics aggregation use case? Note that it only aggregates at the single-segment level.
   https://docs.pinot.apache.org/basics/indexing/star-tree-index
   



[GitHub] [incubator-pinot] fx19880617 edited a comment on issue #6368: Smart query layer with rolled up data

Posted by GitBox <gi...@apache.org>.
fx19880617 edited a comment on issue #6368:
URL: https://github.com/apache/incubator-pinot/issues/6368#issuecomment-748389769


   Some thoughts on this problem.
   
   I feel we need to pre-define the time granularity in the segment metadata; the broker routing strategy can then implement a hierarchy of segments with different time granularities. And of course, the time boundaries should always be aligned.
   
   Then, for each query, we can parse the time range from the query and pick the right segments to cover the entire time range.
   
   
   Apart from generating roll-up segments, have you tried the star-tree index for this metrics aggregation use case? Note that it still works at the single-segment level.
   https://docs.pinot.apache.org/basics/indexing/star-tree-index
   

