Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2020/07/24 23:15:55 UTC

[GitHub] [incubator-pinot] fx19880617 opened a new issue #5753: Built-in jobs to move segments of hybrid tables from Realtime Servers to Offline Servers

fx19880617 opened a new issue #5753:
URL: https://github.com/apache/incubator-pinot/issues/5753


   We should add built-in scheduled jobs that move segments from realtime servers to offline servers, so that users don't need to set up external jobs to push segments from external data sources.
   
   This way, we can keep the realtime servers at a limited capacity, keep adding new servers to the offline cluster, and rebalance segments.
   
   This could also enable further support on offline servers for segment merges and rollups.
   
   cc: @kishoreg @snleee @Jackie-Jiang @mcvsubbu @npawar 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mangrrua commented on issue #5753: Built-in jobs to move segments of hybrid tables from Realtime Servers to Offline Servers

Posted by GitBox <gi...@apache.org>.
mangrrua commented on issue #5753:
URL: https://github.com/apache/incubator-pinot/issues/5753#issuecomment-665255240








[GitHub] [incubator-pinot] npawar commented on issue #5753: Built-in jobs to move segments of hybrid tables from Realtime Servers to Offline Servers

Posted by GitBox <gi...@apache.org>.
npawar commented on issue #5753:
URL: https://github.com/apache/incubator-pinot/issues/5753#issuecomment-714661190


   The basic feature (moving segments from realtime to offline using minions) is complete.
   Let's open new issues for enhancements related to using Hadoop MapReduce, Spark, etc.




[GitHub] [incubator-pinot] mcvsubbu commented on issue #5753: Built-in jobs to move segments of hybrid tables from Realtime Servers to Offline Servers

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on issue #5753:
URL: https://github.com/apache/incubator-pinot/issues/5753#issuecomment-665753754


   > use case:
   > Most of the queries in our system are for a recent time frame (the last 7 days), but we want to retain data for a much longer time period. For this, we want to deploy a tiered storage system where realtime servers have faster disks and more CPU and memory, while offline servers have slower (and bigger) disks with less CPU and memory. This way, we can keep adding more offline servers as the data volume grows. Moving segments from the realtime table to the offline table will give us cost optimization.
   
   You can do this today by setting tagOverrideConfig, and moving the completed segments to any tagged host. Of course, this will move all completed segments, not just the older ones. 
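
A minimal sketch of what such a tag override might look like in the tenants section of a realtime table config, built here as a Python dict and printed as JSON. The tenant names (`myBrokerTenant`, `myServerTenant`) are hypothetical, and the tag values assume the usual `<tenant>_REALTIME` / `<tenant>_OFFLINE` naming; check the table config reference for your Pinot version before relying on it.

```python
import json

# Hypothetical tenant names, used only for illustration.
broker_tenant = "myBrokerTenant"
server_tenant = "myServerTenant"

# Tenants section of a realtime table config. With this override,
# consuming segments stay on hosts tagged <tenant>_REALTIME, while
# completed segments are placed on hosts tagged <tenant>_OFFLINE.
tenants = {
    "broker": broker_tenant,
    "server": server_tenant,
    "tagOverrideConfig": {
        "realtimeConsuming": f"{server_tenant}_REALTIME",
        "realtimeCompleted": f"{server_tenant}_OFFLINE",
    },
}

print(json.dumps({"tenants": tenants}, indent=2))
```

As noted above, this relocates every completed segment, regardless of age.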




[GitHub] [incubator-pinot] kishoreg edited a comment on issue #5753: Built-in jobs to move segments of hybrid tables from Realtime Servers to Offline Servers

Posted by GitBox <gi...@apache.org>.
kishoreg edited a comment on issue #5753:
URL: https://github.com/apache/incubator-pinot/issues/5753#issuecomment-708503218


   Why do you say that? As long as you give enough buffer time for the events from previous time period to flow in, it should be ok right?
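
The buffer idea can be sketched as follows. This is a simplified illustration, not Pinot's actual implementation: the function name `next_window` and its parameters are hypothetical. A time window is only processed once it ends at least one buffer period before the current time, which leaves room for late events from that window to arrive first.

```python
from typing import Optional, Tuple

HOUR_MS = 60 * 60 * 1000


def next_window(watermark_ms: int, bucket_ms: int, buffer_ms: int,
                now_ms: int) -> Optional[Tuple[int, int]]:
    """Return the next [start, end) window to move, or None if it is too recent.

    watermark_ms: end of the last window that was already processed
    bucket_ms:    length of each window
    buffer_ms:    how long to wait for late events before touching a window
    """
    window_start = watermark_ms
    window_end = window_start + bucket_ms
    # Skip the window if it ends inside the buffer period.
    if window_end > now_ms - buffer_ms:
        return None
    return window_start, window_end


# With a 6-hour bucket and a 24-hour buffer, the first window becomes
# eligible once the clock reaches 30 hours past the watermark.
print(next_window(0, 6 * HOUR_MS, 24 * HOUR_MS, now_ms=29 * HOUR_MS))
print(next_window(0, 6 * HOUR_MS, 24 * HOUR_MS, now_ms=30 * HOUR_MS))
```

A longer buffer tolerates later-arriving events at the cost of keeping segments on the realtime servers for longer.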




[GitHub] [incubator-pinot] npawar commented on issue #5753: Built-in jobs to move segments of hybrid tables from Realtime Servers to Offline Servers

Posted by GitBox <gi...@apache.org>.
npawar commented on issue #5753:
URL: https://github.com/apache/incubator-pinot/issues/5753#issuecomment-665126719


   This will be covered as part of tiered storage: https://github.com/apache/incubator-pinot/issues/5553




[GitHub] [incubator-pinot] fx19880617 commented on issue #5753: Built-in jobs to move segments of hybrid tables from Realtime Servers to Offline Servers

Posted by GitBox <gi...@apache.org>.
fx19880617 commented on issue #5753:
URL: https://github.com/apache/incubator-pinot/issues/5753#issuecomment-665945918


   > @kishoreg I'm not sure if we should move realtime completed segments to offline servers, or to another set of realtime servers. For a realtime-only table, I don't see the benefit of making it hybrid compared with just moving completed segments to another set of realtime servers; we would also need to pay the cost of the extra filter for a hybrid table.
   > We should treat the realtime table as a first-class citizen and support merge/rollup/backfill etc. the same way as for an offline table.
   
   I think the major challenge here is supporting an atomic swap for a batch of segments.
   Also, batch backfilled segments are usually time bounded, but realtime segments are not.




[GitHub] [incubator-pinot] mayankshriv commented on issue #5753: Built-in jobs to move segments of hybrid tables from Realtime Servers to Offline Servers

Posted by GitBox <gi...@apache.org>.
mayankshriv commented on issue #5753:
URL: https://github.com/apache/incubator-pinot/issues/5753#issuecomment-664739660


   While periodically moving the segments from realtime to offline is a good idea, in many cases it would also be beneficial to perform the segment merge/rollup before moving to offline. This may require a config of its own describing what kind of processing needs to be performed before moving the segments to offline. It would be good to start from a user story on the requirements, and then translate them into a design doc.




[GitHub] [incubator-pinot] fx19880617 commented on issue #5753: Built-in jobs to move segments of hybrid tables from Realtime Servers to Offline Servers

Posted by GitBox <gi...@apache.org>.
fx19880617 commented on issue #5753:
URL: https://github.com/apache/incubator-pinot/issues/5753#issuecomment-665344520


   > That will be great! If merge/rollup can be applied (@mayankshriv's suggestion), users will have a lot of flexibility, because realtime segments generally represent minimal aggregation. It would improve query performance, retain data long-term, and save some other costs.
   > 
   > For that, the Pinot UI could have a scheduler service (jobs can be scheduled for specified times, configs can be set, etc.; also available through an API, of course), so users can configure jobs that turn realtime segments into offline segments. At the backend, a job (maybe Apache Spark or classic MapReduce) can process realtime segments in parallel and produce offline segments.
   
   Right, ideally we should have multiple built-in jobs to handle the basic data loading/re-organizing workloads, and use Hadoop/Spark for advanced/parallel workloads.




[GitHub] [incubator-pinot] mcvsubbu commented on issue #5753: Built-in jobs to move segments of hybrid tables from Realtime Servers to Offline Servers

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on issue #5753:
URL: https://github.com/apache/incubator-pinot/issues/5753#issuecomment-708520151


   I misworded it. The results will be the same, but the segments in each data center may not be the same, right? I am not sure whether the m-to-n segment reduction and the time boundary computation produce exactly the same, predictable results every time. If they do, maybe we are fine, but things may change during a software upgrade, for example.




[GitHub] [incubator-pinot] npawar commented on issue #5753: Built-in jobs to move segments of hybrid tables from Realtime Servers to Offline Servers

Posted by GitBox <gi...@apache.org>.
npawar commented on issue #5753:
URL: https://github.com/apache/incubator-pinot/issues/5753#issuecomment-714784977


   Documentation: https://docs.pinot.apache.org/operators/operating-pinot/pinot-managed-offline-flows




[GitHub] [incubator-pinot] mcvsubbu commented on issue #5753: Built-in jobs to move segments of hybrid tables from Realtime Servers to Offline Servers

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on issue #5753:
URL: https://github.com/apache/incubator-pinot/issues/5753#issuecomment-708496354


   I just realized that if we have multiple data centers, this technique will not produce the same results across the data centers. Something worth noting.




[GitHub] [incubator-pinot] singalravi commented on issue #5753: Built-in jobs to move segments of hybrid tables from Realtime Servers to Offline Servers

Posted by GitBox <gi...@apache.org>.
singalravi commented on issue #5753:
URL: https://github.com/apache/incubator-pinot/issues/5753#issuecomment-664844703


   use case:
   Most of the queries in our system are for a recent time frame (the last 7 days), but we want to retain data for a much longer time period. For this, we want to deploy a tiered storage system where realtime servers have faster disks and more CPU and memory, while offline servers have slower (and bigger) disks with less CPU and memory. This way, we can keep adding more offline servers as the data volume grows. Moving segments from the realtime table to the offline table will give us cost optimization.




[GitHub] [incubator-pinot] kishoreg commented on issue #5753: Built-in jobs to move segments of hybrid tables from Realtime Servers to Offline Servers

Posted by GitBox <gi...@apache.org>.
kishoreg commented on issue #5753:
URL: https://github.com/apache/incubator-pinot/issues/5753#issuecomment-708503218


   Why do you say that? As long as you give enough buffer, it should be ok right?




[GitHub] [incubator-pinot] kishoreg commented on issue #5753: Built-in jobs to move segments of hybrid tables from Realtime Servers to Offline Servers

Posted by GitBox <gi...@apache.org>.
kishoreg commented on issue #5753:
URL: https://github.com/apache/incubator-pinot/issues/5753#issuecomment-665860439


   @Jackie-Jiang Interesting idea. I am all for removing the distinction between real-time and offline tables. 




[GitHub] [incubator-pinot] npawar commented on issue #5753: Built-in jobs to move segments of hybrid tables from Realtime Servers to Offline Servers

Posted by GitBox <gi...@apache.org>.
npawar commented on issue #5753:
URL: https://github.com/apache/incubator-pinot/issues/5753#issuecomment-682229196


   Design doc for this: https://docs.google.com/document/d/1-e_9aHQB4HXS38ONtofdxNvMsGmAoYfSnc2LP88MbIc/edit#
   




[GitHub] [incubator-pinot] npawar removed a comment on issue #5753: Built-in jobs to move segments of hybrid tables from Realtime Servers to Offline Servers

Posted by GitBox <gi...@apache.org>.
npawar removed a comment on issue #5753:
URL: https://github.com/apache/incubator-pinot/issues/5753#issuecomment-665126719


   This will be covered as part of tiered storage: https://github.com/apache/incubator-pinot/issues/5553




[GitHub] [incubator-pinot] Jackie-Jiang commented on issue #5753: Built-in jobs to move segments of hybrid tables from Realtime Servers to Offline Servers

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on issue #5753:
URL: https://github.com/apache/incubator-pinot/issues/5753#issuecomment-665818695








[GitHub] [incubator-pinot] npawar closed issue #5753: Built-in jobs to move segments of hybrid tables from Realtime Servers to Offline Servers

Posted by GitBox <gi...@apache.org>.
npawar closed issue #5753:
URL: https://github.com/apache/incubator-pinot/issues/5753


   

