Posted to commits@pinot.apache.org by "chenboat (via GitHub)" <gi...@apache.org> on 2024/02/28 17:58:41 UTC

[I] Auto-tuning Pinot real-time segment size based on actual stream data consumption [pinot]

chenboat opened a new issue, #12513:
URL: https://github.com/apache/pinot/issues/12513

   Currently, Pinot's adaptive realtime segment sizing algorithm (as documented [here](https://www.linkedin.com/blog/engineering/open-source/auto-tuning-pinot)) makes segment sizes converge to a target byte size by adjusting the number of **rows** in each new segment based on the rows in the previous segments. It relies on the following assumption:
   
   > We assume that the ratio of segment size to number of rows is a constant for each table (say, R).
   
   This assumption may not hold for use cases with spiky traffic (e.g., search log ingestion, where log data volume depends on service state and can be highly volatile). Our results with the adaptive sizing algorithm show that segment sizes varied a lot because the data size per row changes.
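
To illustrate why a changing bytes-per-row ratio breaks row-based sizing, here is a minimal sketch. It is a simplified model, not Pinot's actual code: the function names are hypothetical, and the real algorithm described in the linked blog post applies additional smoothing that is omitted here.

```python
def next_row_threshold(target_bytes, prev_rows, prev_segment_bytes):
    """Row threshold for the next segment, assuming bytes-per-row (R) is constant."""
    bytes_per_row = prev_segment_bytes / prev_rows  # R learned from the previous segment
    return int(target_bytes / bytes_per_row)


def simulate(target_bytes, bytes_per_row_by_segment, initial_rows):
    """Return the completed size of each segment when R actually varies per segment."""
    rows = initial_rows
    sizes = []
    for actual_bytes_per_row in bytes_per_row_by_segment:
        # The segment closes after `rows` rows, whatever size they add up to.
        segment_bytes = rows * actual_bytes_per_row
        sizes.append(segment_bytes)
        # The next threshold is derived from the ratio just observed.
        rows = next_row_threshold(target_bytes, rows, segment_bytes)
    return sizes


steady = simulate(1000, [10, 10, 10], initial_rows=50)
spiky = simulate(1000, [10, 40, 10], initial_rows=100)
```

With a steady ratio the sizes converge to the target; when the ratio jumps between segments, each segment is sized using the previous segment's ratio and misses the target in both directions.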
   
   We propose to base segment size prediction on the actual stream data consumed instead, which is a more accurate measure than the row count. After one server replica finishes consuming the target number of bytes, it can commit the segment and coordinate with the rest of the replicas, which either catch up to the offset reached (if they have not done so) or download the finished segment and replace their own copy with it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [I] Auto-tuning Pinot real-time segment size based on actual stream data consumption [pinot]

Posted by "chenboat (via GitHub)" <gi...@apache.org>.
chenboat commented on issue #12513:
URL: https://github.com/apache/pinot/issues/12513#issuecomment-1972172243

   My proposal is to use the number of *bytes* consumed in the previous segment, instead of the number of *rows* consumed, to determine the consumption target for the current segment.
   
   In particular, 
   (1) The number of bytes consumed is recorded in the zkMetadata of each segment.
   (2) When a new segment is created, there is a new mode/config to allow the segment to consume until the byte limit.
   (3) When one server replica reaches that limit, it will commit the segment.
   (4) The rest of the replicas will download and replace their current segments.
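
Steps (1) and (2) above could be sketched roughly as follows. This is only an illustration under stated assumptions: `SegmentMetadata`, `next_byte_limit`, and `consume_until_byte_limit` are hypothetical names, not Pinot APIs, and the scaling rule (previous bytes consumed scaled by target size over previous segment size) is an assumed analogue of the row-based rule, not something the proposal spells out. Replica coordination in steps (3) and (4) is not modeled.

```python
from dataclasses import dataclass


@dataclass
class SegmentMetadata:
    """Hypothetical stand-in for the per-segment zkMetadata fields used here."""
    bytes_consumed: int   # stream bytes consumed to build the segment
    segment_bytes: int    # size of the completed segment


def next_byte_limit(target_segment_bytes, prev):
    """Assumed rule: scale the previous segment's consumed bytes toward the target size."""
    return int(target_segment_bytes * prev.bytes_consumed / prev.segment_bytes)


def consume_until_byte_limit(message_sizes, byte_limit):
    """Consume messages until the byte limit is reached; return (rows, bytes consumed)."""
    rows = 0
    consumed = 0
    for size in message_sizes:
        if consumed >= byte_limit:
            break  # limit reached: the replica would commit the segment here
        consumed += size
        rows += 1
    return rows, consumed


prev = SegmentMetadata(bytes_consumed=2000, segment_bytes=500)
limit = next_byte_limit(1000, prev)
rows, consumed = consume_until_byte_limit([10, 40, 10, 40], byte_limit=50)
```

Because the stopping condition is expressed in bytes, the number of rows in a segment is free to vary with row size, which is the point of the proposal.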




Re: [I] Auto-tuning Pinot real-time segment size based on actual stream data consumption [pinot]

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on issue #12513:
URL: https://github.com/apache/pinot/issues/12513#issuecomment-1970166086

   The challenge here is how to estimate the segment size during consumption. Do you have a solution in mind?




Re: [I] Auto-tuning Pinot real-time segment size based on actual stream data consumption [pinot]

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on issue #12513:
URL: https://github.com/apache/pinot/issues/12513#issuecomment-1970164750

   Here is another related issue with this algorithm: #12509 

