Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2021/01/08 02:02:41 UTC

[GitHub] [incubator-pinot] fx19880617 opened a new issue #6422: Allow segment generation with limited memory

fx19880617 opened a new issue #6422:
URL: https://github.com/apache/incubator-pinot/issues/6422


   Segment creation currently requires unbounded memory because the AbstractColumnStatisticsCollector implementations use a hash map.
   
   This can cause minion tasks to fail, as well as future connectors such as Flink/Spark/Presto data sinks.
   
   The goal here is to allow configuring a bounded memory size for segment creation.
   
   @Jackie-Jiang 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on issue #6422: Allow segment generation with a bounded memory

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on issue #6422:
URL: https://github.com/apache/incubator-pinot/issues/6422#issuecomment-756977627


   I think you are referring to the hash set used for deduplicating the values? We also store the values in an array for sorting purposes.
   We can use an off-heap data structure to reduce the memory usage, but that will also be inefficient.
   I plan to unify segment creation for offline and real-time segments, and we might be able to borrow the off-heap data structure used in real-time segments.
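
   The pattern described above (a hash set for deduplication plus an array for sorting) can be sketched roughly as follows. This is a minimal illustration under assumptions, not Pinot's actual collector code: the class name and its methods are hypothetical. It shows why heap usage grows with the number of distinct values in a column rather than with any configured limit.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch only; NOT the real AbstractColumnStatisticsCollector.
class DedupThenSortSketch {
    // On-heap set: holds one boxed entry per distinct value seen so far,
    // so its size is bounded only by the column's cardinality.
    private final Set<Integer> uniqueValues = new HashSet<>();
    private int[] sortedValues;

    // Called once per row; duplicates are absorbed by the set.
    void collect(int value) {
        uniqueValues.add(value);
    }

    // Copies the distinct values into an array and sorts it.
    // At this point both the set and the array are live in memory.
    void seal() {
        sortedValues = new int[uniqueValues.size()];
        int i = 0;
        for (int v : uniqueValues) {
            sortedValues[i++] = v;
        }
        Arrays.sort(sortedValues);
    }

    int getCardinality() {
        return sortedValues.length;
    }

    int[] getSortedValues() {
        return sortedValues;
    }
}
```

   In this sketch the memory cost is roughly proportional to the distinct-value count, which is the part an off-heap or size-bounded structure would need to replace.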

