
Posted to issues@orc.apache.org by "wgtmac (via GitHub)" <gi...@apache.org> on 2023/02/03 07:09:45 UTC

[GitHub] [orc] wgtmac commented on issue #1240: Huge memory taken for each field when exporting

wgtmac commented on issue #1240:
URL: https://github.com/apache/orc/issues/1240#issuecomment-1415187527

   > > Hello, it seems there were commits referencing this issue. Is this issue now fixed ?
   > 
   > @LouisClt Thanks for your follow-up.
   > 
   > We have implemented a block-based buffer called `BlockBuffer` (by @coderex2522) and used it to replace the output buffer in the `CompressionStream`. It can decrease the memory footprint to some extent.
   > 
   > IMO, the next step is to use it to replace the input buffer of the `CompressionStream` which has the size of `compressionBlockSize` per stream.
   
   To be precise, the `rawInputBuffer` of every `CompressionStream` is fixed to the compression block size, which is 1M by default. A writer with many columns will therefore suffer from a large memory footprint, and currently nothing can be done to alleviate it.
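
To give a feel for how this scales, here is a rough back-of-the-envelope sketch. It is not taken from the ORC code base: the function names and the `streams_per_column` value are hypothetical, and the real number of streams depends on the column types and encodings. The only figure taken from the comment above is the 1M block size.

```python
# Rough estimate of memory held by fixed-size input buffers.
# Assumption: every stream keeps exactly one input buffer of block_size bytes.
# streams_per_column is a made-up illustrative value, not an ORC constant.

def fixed_input_buffer_bytes(num_columns, streams_per_column, block_size=1024 * 1024):
    """Total bytes reserved if each stream holds one block-sized input buffer."""
    return num_columns * streams_per_column * block_size


def to_mib(num_bytes):
    """Convert a byte count to mebibytes."""
    return num_bytes / (1024 * 1024)


if __name__ == "__main__":
    for cols in (10, 100, 1000):
        total = fixed_input_buffer_bytes(cols, streams_per_column=2)
        print(f"{cols} columns -> {to_mib(total):.0f} MiB")
```

Under these assumptions the reserved memory grows linearly with the column count, regardless of how much data each stream actually holds.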
   
   I have created a JIRA to track it: https://issues.apache.org/jira/browse/ORC-1365
   
   cc @coderex2522 

