Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/25 02:36:01 UTC

[GitHub] [arrow] yli1994 commented on issue #14726: pq.read_table("parquet files path", memory_map=True) still consume large memory space (200G file cost 200G memory and slow)

yli1994 commented on issue #14726:
URL: https://github.com/apache/arrow/issues/14726#issuecomment-1326961628

   > Parquet files are written with compression turned on by default, which means the size on disk is usually much smaller (depending on the data, several times smaller!) than the actual in-memory size of the data.
   > 
   > Can you confirm if the file is written with compression?
   > 
   > cc @jorisvandenbossche
   
   Hi @assignUser,
   
   I wrote the Parquet file with both "snappy" and "zstd" compression; the resulting sizes on disk are 202G and 158G respectively. What I expected, though, is that the "memory map" read path should not grow the memory used. Two sketches of what I mean follow below.
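   
   To answer the compression question concretely: the codec can be read back from the file metadata. A minimal sketch (the path is a placeholder, not my actual file):
   
   ```python
   import pyarrow.parquet as pq
   
   # Every column chunk records its own codec in the Parquet metadata;
   # checking the first chunk of the first row group is usually enough.
   meta = pq.ParquetFile("data.parquet").metadata
   print(meta.row_group(0).column(0).compression)  # e.g. SNAPPY, ZSTD, UNCOMPRESSED
   ```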
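   
   And this is roughly how I measure the memory growth (again just a sketch; `data_zstd.parquet` is a placeholder name):
   
   ```python
   import pyarrow as pa
   import pyarrow.parquet as pq
   
   # Bytes held by Arrow's default memory pool before and after the read.
   before = pa.total_allocated_bytes()
   table = pq.read_table("data_zstd.parquet", memory_map=True)
   print(pa.total_allocated_bytes() - before)  # grows to roughly the decompressed size
   ```
   
   If memory_map=True only avoids copying the raw file bytes, and decoding the pages always allocates new Arrow buffers, that would explain the numbers I see, but I would appreciate confirmation.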

