Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/08/19 20:42:33 UTC

[GitHub] [arrow-datafusion] alamb commented on issue #899: Track memory usage for each individual operator

alamb commented on issue #899:
URL: https://github.com/apache/arrow-datafusion/issues/899#issuecomment-902231127


   I was kind of imagining we would have to do something like manually registering memory allocations. The `malloc_size_of` trait is a cool idea.
   
   While it would likely be crazy complicated to do this for all allocations, I think all the built-in DataFusion operators use most of their memory in intermediate RecordBatches and potentially a single large structure (e.g. the hash tables in hash_join and hash_aggregate). If we captured these large sources, I think that would get us most of the value.
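
   To make the "manually register the big allocations" idea concrete, here is a rough sketch using only the standard library. `OperatorMemory` and `MemoryRegistry` are hypothetical names, not existing DataFusion types, and the byte counts are assumed to be supplied by the operator itself (for example, the size it computes for a buffered batch or a hash table).

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical per-operator counter (not an existing DataFusion type).
#[derive(Debug, Default)]
pub struct OperatorMemory {
    current: AtomicUsize,
    peak: AtomicUsize,
}

impl OperatorMemory {
    /// Record that the operator now holds `bytes` more memory,
    /// e.g. after buffering a batch or growing a hash table.
    pub fn grow(&self, bytes: usize) {
        let now = self.current.fetch_add(bytes, Ordering::SeqCst) + bytes;
        self.peak.fetch_max(now, Ordering::SeqCst);
    }

    /// Record that the operator released `bytes`.
    /// The caller must not release more than it registered.
    pub fn shrink(&self, bytes: usize) {
        self.current.fetch_sub(bytes, Ordering::SeqCst);
    }

    /// Bytes currently registered by this operator.
    pub fn current(&self) -> usize {
        self.current.load(Ordering::SeqCst)
    }

    /// Highest value `current` has reached so far.
    pub fn peak(&self) -> usize {
        self.peak.load(Ordering::SeqCst)
    }
}

/// Hypothetical registry mapping an operator name to its counter.
#[derive(Debug, Default)]
pub struct MemoryRegistry {
    operators: Mutex<HashMap<String, Arc<OperatorMemory>>>,
}

impl MemoryRegistry {
    /// Return the counter for `name`, creating it on first use.
    pub fn register(&self, name: &str) -> Arc<OperatorMemory> {
        let mut ops = self.operators.lock().unwrap();
        ops.entry(name.to_string()).or_default().clone()
    }

    /// Sum of the bytes currently registered across all operators.
    pub fn total(&self) -> usize {
        let ops = self.operators.lock().unwrap();
        ops.values().map(|m| m.current()).sum()
    }
}
```

   An operator would call `register("hash_join")` once, then `grow` when it buffers a batch or grows its hash table and `shrink` when it drops them. This only accounts for what operators explicitly report, which is the trade-off described above: it misses small allocations but should capture the large ones.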

