Posted to github@arrow.apache.org by "mingmwang (via GitHub)" <gi...@apache.org> on 2023/04/25 10:38:08 UTC

[GitHub] [arrow-datafusion] mingmwang commented on issue #1570: Memory Limited GroupBy (Externalized / Spill)

mingmwang commented on issue #1570:
URL: https://github.com/apache/arrow-datafusion/issues/1570#issuecomment-1521564406

   @alamb @yjshen 
   Can we make the `GroupState` and the Accumulator states serializable?
   With this approach, we do not need to sort anything when spilling data to disk. When we read the data back, we can quickly reconstruct our raw hash table from the hash values and indexes, because our hashmap is very lightweight. The hash value can either be recalculated from the grouping rows or cached inside the `GroupState` to avoid recalculating it.
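
To make the idea concrete, here is a rough, standard-library-only sketch. It is not DataFusion's actual code: `SpillableGroupState`, `hash_key`, `spill`, `rebuild`, `lookup`, and the byte layout are all hypothetical names and choices made up for illustration, and a `HashMap<u64, Vec<usize>>` stands in for the raw hash table of hash values and indexes. The accumulator state is assumed to be a partial SUM and COUNT.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::convert::TryInto;
use std::hash::{Hash, Hasher};

/// Hypothetical, simplified stand-in for a group state (not the real `GroupState`).
#[derive(Debug, Clone, PartialEq)]
struct SpillableGroupState {
    group_key: Vec<u8>, // encoded grouping row
    cached_hash: u64,   // cached so it need not be recalculated after a spill
    sum: i64,           // assumed accumulator state: partial SUM
    count: u64,         // assumed accumulator state: partial COUNT
}

/// Hash of a grouping row. `DefaultHasher::new()` is stable within one build,
/// which is all this sketch relies on; a real spill format would pin the hasher.
fn hash_key(key: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    hasher.finish()
}

/// Read exactly `N` bytes at `*pos`, advancing `*pos` only on success.
fn take_array<const N: usize>(buf: &[u8], pos: &mut usize) -> Option<[u8; N]> {
    let end = (*pos).checked_add(N)?;
    let arr: [u8; N] = buf.get(*pos..end)?.try_into().ok()?;
    *pos = end;
    Some(arr)
}

impl SpillableGroupState {
    fn new(group_key: Vec<u8>, sum: i64, count: u64) -> Self {
        let cached_hash = hash_key(&group_key);
        Self { group_key, cached_hash, sum, count }
    }

    /// Append this state to `out` as: key length, key bytes, hash, sum, count.
    fn write_to(&self, out: &mut Vec<u8>) {
        out.extend_from_slice(&(self.group_key.len() as u32).to_le_bytes());
        out.extend_from_slice(&self.group_key);
        out.extend_from_slice(&self.cached_hash.to_le_bytes());
        out.extend_from_slice(&self.sum.to_le_bytes());
        out.extend_from_slice(&self.count.to_le_bytes());
    }

    /// Decode one state starting at `*pos`; `None` if the buffer is truncated.
    fn read_from(buf: &[u8], pos: &mut usize) -> Option<Self> {
        let key_len = u32::from_le_bytes(take_array(buf, pos)?) as usize;
        let end = (*pos).checked_add(key_len)?;
        let group_key = buf.get(*pos..end)?.to_vec();
        *pos = end;
        let cached_hash = u64::from_le_bytes(take_array(buf, pos)?);
        let sum = i64::from_le_bytes(take_array(buf, pos)?);
        let count = u64::from_le_bytes(take_array(buf, pos)?);
        Some(Self { group_key, cached_hash, sum, count })
    }
}

/// Serialize the states in their existing order: no sort is required.
fn spill(states: &[SpillableGroupState]) -> Vec<u8> {
    let mut out = Vec::new();
    for state in states {
        state.write_to(&mut out);
    }
    out
}

/// Read the states back and rebuild the hash -> indexes table from cached hashes.
fn rebuild(buf: &[u8]) -> Option<(Vec<SpillableGroupState>, HashMap<u64, Vec<usize>>)> {
    let mut states = Vec::new();
    let mut table: HashMap<u64, Vec<usize>> = HashMap::new();
    let mut pos = 0;
    while pos < buf.len() {
        let state = SpillableGroupState::read_from(buf, &mut pos)?;
        table.entry(state.cached_hash).or_default().push(states.len());
        states.push(state);
    }
    Some((states, table))
}

/// Find a group by key: probe by hash, then compare keys to resolve collisions.
fn lookup<'a>(
    states: &'a [SpillableGroupState],
    table: &HashMap<u64, Vec<usize>>,
    key: &[u8],
) -> Option<&'a SpillableGroupState> {
    table
        .get(&hash_key(key))?
        .iter()
        .map(|&i| &states[i])
        .find(|s| s.group_key.as_slice() == key)
}
```

In this sketch the spill path is a single sequential write and the read path is a single sequential scan that re-inserts each index under its cached hash. Dropping `cached_hash` from the format and calling `hash_key` on the grouping row during `rebuild` would be the recalculating variant mentioned above.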
    
   

