Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/25 10:12:21 UTC

[GitHub] [arrow-rs] ritchie46 opened a new issue #347: Reduce memory of concat kernel

ritchie46 opened a new issue #347:
URL: https://github.com/apache/arrow-rs/issues/347


   The `concat` kernel concatenates multiple arrays into contiguous memory. It has the potential to allocate only the required memory. However, for (large-)utf8 and (large-)list arrays it does not do so, and instead relies on exponential buffer growth to reach the required size.
   
   Precomputing the required buffer capacities would be faster and use less memory.
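
To illustrate the idea, here is a minimal, std-only sketch of a concatenation that sizes both buffers once instead of growing them. `StrArray`, `from_strs` and `concat_presized` are hypothetical stand-ins for illustration, not arrow-rs types or functions, and the sketch assumes every input's offsets start at 0 (no sliced arrays).

```rust
/// Simplified stand-in for a Utf8 array (hypothetical, not the arrow-rs type):
/// bytes `offsets[i]..offsets[i + 1]` of `values` hold value `i`.
struct StrArray {
    offsets: Vec<i32>,
    values: Vec<u8>,
}

impl StrArray {
    /// Build an array from string slices; offsets always start at 0.
    fn from_strs(items: &[&str]) -> Self {
        let mut offsets = vec![0i32];
        let mut values = Vec::new();
        for s in items {
            values.extend_from_slice(s.as_bytes());
            offsets.push(values.len() as i32);
        }
        StrArray { offsets, values }
    }

    /// Number of values in the array.
    fn len(&self) -> usize {
        self.offsets.len() - 1
    }
}

/// Concatenate `arrays`, reserving the offsets and values buffers up front.
fn concat_presized(arrays: &[StrArray]) -> StrArray {
    // First pass: compute the exact capacities that are needed.
    let n_items: usize = arrays.iter().map(|a| a.len()).sum();
    let n_bytes: usize = arrays.iter().map(|a| a.values.len()).sum();

    let mut offsets = Vec::with_capacity(n_items + 1);
    let mut values = Vec::with_capacity(n_bytes);
    offsets.push(0i32);

    // Second pass: copy, shifting each array's offsets by the bytes written so far.
    for a in arrays {
        let base = values.len() as i32;
        offsets.extend(a.offsets[1..].iter().map(|o| base + o));
        values.extend_from_slice(&a.values);
    }
    StrArray { offsets, values }
}
```

Because both totals are known before any copying starts, neither buffer should need to reallocate during the second pass; the cost is one extra, cheap pass over the inputs.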


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] ritchie46 commented on issue #347: Reduce memory of concat kernel

Posted by GitBox <gi...@apache.org>.
ritchie46 commented on issue #347:
URL: https://github.com/apache/arrow-rs/issues/347#issuecomment-850951895


   I think we can apply the same logic to arrays with child data.
   
   * [ ] list
   * [ ] large-list
   * [ ] struct
   * [ ] dictionary
   
   come to mind.





[GitHub] [arrow-rs] jorgecarleitao commented on issue #347: Reduce memory of concat kernel

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on issue #347:
URL: https://github.com/apache/arrow-rs/issues/347#issuecomment-850961026


   I agree that it would be great to have a method to specify all capacities.
   
   The required capacity usually depends on the problem that `MutableArrayData` is being used for. Therefore, I suggest that the calculation of which capacity to use be made by the user of `MutableArrayData`, and not inside `MutableArrayData` itself.
   
   In this context, a way to address this is to allow something like
   
   ```rust
   MutableArrayData::with_capacities(capacities: Capacities);
   
   enum Capacities {
      Binary(usize, Option<usize>),  // Binary, Utf8
      List(usize, Option<Box<Capacities>>),
      ...
   }
   ```
   
   and let `MutableArrayData` panic if the capacity variant is incompatible with the arrays' `DataType`.
   
   This gives users the freedom to pass other capacities.
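
As a rough, std-only sketch of how such a capacity description could be matched against a data type: the `Capacities` enum below contains only the two variants spelled out above, while `Kind`, `Reserved` and `reserve` are hypothetical names invented for illustration (`Kind` stands in for arrow's `DataType`). None of this is the actual `MutableArrayData` implementation.

```rust
/// Reduced, hypothetical version of the proposed enum; only the two
/// variants written out in the proposal are included.
#[derive(Debug)]
enum Capacities {
    /// (number of items, optional number of value bytes)
    Binary(usize, Option<usize>),
    /// (number of items, optional capacities for the child array)
    List(usize, Option<Box<Capacities>>),
}

/// Simplified stand-in for arrow's `DataType` (assumption for this sketch).
#[derive(Debug)]
enum Kind {
    Utf8,
    List(Box<Kind>),
}

/// Buffers reserved for one array, including its child if there is one.
#[derive(Debug)]
struct Reserved {
    offsets: Vec<i32>,
    values: Vec<u8>,
    child: Option<Box<Reserved>>,
}

/// Reserve buffers for `kind` according to `caps`.
/// Panics if the capacity variant does not fit the data type.
fn reserve(kind: &Kind, caps: &Capacities) -> Reserved {
    match (kind, caps) {
        (Kind::Utf8, Capacities::Binary(items, bytes)) => Reserved {
            // n items need n + 1 offsets.
            offsets: Vec::with_capacity(items + 1),
            // Without a byte hint, the values buffer grows on demand.
            values: Vec::with_capacity(bytes.unwrap_or(0)),
            child: None,
        },
        (Kind::List(inner), Capacities::List(items, child_caps)) => Reserved {
            offsets: Vec::with_capacity(items + 1),
            values: Vec::new(),
            // Recurse into the child only when child capacities were given.
            child: child_caps.as_ref().map(|c| Box::new(reserve(inner, c))),
        },
        (kind, caps) => panic!("capacities {:?} do not match data type {:?}", caps, kind),
    }
}
```

A caller concatenating lists of strings would, under these assumptions, pass something like `Capacities::List(n_lists, Some(Box::new(Capacities::Binary(n_strings, Some(n_bytes)))))`, with the three totals computed from the input arrays.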





[GitHub] [arrow-rs] Dandandan commented on issue #347: Reduce memory of concat kernel

Posted by GitBox <gi...@apache.org>.
Dandandan commented on issue #347:
URL: https://github.com/apache/arrow-rs/issues/347#issuecomment-850894038


   @ritchie46 is this closed in your eyes by #348?

