Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/06/28 08:28:42 UTC

[GitHub] [arrow-rs] jhorstmann opened a new issue #503: Incorrect memory usage calculation for dictionary arrays

jhorstmann opened a new issue #503:
URL: https://github.com/apache/arrow-rs/issues/503


   **Describe the bug**
   The memory usage calculation for dictionary arrays seems to report twice the memory actually used.
   
   ```
   fn get_array_memory_size(&self) -> usize {
       self.data.get_array_memory_size()
           + self.keys.get_array_memory_size()
           + self.values.get_array_memory_size()
           + mem::size_of_val(self)
   }
   ```
   
   In the code above, `self.keys` and `self.values` point to the same data that is already held in `self.data`, so that data is counted twice.
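   
   To illustrate the double counting, here is a minimal, self-contained sketch. It does not use the real arrow types: `ToyData`, `ToyDictionary`, and their methods are hypothetical stand-ins, and the `mem::size_of_val(self)` term is left out so the numbers stay easy to follow. The only assumption carried over from the snippet above is that `keys` and `values` are views onto data already reachable from `data`.
   
   ```rust
   use std::sync::Arc;
   
   /// Hypothetical stand-in for the shared array data (not the real `ArrayData`).
   #[derive(Clone)]
   struct ToyData {
       /// Reference-counted buffer, so that views can share it without copying.
       buffer: Arc<Vec<u8>>,
       /// Child data, e.g. the dictionary values.
       children: Vec<ToyData>,
   }
   
   impl ToyData {
       /// Bytes in this node's buffer plus the bytes of all its children.
       fn memory_size(&self) -> usize {
           self.buffer.len() + self.children.iter().map(ToyData::memory_size).sum::<usize>()
       }
   }
   
   /// Hypothetical dictionary array with the same three fields as in the issue.
   struct ToyDictionary {
       data: ToyData,
       keys: ToyData,
       values: ToyData,
   }
   
   impl ToyDictionary {
       fn new(key_bytes: usize, value_bytes: usize) -> Self {
           let values = ToyData { buffer: Arc::new(vec![0; value_bytes]), children: vec![] };
           // `data` holds the key buffer and refers to the values as a child.
           let data = ToyData { buffer: Arc::new(vec![0; key_bytes]), children: vec![values.clone()] };
           // `keys` is only a view: it shares the key buffer held by `data`.
           let keys = ToyData { buffer: Arc::clone(&data.buffer), children: vec![] };
           ToyDictionary { data, keys, values }
       }
   
       /// Mirrors the structure of the reported calculation: every field is summed.
       fn double_counted_size(&self) -> usize {
           self.data.memory_size() + self.keys.memory_size() + self.values.memory_size()
       }
   
       /// Counts each buffer once by going through `data` only.
       fn deduplicated_size(&self) -> usize {
           self.data.memory_size()
       }
   }
   ```
   
   With 100 bytes of keys and 40 bytes of values, `deduplicated_size()` yields 140 while `double_counted_size()` yields 280, which matches the factor of two described above.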
   
   **Additional context**
   
   The memory usage calculation could actually work the same for all array types and could be implemented as a default method on the `Array` trait.
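   
   A rough sketch of what such a default method could look like, assuming every array type can hand out a reference to its underlying data. The names `SketchData`, `SketchArray`, `SketchPrimitive`, and `SketchList` are made up for this example and are not the arrow API; this is not necessarily how the eventual fix is written.
   
   ```rust
   use std::mem;
   
   /// Hypothetical stand-in for the shared array data (not the real `ArrayData`).
   struct SketchData {
       buffers: Vec<Vec<u8>>,
       child_data: Vec<SketchData>,
   }
   
   impl SketchData {
       /// Bytes held by this node's buffers plus, recursively, its children.
       fn memory_size(&self) -> usize {
           let own: usize = self.buffers.iter().map(Vec::len).sum();
           let children: usize = self.child_data.iter().map(SketchData::memory_size).sum();
           own + children
       }
   }
   
   /// Hypothetical array trait: implementors only expose their data.
   trait SketchArray {
       fn data(&self) -> &SketchData;
   
       /// Default implementation shared by all array types: the data is
       /// counted once, plus the size of the array struct itself.
       fn memory_size(&self) -> usize {
           self.data().memory_size() + mem::size_of_val(self)
       }
   }
   
   struct SketchPrimitive {
       data: SketchData,
   }
   
   struct SketchList {
       data: SketchData,
   }
   
   impl SketchArray for SketchPrimitive {
       fn data(&self) -> &SketchData {
           &self.data
       }
   }
   
   impl SketchArray for SketchList {
       fn data(&self) -> &SketchData {
           &self.data
       }
   }
   ```
   
   Because neither implementor overrides `memory_size`, a nested type such as `SketchList` cannot add its children a second time; they are reached only through `data()`.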
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb closed issue #503: Incorrect memory usage calculation for dictionary arrays

Posted by GitBox <gi...@apache.org>.
alamb closed issue #503:
URL: https://github.com/apache/arrow-rs/issues/503


   





[GitHub] [arrow-rs] jhorstmann commented on issue #503: Incorrect memory usage calculation for dictionary arrays

Posted by GitBox <gi...@apache.org>.
jhorstmann commented on issue #503:
URL: https://github.com/apache/arrow-rs/issues/503#issuecomment-869890226


   @e-dard I'm just about to open a PR :)





[GitHub] [arrow-rs] e-dard commented on issue #503: Incorrect memory usage calculation for dictionary arrays

Posted by GitBox <gi...@apache.org>.
e-dard commented on issue #503:
URL: https://github.com/apache/arrow-rs/issues/503#issuecomment-869829579


   @jhorstmann 👋 I recently noticed this discrepancy whilst doing some testing for a project that uses Arrow dictionaries. Did you want to take this ticket? Otherwise I'm happy to fix it.





[GitHub] [arrow-rs] jhorstmann commented on issue #503: Incorrect memory usage calculation for dictionary arrays

Posted by GitBox <gi...@apache.org>.
jhorstmann commented on issue #503:
URL: https://github.com/apache/arrow-rs/issues/503#issuecomment-869616253


   Implementations in `FixedSizeListArray` and `UnionArray` also seem to be counting child arrays twice.




