Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/07/22 18:04:13 UTC

[GitHub] [arrow] westonpace edited a comment on issue #10776: Capacity error: array cannot contain more than 2147483646 bytes, have 2147489180

westonpace edited a comment on issue #10776:
URL: https://github.com/apache/arrow/issues/10776#issuecomment-885123242


   List arrays and string arrays cannot hold more than 2GB of values.  This is because they are represented as two arrays: a values array and an offsets array.
   
   ```
           0  1  2  3  4  5  6  7  8  9  10 11 12 13       
   Values: s  t  r  i  n  g  1  s  t  r  i  n  g  2
   Offsets: 0, 7, 14
   ```
   
   The offsets point to the beginning (and end) of each string.  Since the offsets array is int32, the largest offset it can hold is just under 2^31 (about 2GB), and so the values array cannot hold more than 2GB of values.
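
As a rough illustration of that layout, here is a minimal pure-Python sketch (not Arrow's actual implementation). The helper names `encode_strings` and `get_string` are hypothetical, and the byte limit is simply the number quoted in the error message for this issue.

```python
# Limit quoted in the error message of this issue; treated as an assumption here.
MAX_VALUE_BYTES = 2_147_483_646


def encode_strings(strings):
    """Pack strings into one values buffer plus a list of offsets."""
    values = bytearray()
    offsets = [0]  # offsets[i] is where string i starts
    for s in strings:
        values += s.encode("utf-8")
        if len(values) > MAX_VALUE_BYTES:
            # A 32-bit offset could no longer address the end of the buffer.
            raise OverflowError("values buffer too large for int32 offsets")
        offsets.append(len(values))  # end of string i, start of string i + 1
    return bytes(values), offsets


def get_string(values, offsets, i):
    """Recover string i by slicing between two consecutive offsets."""
    return values[offsets[i]:offsets[i + 1]].decode("utf-8")


values, offsets = encode_strings(["string1", "string2"])
print(offsets)                         # [0, 7, 14]
print(get_string(values, offsets, 1))  # string2
```

Note that n strings need n + 1 offsets, and the last offset equals the total number of value bytes, which is why the size of the values array is bounded by the offset type.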
   
   Normally, when this limit is hit, a good workaround is to split your data into smaller record batches (you can still represent them as a single table), but whether that works will depend on what you are trying to do.
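
One way to picture the splitting step is the pure-Python sketch below. It groups strings so that each group stays within a byte budget. The helper `split_by_byte_budget` is hypothetical, and the tiny budget of 14 bytes is only there to make the example easy to follow; in practice the budget would be something below the 2GB limit.

```python
def split_by_byte_budget(strings, max_bytes):
    """Yield lists of strings whose combined UTF-8 size stays within max_bytes."""
    chunk, size = [], 0
    for s in strings:
        n = len(s.encode("utf-8"))
        if n > max_bytes:
            # Splitting cannot help if one value alone is over the budget.
            raise ValueError("a single value exceeds the per-chunk budget")
        if chunk and size + n > max_bytes:
            yield chunk          # close the current chunk before it overflows
            chunk, size = [], 0
        chunk.append(s)
        size += n
    if chunk:
        yield chunk              # emit the final, possibly partial chunk


chunks = list(split_by_byte_budget(["string1", "string2", "string3"], max_bytes=14))
print(chunks)  # [['string1', 'string2'], ['string3']]
```

Each chunk could then become its own array or record batch, and the batches can be combined into one table (for example with `pyarrow.Table.from_batches`), so no single values array has to exceed the offset limit.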


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org