Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/07/23 02:01:13 UTC

[GitHub] [arrow] lz19970205 commented on issue #10776: Capacity error: array cannot contain more than 2147483646 bytes, have 2147489180

lz19970205 commented on issue #10776:
URL: https://github.com/apache/arrow/issues/10776#issuecomment-885349007


   > List arrays and string arrays cannot hold more than 2GB of values. This is because they are represented as two arrays: a values array and an offsets array.
   > 
   > ```
   >         0  1  2  3  4  5  6  7  8  9  10 11 12 13       
   > Values: s  t  r  i  n  g  1  s  t  r  i  n  g  2
   > Offsets: 0, 7, 14
   > ```
   > 
   > The offsets point to the beginning (and end) of each string. Since the offsets array is int32, the largest offset it can store is 2^31 - 1 (about 2GB), so the values array cannot hold more than about 2GB of data.
   > 
   > Normally, when this limit is hit, a good workaround is to split your data into smaller record batches (you can still represent it as a single table), but it will depend on what you are trying to do.
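
To make the quoted layout concrete, here is a rough pure-Python sketch of the values/offsets idea. It does not use the Arrow library, and the helper names (`build_string_array`, `get_string`, `fits_in_one_array`) are made up for illustration. The byte limit is simply the number printed in the error message from the issue title.

```python
# Pure-Python illustration only; this is not Arrow's implementation.

# Limit copied from the error message in the issue title.
MAX_VALUES_BYTES = 2147483646


def build_string_array(strings):
    """Return (values, offsets) for a list of strings (hypothetical helper)."""
    values = bytearray()
    offsets = [0]
    for s in strings:
        values += s.encode("utf-8")
        offsets.append(len(values))  # end of this string, start of the next
    return bytes(values), offsets


def get_string(values, offsets, i):
    """Slice string i out of the shared values buffer."""
    return values[offsets[i]:offsets[i + 1]].decode("utf-8")


def fits_in_one_array(total_bytes):
    """True if the last offset would stay within the reported limit."""
    return total_bytes <= MAX_VALUES_BYTES


values, offsets = build_string_array(["string1", "string2"])
print(offsets)                         # [0, 7, 14]
print(get_string(values, offsets, 1))  # string2
print(fits_in_one_array(2147489180))   # False
```

The last line uses the byte count from the error message: the combined values are slightly too large for a single array, even if no individual string is large.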
   
   Thanks for the reply!
   So you mean there is a big array in my data?
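
As I understand the workaround quoted above, the idea is to split the data so that no single piece goes over the limit. A minimal pure-Python sketch of that splitting, assuming the limit is counted in UTF-8 bytes of the values, could look like this. `split_by_bytes` is a hypothetical helper, not an Arrow function, and real code would build record batches from each chunk instead of plain lists.

```python
# Pure-Python illustration only; real code would turn each chunk
# into its own record batch rather than keep plain lists.

def split_by_bytes(strings, max_bytes):
    """Group strings into chunks whose total UTF-8 size stays <= max_bytes."""
    chunks = []
    current = []
    current_size = 0
    for s in strings:
        n = len(s.encode("utf-8"))
        if n > max_bytes:
            # A single value this large cannot fit in any chunk.
            raise ValueError("one string alone exceeds max_bytes")
        if current and current_size + n > max_bytes:
            chunks.append(current)
            current, current_size = [], 0
        current.append(s)
        current_size += n
    if current:
        chunks.append(current)
    return chunks


# Tiny limit so the split is visible; the real limit is about 2GB.
print(split_by_bytes(["string1", "string2", "s3"], max_bytes=14))
# [['string1', 'string2'], ['s3']]
```

The `ValueError` branch covers the other possibility: one value that is by itself larger than the limit, which splitting into batches would not help with.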

