Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/09/30 13:09:56 UTC

[GitHub] [arrow] jorisvandenbossche commented on pull request #8271: ARROW-9991: [C++] split kernels for strings/binary

jorisvandenbossche commented on pull request #8271:
URL: https://github.com/apache/arrow/pull/8271#issuecomment-701380049


   > This also raised a question: Should both the list entry and the string array entry have a missing/null value if the input string contains a null value? I think it should because if we ask for the underlying string array, and the string value that the missing list entry points to is not a missing value, it will be an empty string, which seems odd to me.
   
   But if the original string was a null, then the output also contains a (top-level) null, and not a list containing a null?
   
   And such top-level nulls are typically not stored as an entry in the underlying values array (at least not when Arrow builds the list array itself; the format might not allow otherwise either). E.g.:
   
   ```
   In [10]: arr = pa.array([["a", "b"], None, ["c"], [None]])
   
   In [11]: arr
   Out[11]: 
   <pyarrow.lib.ListArray object at 0x7f2316c1d108>
   [
     [
       "a",
       "b"
     ],
     null,
     [
       "c"
     ],
     [
       null
     ]
   ]
   
   In [12]: arr.values
   Out[12]: 
   <pyarrow.lib.StringArray object at 0x7f2316c1d5e8>
   [
     "a",
     "b",
     "c",
     null
   ]
   ```
   
   (so only the null that was inside a list is represented as null in the underlying values array)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org