Posted to github@arrow.apache.org by "AlenkaF (via GitHub)" <gi...@apache.org> on 2023/06/27 04:05:19 UTC

[GitHub] [arrow] AlenkaF commented on issue #36308: [Python] Conversion of numpy array of bytes (dtype='S') ignores length

AlenkaF commented on issue #36308:
URL: https://github.com/apache/arrow/issues/36308#issuecomment-1608760885

   The string-conversion logic in the array constructor does look faulty to me. In the case you mention, the bytes are handled correctly when the default conversion type, `binary`, is used:
   
   ```python
   >>> c = pa.array(a)
   >>> c
   <pyarrow.lib.BinaryArray object at 0x115dd6440>
   [
     61,
     6162,
     616263
   ]
   >>> c.to_numpy(zero_copy_only=False)
   array([b'a', b'ab', b'abc'], dtype=object)
   ```
   
   If we cast the `BinaryArray` to the string type and call `to_numpy()`, we get the expected result (with an object dtype):
   
   ```python
   >>> c.cast(pa.string())
   <pyarrow.lib.StringArray object at 0x115dd6500>
   [
     "a",
     "ab",
     "abc"
   ]
   >>> c.cast(pa.string()).to_numpy(zero_copy_only=False)
   array(['a', 'ab', 'abc'], dtype=object)
   ```
   
   But I suggest we wait for an extra opinion before you start working on a PR.

