Posted to issues@arrow.apache.org by "agagrins (via GitHub)" <gi...@apache.org> on 2023/03/27 10:47:45 UTC

[GitHub] [arrow] agagrins opened a new issue, #34739: Cannot append nullable string columns to table

agagrins opened a new issue, #34739:
URL: https://github.com/apache/arrow/issues/34739

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   ## Issue Description
   
   It does not seem possible to append a nullable string column to a table when its only value is `None`.
   
   Start off with a table. This works fine.
   ```
   $ python
   >>> import pyarrow
   >>> table = pyarrow.Table.from_pylist([{"a": None}], pyarrow.schema([pyarrow.field("a", pyarrow.string(), nullable=True)]))
   ```
   
   Then try to add a similar column that contains a value. This works.
   ```
   >>> table = table.append_column(pyarrow.field("b", pyarrow.string(), nullable=True), [["x"]])
   ```
   
   Then try to add another similar column whose only value is `None`. This does **not** work.
   ```
   >>> table = table.append_column(pyarrow.field("c", pyarrow.string(), nullable=True), [[None]])
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "pyarrow/table.pxi", line 4495, in pyarrow.lib.Table.append_column
     File "pyarrow/table.pxi", line 4452, in pyarrow.lib.Table.add_column
     File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
     File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
   pyarrow.lib.ArrowInvalid: Field type did not match data type
   ```
   
   ## System Info
   
   This is on Windows 11 under WSL2
   
   ```
   >>> pyarrow.show_versions()
   pyarrow version info
   --------------------
   Package kind              : python-wheel-manylinux2014
   Arrow C++ library version : 11.0.0
   Arrow C++ compiler        : GNU 10.2.1
   Arrow C++ compiler flags  :  -fdiagnostics-color=always
   Arrow C++ git revision    :
   Arrow C++ git description :
   Arrow C++ build type      : release
   ```
   
   
   ### Component(s)
   
   Python


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] agagrins closed issue #34739: [Python] Cannot append nullable string columns to table

Posted by "agagrins (via GitHub)" <gi...@apache.org>.
agagrins closed issue #34739: [Python] Cannot append nullable string columns to table
URL: https://github.com/apache/arrow/issues/34739




[GitHub] [arrow] agagrins commented on issue #34739: [Python] Cannot append nullable string columns to table

Posted by "agagrins (via GitHub)" <gi...@apache.org>.
agagrins commented on issue #34739:
URL: https://github.com/apache/arrow/issues/34739#issuecomment-1485091246

   Ok, great explanation!
   
   My first thought was then "But what happens if you only have NULL values at the moment?"
   
   The answer is that you can pre-create the chunked array with an explicit type:
   ```
   table = table.append_column(pyarrow.field("c", pyarrow.string(), nullable=True), pyarrow.chunked_array([[None]], type=pyarrow.string()))
   ```




[GitHub] [arrow] AlenkaF commented on issue #34739: [Python] Cannot append nullable string columns to table

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on issue #34739:
URL: https://github.com/apache/arrow/issues/34739#issuecomment-1485064943

   The failing example creates a `NullArray` (in [table.add_column](https://github.com/apache/arrow/blob/main/python/pyarrow/table.pxi#L4562-L4565)) because the only element in the column being appended is `None`. A `NullArray` has type `pa.null()`, not `pa.string()`, so the data type does not match the field type and we get an `ArrowInvalid` error:
   
   ```python
   >>> import pyarrow as pa
   >>> pa.chunked_array([["x"]])
   <pyarrow.lib.ChunkedArray object at 0x11672b600>
   [
     [
       "x"
     ]
   ]
   >>> pa.chunked_array([["x"]]).chunk(0)
   <pyarrow.lib.StringArray object at 0x11671ae60>
   [
     "x"
   ]
   >>> pa.chunked_array([[None]])
   <pyarrow.lib.ChunkedArray object at 0x11672b740>
   [
   1 nulls
   ]
   >>> pa.chunked_array([[None]]).chunk(0)
   <pyarrow.lib.NullArray object at 0x11671ae60>
   1 nulls
   ```
   
   A `NullArray` is not created when the column has more than one row and at least one element is not missing:
   
   ```python
   >>> pa.chunked_array([[None, "x"]]).chunk(0)
   <pyarrow.lib.StringArray object at 0x11671af80>
   [
     null,
     "x"
   ]
   ```
   
   ```python
   import pyarrow as pa
   table = pa.Table.from_pylist([{"a": None}, {"a": "first"}], pa.schema([pa.field("a", pa.string(), nullable=True)]))
   table = table.append_column(pa.field("b", pa.string(), nullable=True), [["x", "y"]])
   table = table.append_column(pa.field("n", pa.string(), nullable=True), [[None, "second"]])
   table
   # pyarrow.Table
   # a: string
   # b: string
   # n: string
   # ----
   # a: [[null,"first"]]
   # b: [["x","y"]]
   # n: [[null,"second"]]
   table.schema.field("n")
   # pyarrow.Field<n: string>
   table.schema.field("n").nullable
   # True
   ```

