Posted to jira@arrow.apache.org by "Weston Pace (Jira)" <ji...@apache.org> on 2022/01/03 16:48:00 UTC
[jira] [Created] (ARROW-15237) [C++] Add cast to Null from any type?
Weston Pace created ARROW-15237:
-----------------------------------
Summary: [C++] Add cast to Null from any type?
Key: ARROW-15237
URL: https://issues.apache.org/jira/browse/ARROW-15237
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Weston Pace
The "cannot cast to null" error came up during a dataset operation in this SO question: https://stackoverflow.com/questions/70566660/parquet-with-null-columns-on-pyarrow/70568419#70568419
Although I suspect casting to null is generally a sign that the user is doing something wrong (why throw away data?), there may be some corner cases where it is desired, and supporting it would be nice just for consistency.
Simple reproduction (admittedly, the best answer here would probably be to use the schema from tab2):
{code}
import os
import pyarrow as pa
import pyarrow.dataset as ds
import pyarrow.parquet as pq
tab = pa.Table.from_pydict({'x': [1, 2, 3], 'y': [None, None, None]})
tab2 = pa.Table.from_pydict({'x': [4, 5, 6], 'y': ['x', 'y', 'z']})
os.makedirs('/tmp/null_first_dataset', exist_ok=True)
pq.write_table(tab, '/tmp/null_first_dataset/0.parquet')
pq.write_table(tab2, '/tmp/null_first_dataset/1.parquet')
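dataset = ds.dataset('/tmp/null_first_dataset')
# Discovery infers the dataset schema from the first file (0.parquet, where
# 'y' is null-typed), so reading 1.parquet needs a string -> null cast,
# which currently fails with the error discussed in this issue.
tab = dataset.to_table()
print(tab)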
{code}
--
This message was sent by Atlassian Jira
(v8.20.1#820001)