Posted to jira@arrow.apache.org by "Weston Pace (Jira)" <ji...@apache.org> on 2022/01/03 16:48:00 UTC

[jira] [Created] (ARROW-15237) [C++] Add cast to Null from any type?

Weston Pace created ARROW-15237:
-----------------------------------

             Summary: [C++] Add cast to Null from any type?
                 Key: ARROW-15237
                 URL: https://issues.apache.org/jira/browse/ARROW-15237
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: Weston Pace


The "cannot cast to null" error came up during a dataset operation in this SO question: https://stackoverflow.com/questions/70566660/parquet-with-null-columns-on-pyarrow/70568419#70568419

Although I suspect that casting to null is generally a sign that the user is doing something wrong (why throw away data?), there may be some corner cases where it is desired, and supporting it may be nice just for consistency.

Simple reproduction (admittedly, the best answer here would probably be to use the schema from tab2):

{code}
import os

import pyarrow as pa
import pyarrow.dataset as ds
import pyarrow.parquet as pq

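# The first file has an all-null 'y' column (inferred as the null type);
# the second file has a string 'y' column.
tab = pa.Table.from_pydict({'x': [1, 2, 3], 'y': [None, None, None]})
tab2 = pa.Table.from_pydict({'x': [4, 5, 6], 'y': ['x', 'y', 'z']})

os.makedirs('/tmp/null_first_dataset', exist_ok=True)
pq.write_table(tab, '/tmp/null_first_dataset/0.parquet')
pq.write_table(tab2, '/tmp/null_first_dataset/1.parquet')

# No schema is passed, so the dataset schema is left to discovery.
dataset = ds.dataset('/tmp/null_first_dataset')
# Reading is expected to hit the "cannot cast to null" error here.
tab = dataset.to_table()
print(tab)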
{code} 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)