
Posted to jira@arrow.apache.org by "Weston Pace (Jira)" <ji...@apache.org> on 2022/01/03 17:28:00 UTC

[jira] [Closed] (ARROW-15237) [C++] Add cast to Null from any type?

     [ https://issues.apache.org/jira/browse/ARROW-15237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weston Pace closed ARROW-15237.
-------------------------------
    Resolution: Won't Fix

> [C++] Add cast to Null from any type?
> -------------------------------------
>
>                 Key: ARROW-15237
>                 URL: https://issues.apache.org/jira/browse/ARROW-15237
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Weston Pace
>            Priority: Major
>
> The "cannot cast to null" error came up during a dataset operation in this SO question: https://stackoverflow.com/questions/70566660/parquet-with-null-columns-on-pyarrow/70568419#70568419
> Although I suspect that casting to null is generally a sign that the user is doing something wrong (why throw away data?), there may be corner cases where it is desired, and supporting it would be nice for consistency.
> Simple reproduction (admittedly, the best answer here would probably be to use the schema from tab2):
> {code}
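> import os
> import pyarrow as pa
> import pyarrow.dataset as ds
> import pyarrow.parquet as pq
> # In the first table 'y' is all nulls, so its type is inferred as null.
> tab = pa.Table.from_pydict({'x': [1, 2, 3], 'y': [None, None, None]})
> # In the second table 'y' holds strings.
> tab2 = pa.Table.from_pydict({'x': [4, 5, 6], 'y': ['x', 'y', 'z']})
> os.makedirs('/tmp/null_first_dataset', exist_ok=True)
> pq.write_table(tab, '/tmp/null_first_dataset/0.parquet')
> pq.write_table(tab2, '/tmp/null_first_dataset/1.parquet')
> # No schema is given, so the dataset schema is inferred during discovery.
> dataset = ds.dataset('/tmp/null_first_dataset')
> # Reading then requires casting the string 'y' column to null, which fails.
> tab = dataset.to_table()
> print(tab)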
> {code} 


