Posted to jira@arrow.apache.org by "Micah Kornfield (Jira)" <ji...@apache.org> on 2021/02/06 23:22:00 UTC
[jira] [Updated] (ARROW-11353) [C++][Python][Parquet] We should
allow for overriding to types by providing a schema
[ https://issues.apache.org/jira/browse/ARROW-11353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Micah Kornfield updated ARROW-11353:
------------------------------------
Summary: [C++][Python][Parquet] We should allow for overriding to types by providing a schema (was: [C++][Python][Parquet] We should allow for overriding to large types by providing a schema)
> [C++][Python][Parquet] We should allow for overriding to types by providing a schema
> -------------------------------------------------------------------------------------
>
> Key: ARROW-11353
> URL: https://issues.apache.org/jira/browse/ARROW-11353
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Python
> Reporter: Micah Kornfield
> Priority: Major
>
> The following shouldn't throw:
> >>> import pyarrow as pa
> >>> import pyarrow.parquet as pq
> >>> import pyarrow.dataset as ds
> >>> pa.__version__
> '2.0.0'
> >>> schema = pa.schema([pa.field("utf8", pa.utf8())])
> >>> table = pa.Table.from_pydict({"utf8": ["foo", "bar"]}, schema)
> >>> pq.write_table(table, "/tmp/example.parquet")
> >>> large_schema = pa.schema([pa.field("utf8", pa.large_utf8())])
> >>> ds.dataset("/tmp/example.parquet", schema=large_schema, format="parquet").to_table()
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "pyarrow/_dataset.pyx", line 405, in pyarrow._dataset.Dataset.to_table
>   File "pyarrow/_dataset.pyx", line 2262, in pyarrow._dataset.Scanner.to_table
>   File "pyarrow/error.pxi", line 122, in pyarrow.lib.pyarrow_internal_check_status
>   File "pyarrow/error.pxi", line 107, in pyarrow.lib.check_status
> pyarrow.lib.ArrowTypeError: fields had matching names but differing types. From: utf8: string To: utf8: large_string
--
This message was sent by Atlassian Jira
(v8.3.4#803005)