Posted to jira@arrow.apache.org by "Micah Kornfield (Jira)" <ji...@apache.org> on 2021/02/06 23:22:00 UTC

[jira] [Updated] (ARROW-11353) [C++][Python][Parquet] We should allow for overriding to types by providing a schema

     [ https://issues.apache.org/jira/browse/ARROW-11353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Micah Kornfield updated ARROW-11353:
------------------------------------
    Summary: [C++][Python][Parquet] We should allow for overriding to  types by providing a schema  (was: [C++][Python][Parquet] We should allow for overriding to large types by providing a schema)

> [C++][Python][Parquet] We should allow for overriding to  types by providing a schema
> -------------------------------------------------------------------------------------
>
>                 Key: ARROW-11353
>                 URL: https://issues.apache.org/jira/browse/ARROW-11353
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>            Reporter: Micah Kornfield
>            Priority: Major
>
> The following shouldn't throw:
> >>> import pyarrow as pa
> >>> import pyarrow.parquet as pq
> >>> import pyarrow.dataset as ds
> >>> pa.__version__
> '2.0.0'
> >>> schema = pa.schema([pa.field("utf8", pa.utf8())])
> >>> table = pa.Table.from_pydict({"utf8": ["foo", "bar"]}, schema)
> >>> pq.write_table(table, "/tmp/example.parquet")
> >>> large_schema = pa.schema([pa.field("utf8", pa.large_utf8())])
> >>> ds.dataset("/tmp/example.parquet", schema=large_schema, format="parquet").to_table()
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "pyarrow/_dataset.pyx", line 405, in pyarrow._dataset.Dataset.to_table
>   File "pyarrow/_dataset.pyx", line 2262, in pyarrow._dataset.Scanner.to_table
>   File "pyarrow/error.pxi", line 122, in pyarrow.lib.pyarrow_internal_check_status
>   File "pyarrow/error.pxi", line 107, in pyarrow.lib.check_status
> pyarrow.lib.ArrowTypeError: fields had matching names but differing types. From: utf8: string To: utf8: large_string


