Posted to jira@arrow.apache.org by "Micah Kornfield (Jira)" <ji...@apache.org> on 2021/01/23 03:49:00 UTC

[jira] [Created] (ARROW-11353) [C++][Python][Parquet] We should allow for overriding to large types by providing a schema

Micah Kornfield created ARROW-11353:
---------------------------------------

             Summary: [C++][Python][Parquet] We should allow for overriding to large types by providing a schema
                 Key: ARROW-11353
                 URL: https://issues.apache.org/jira/browse/ARROW-11353
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++, Python
            Reporter: Micah Kornfield


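The following shouldn't throw. Reading a Parquet file whose column was written as utf8 should succeed when the provided schema asks for large_utf8 instead: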

{{>>> import pyarrow as pa}}
{{>>> import pyarrow.parquet as pq}}
{{>>> import pyarrow.dataset as ds}}
{{>>> pa.__version__}}
{{'2.0.0'}}
{{>>> schema = pa.schema([pa.field("utf8", pa.utf8())])}}
{{>>> table = pa.Table.from_pydict({"utf8": ["foo", "bar"]}, schema)}}
{{>>> pq.write_table(table, "/tmp/example.parquet")}}
{{>>> large_schema = pa.schema([pa.field("utf8", pa.large_utf8())])}}
{{>>> ds.dataset("/tmp/example.parquet", schema=large_schema, format="parquet").to_table()}}
{{Traceback (most recent call last):}}
{{  File "<stdin>", line 1, in <module>}}
{{  File "pyarrow/_dataset.pyx", line 405, in pyarrow._dataset.Dataset.to_table}}
{{  File "pyarrow/_dataset.pyx", line 2262, in pyarrow._dataset.Scanner.to_table}}
{{  File "pyarrow/error.pxi", line 122, in pyarrow.lib.pyarrow_internal_check_status}}
{{  File "pyarrow/error.pxi", line 107, in pyarrow.lib.check_status}}
{{pyarrow.lib.ArrowTypeError: fields had matching names but differing types. From: utf8: string To: utf8: large_string}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)