Posted to dev@parquet.apache.org by "Antoine Pitrou (JIRA)" <ji...@apache.org> on 2018/04/13 10:34:00 UTC

[jira] [Created] (PARQUET-1269) [C++] Scanning fails with list columns

Antoine Pitrou created PARQUET-1269:
---------------------------------------

             Summary: [C++] Scanning fails with list columns
                 Key: PARQUET-1269
                 URL: https://issues.apache.org/jira/browse/PARQUET-1269
             Project: Parquet
          Issue Type: Bug
          Components: parquet-cpp
            Reporter: Antoine Pitrou


{code:python}
>>> import io
>>> import pyarrow as pa
>>> import pyarrow.parquet as pq
>>> list_arr = pa.array([[1, 2], [3, 4, 5]])
>>> int_arr = pa.array([10, 11])
>>> table = pa.Table.from_arrays([int_arr, list_arr], ['ints', 'lists'])
>>> bio = io.BytesIO()
>>> pq.write_table(table, bio)
>>> bio.seek(0)
0
>>> reader = pq.ParquetReader()
>>> reader.open(bio)
>>> reader.scan_contents()
Traceback (most recent call last):
  File "<ipython-input-23-58e977f6d60b>", line 1, in <module>
    reader.scan_contents()
  File "_parquet.pyx", line 753, in pyarrow._parquet.ParquetReader.scan_contents
  File "error.pxi", line 79, in pyarrow.lib.check_status
ArrowIOError: Parquet error: Total rows among columns do not match
{code}

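ScanFileContents() claims to return the "number of semantic rows", but it apparently counts physical elements instead. The 'ints' column holds 2 values in 2 rows, while the 'lists' column holds 5 values in the same 2 rows, which would explain the "Total rows among columns do not match" error.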
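To make the suspected discrepancy concrete, here is a minimal, hypothetical sketch in plain Python. It is not parquet-cpp's code, and the function names are invented for illustration. It assumes the usual Dremel-style encoding, in which each column carries repetition levels and a level of 0 marks the start of a new top-level row. Counting level entries per column gives 2 vs 5, which would trip a per-column row-count check. Counting only level-0 entries gives 2 for both columns.

{code:python}
# Hypothetical sketch (not parquet-cpp's implementation): compare
# "physical" entry counts with "semantic" row counts using Dremel-style
# repetition levels, where level 0 marks the start of a new top-level row.


def count_entries(rep_levels):
    """Number of level entries stored for the column.

    Equals the number of leaf values when there are no nulls or empty
    lists, as in this example.
    """
    return len(rep_levels)


def count_rows(rep_levels):
    """Number of semantic (top-level) rows: entries with repetition level 0."""
    return sum(1 for level in rep_levels if level == 0)


# 'ints' column [10, 11]: not repeated, so every entry starts a row
# (modeled here as repetition level 0 for each value).
ints_rep = [0, 0]

# 'lists' column [[1, 2], [3, 4, 5]]: the first element of each list has
# level 0; the following elements of the same list have level 1.
lists_rep = [0, 1, 0, 1, 1]

print(count_entries(ints_rep), count_entries(lists_rep))  # 2 5  (mismatch)
print(count_rows(ints_rep), count_rows(lists_rep))        # 2 2  (consistent)
{code}

Under these assumptions, a scanner that sums entries per column would report mismatching totals for this file, while one that counts level-0 entries would agree with the table's 2 rows.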



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)