Posted to issues@beam.apache.org by "quentin lhoest (Jira)" <ji...@apache.org> on 2020/05/18 14:08:00 UTC
[jira] [Created] (BEAM-10022) [Python] Error with `WriteToParquet` with empty buffer
quentin lhoest created BEAM-10022:
-------------------------------------
Summary: [Python] Error with `WriteToParquet` with empty buffer
Key: BEAM-10022
URL: https://issues.apache.org/jira/browse/BEAM-10022
Project: Beam
Issue Type: Bug
Components: io-py-parquet
Affects Versions: 2.20.0
Reporter: quentin lhoest
While using `WriteToParquet` I encounter this error:
{noformat}
File "/usr/local/lib/python3.7/site-packages/apache_beam/io/iobase.py", line 1066, in finish_bundle
self.writer.close(),
File "/usr/local/lib/python3.7/site-packages/apache_beam/io/filebasedsink.py", line 423, in close
self.sink.close(self.temp_handle)
File "/usr/local/lib/python3.7/site-packages/apache_beam/io/parquetio.py", line 538, in close
self._flush_buffer()
File "/usr/local/lib/python3.7/site-packages/apache_beam/io/parquetio.py", line 570, in _flush_buffer
size = size + b.size
AttributeError: 'NoneType' object has no attribute 'size'
{noformat}
This is because when instantiating an empty array with `array = pa.array([])`, `array.buffers()` returns `[None]`. However, `_flush_buffer` currently always assumes that no buffer is `None` when incrementing `size`.
One simple fix would be to add `if b is not None:` before incrementing `size`.
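
A minimal sketch of that guard, not the actual Beam code: `total_buffer_size` and `FakeBuffer` are hypothetical names made up for illustration, and `FakeBuffer` only stands in for a pyarrow buffer so the snippet runs without pyarrow installed. With pyarrow, the list passed in would presumably come from `array.buffers()`.

```python
from typing import Iterable, Optional


class FakeBuffer:
    """Hypothetical stand-in for a pyarrow buffer; only `size` matters here."""

    def __init__(self, size: int) -> None:
        self.size = size


def total_buffer_size(buffers: Iterable[Optional[FakeBuffer]]) -> int:
    """Sum buffer sizes, skipping None entries instead of failing on them."""
    size = 0
    for b in buffers:
        if b is not None:  # the guard proposed in this report
            size = size + b.size
    return size


# Mimics the `[None]` reported for `pa.array([]).buffers()`.
print(total_buffer_size([None]))

# A made-up mix of a missing buffer and a 16-byte buffer.
print(total_buffer_size([None, FakeBuffer(16)]))
```

Without the `if b is not None:` check, the first call would raise the same `AttributeError` as in the traceback above.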