Posted to issues@beam.apache.org by "quentin lhoest (Jira)" <ji...@apache.org> on 2020/05/18 14:08:00 UTC

[jira] [Created] (BEAM-10022) [Python] Error with `WriteToParquet` with empty buffer

quentin lhoest created BEAM-10022:
-------------------------------------

             Summary: [Python] Error with `WriteToParquet` with empty buffer
                 Key: BEAM-10022
                 URL: https://issues.apache.org/jira/browse/BEAM-10022
             Project: Beam
          Issue Type: Bug
          Components: io-py-parquet
    Affects Versions: 2.20.0
            Reporter: quentin lhoest


While using `WriteToParquet`, I encounter the following error:

{noformat}

  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/iobase.py", line 1066, in finish_bundle
    self.writer.close(),
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/filebasedsink.py", line 423, in close
    self.sink.close(self.temp_handle)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/parquetio.py", line 538, in close
    self._flush_buffer()
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/parquetio.py", line 570, in _flush_buffer
    size = size + b.size
AttributeError: 'NoneType' object has no attribute 'size'

{noformat}


This happens because, for an empty array such as `array = pa.array([])`, `array.buffers()` returns `[None]`. However, `_flush_buffer` currently assumes that every buffer is not `None` when incrementing `size`.

A simple fix would be to add `if b is not None:` before incrementing `size`.
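
To illustrate the proposed guard, here is a minimal sketch of the size accumulation with the `None` check added. It is not the actual `parquetio.py` code: `total_buffer_size` and `FakeBuffer` are hypothetical names used only for this example, and `FakeBuffer` stands in for a pyarrow buffer so the snippet runs without pyarrow installed.

```python
from collections import namedtuple

# Hypothetical stand-in for a pyarrow buffer; only its `size` attribute is used.
FakeBuffer = namedtuple("FakeBuffer", ["size"])


def total_buffer_size(buffers):
    """Sum the sizes of the given buffers, skipping entries that are None."""
    size = 0
    for b in buffers:
        # Guard proposed in this report: an empty array yields a None buffer.
        if b is not None:
            size = size + b.size
    return size


# Buffers as reported above for an empty array.
empty_array_buffers = [None]
# A mix of a missing buffer and a real one (sizes are made up).
mixed_buffers = [None, FakeBuffer(size=24)]

print(total_buffer_size(empty_array_buffers))
print(total_buffer_size(mixed_buffers))
```

Without the guard, the first call would raise the same `AttributeError` shown in the traceback above.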


