Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/12/15 14:25:27 UTC
[GitHub] [iceberg] joshuarobinson opened a new issue, #6435: PyIceberg: Avro decode EOF error
joshuarobinson opened a new issue, #6435:
URL: https://github.com/apache/iceberg/issues/6435
### Feature Request / Improvement
While reading the manifests for a table scan in PyIceberg 0.2.0, I get an EOFError.
The table was originally written in June 2022 with the most recent versions of Spark and Iceberg at that time (3.2.3 and 0.14.1, I think).
Full traceback of the error:
```
Traceback (most recent call last):
File "/read.py", line 16, in <module>
print(tbl.scan(selected_fields=("time", "icao24","call_sign","origin_country")).to_arrow().to_pandas().head())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pyiceberg/table/__init__.py", line 350, in to_arrow
for task in self.plan_files():
File "/usr/local/lib/python3.11/site-packages/pyiceberg/table/__init__.py", line 335, in plan_files
yield from (FileScanTask(file) for file in matching_partition_files)
File "/usr/local/lib/python3.11/site-packages/pyiceberg/table/__init__.py", line 335, in <genexpr>
yield from (FileScanTask(file) for file in matching_partition_files)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pyiceberg/manifest.py", line 149, in <genexpr>
return (entry.data_file for entry in live_entries(input_file))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pyiceberg/manifest.py", line 145, in <genexpr>
return (entry for entry in read_manifest_entry(input_file) if entry.status != ManifestEntryStatus.DELETED)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pyiceberg/manifest.py", line 139, in read_manifest_entry
for record in reader:
File "/usr/local/lib/python3.11/site-packages/pyiceberg/avro/file.py", line 178, in __next__
return self.__next__()
^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pyiceberg/avro/file.py", line 170, in __next__
return next(self.block)
^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pyiceberg/avro/file.py", line 106, in __next__
return self.reader.read(self.block_decoder)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pyiceberg/avro/reader.py", line 283, in read
result[pos] = field.read(decoder)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pyiceberg/avro/reader.py", line 283, in read
result[pos] = field.read(decoder)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pyiceberg/avro/reader.py", line 267, in read
return self.option.read(decoder)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pyiceberg/avro/reader.py", line 329, in read
read_items[key] = self.value.read(decoder)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pyiceberg/avro/reader.py", line 234, in read
return decoder.read_bytes()
^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pyiceberg/avro/decoder.py", line 122, in read_bytes
return self.read(self.read_int())
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pyiceberg/avro/decoder.py", line 58, in read
raise EOFError
EOFError
```
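For context on the last two frames above (`read_bytes` calling `self.read(self.read_int())`): in Avro's binary encoding, a `bytes` value is a zigzag varint length followed by that many raw bytes, and collections are written in blocks whose item count may be negative, in which case a byte size follows the count. The sketch below is a minimal stand-alone decoder written for illustration only, not PyIceberg's code; `Decoder`, `encode_long`, `encode_pairs`, `read_pairs`, and `raises_eof` are hypothetical names. It shows how a short or misaligned stream surfaces as a bare `EOFError`. It does not establish the root cause for this particular table.

```python
import io


def encode_long(n: int) -> bytes:
    """Zigzag + variable-length encode an integer (Avro int/long encoding)."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


class Decoder:
    """Hypothetical minimal decoder; raises EOFError on a short read."""

    def __init__(self, data: bytes) -> None:
        self._stream = io.BytesIO(data)

    def read(self, n: int) -> bytes:
        if n < 0:
            raise ValueError(f"negative length: {n}")
        data = self._stream.read(n)
        if len(data) < n:
            raise EOFError
        return data

    def read_long(self) -> int:
        shift, result = 0, 0
        while True:
            byte = self.read(1)[0]
            result |= (byte & 0x7F) << shift
            if not byte & 0x80:
                break
            shift += 7
        return (result >> 1) ^ -(result & 1)  # undo zigzag

    def read_bytes(self) -> bytes:
        # Length prefix first, then exactly that many bytes.
        return self.read(self.read_long())


def encode_pairs(pairs: dict, with_block_size: bool) -> bytes:
    """Write one non-empty block of (long key, bytes value) pairs plus the end marker."""
    body = b"".join(encode_long(k) + encode_long(len(v)) + v for k, v in pairs.items())
    if with_block_size:
        # Negative count: its absolute value is the item count, and the
        # block's size in bytes follows immediately.
        header = encode_long(-len(pairs)) + encode_long(len(body))
    else:
        header = encode_long(len(pairs))
    return header + body + encode_long(0)  # a zero count ends the collection


def read_pairs(decoder: Decoder) -> dict:
    """Read blocks of (long key, bytes value) pairs until a zero count."""
    items = {}
    count = decoder.read_long()
    while count != 0:
        if count < 0:
            count = -count
            decoder.read_long()  # block size in bytes, not needed here
        for _ in range(count):
            key = decoder.read_long()
            items[key] = decoder.read_bytes()
        count = decoder.read_long()
    return items


def raises_eof(data: bytes) -> bool:
    """Return True if decoding `data` runs off the end of the stream."""
    try:
        read_pairs(Decoder(data))
    except EOFError:
        return True
    return False


pairs = {1: b"\x00\x01", 2: b"abc"}
plain = encode_pairs(pairs, with_block_size=False)
sized = encode_pairs(pairs, with_block_size=True)
truncated = plain[:-3]  # cut off the end marker and part of the last value
```

Both block layouts decode to the same dict, while cutting three bytes off the end makes the final length prefix promise more bytes than remain, so `raises_eof(truncated)` is true.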
The table can be read successfully with Trino.
For reference, the schema of my table is:
```
table {
1: time: optional long
2: icao24: optional string
3: callsign: optional string
4: origin_country: optional string
5: time_position: optional double
6: last_contact: optional long
7: longitude: optional double
8: latitude: optional double
9: baro_altitude: optional double
10: on_ground: optional boolean
11: velocity: optional double
12: true_track: optional double
13: vertical_rate: optional double
14: geo_altitude: optional double
15: squawk: optional string
16: spi: optional boolean
17: position_source: optional long
}
```
### Query engine
None
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] Fokko commented on issue #6435: PyIceberg: Avro decode EOF error
Posted by GitBox <gi...@apache.org>.
Fokko commented on issue #6435:
URL: https://github.com/apache/iceberg/issues/6435#issuecomment-1373416695
Able to reproduce this locally:
```
>>> from pyiceberg.manifest import read_manifest_entry
>>> list(read_manifest_entry(io.new_input('/Users/fokkodriesprong/Desktop/iceberg_6435_metadata/c8562c72-8cf0-4474-b75d-399938b6e335-m0.avro')))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/fokkodriesprong/Desktop/iceberg/python/pyiceberg/manifest.py", line 139, in read_manifest_entry
for record in reader:
File "/Users/fokkodriesprong/Desktop/iceberg/python/pyiceberg/avro/file.py", line 194, in __next__
return self.__next__()
File "/Users/fokkodriesprong/Desktop/iceberg/python/pyiceberg/avro/file.py", line 186, in __next__
return next(self.block)
File "/Users/fokkodriesprong/Desktop/iceberg/python/pyiceberg/avro/file.py", line 113, in __next__
return self.reader.read(self.block_decoder)
File "/Users/fokkodriesprong/Desktop/iceberg/python/pyiceberg/avro/reader.py", line 260, in read
struct.set(pos, field.read(decoder)) # later: pass reuse in here
File "/Users/fokkodriesprong/Desktop/iceberg/python/pyiceberg/avro/reader.py", line 260, in read
struct.set(pos, field.read(decoder)) # later: pass reuse in here
File "/Users/fokkodriesprong/Desktop/iceberg/python/pyiceberg/avro/reader.py", line 233, in read
return self.option.read(decoder)
File "/Users/fokkodriesprong/Desktop/iceberg/python/pyiceberg/avro/reader.py", line 313, in read
read_items[key] = self.value.read(decoder)
File "/Users/fokkodriesprong/Desktop/iceberg/python/pyiceberg/avro/reader.py", line 200, in read
return decoder.read_bytes()
File "/Users/fokkodriesprong/Desktop/iceberg/python/pyiceberg/avro/decoder.py", line 118, in read_bytes
return self.read(self.read_int())
File "/Users/fokkodriesprong/Desktop/iceberg/python/pyiceberg/avro/decoder.py", line 54, in read
raise EOFError
EOFError
```
Thanks for sharing the Avro files.
[GitHub] [iceberg] joshuarobinson commented on issue #6435: PyIceberg: Avro decode EOF error
Posted by GitBox <gi...@apache.org>.
joshuarobinson commented on issue #6435:
URL: https://github.com/apache/iceberg/issues/6435#issuecomment-1353182160
The table in question has only one snapshot. I'm attaching the json and avro metadata files for this table.
[iceberg_6435_metadata.zip](https://github.com/apache/iceberg/files/10237824/iceberg_6435_metadata.zip)
[GitHub] [iceberg] Fokko closed issue #6435: PyIceberg: Avro decode EOF error
Posted by GitBox <gi...@apache.org>.
Fokko closed issue #6435: PyIceberg: Avro decode EOF error
URL: https://github.com/apache/iceberg/issues/6435