Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/12/15 14:25:27 UTC

[GitHub] [iceberg] joshuarobinson opened a new issue, #6435: PyIceberg: Avro decode EOF error

joshuarobinson opened a new issue, #6435:
URL: https://github.com/apache/iceberg/issues/6435

   ### Feature Request / Improvement
   
   While reading manifests during a table scan in PyIceberg 0.2.0, I get an EOFError.
   
   The table was originally written in June 2022 with the most recent versions of Spark and Iceberg at that time (3.2.3 and 0.14.1, I think).
   
   Full traceback of the error:
   ```
   Traceback (most recent call last):
     File "/read.py", line 16, in <module>
       print(tbl.scan(selected_fields=("time", "icao24","call_sign","origin_country")).to_arrow().to_pandas().head())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.11/site-packages/pyiceberg/table/__init__.py", line 350, in to_arrow
       for task in self.plan_files():
     File "/usr/local/lib/python3.11/site-packages/pyiceberg/table/__init__.py", line 335, in plan_files
       yield from (FileScanTask(file) for file in matching_partition_files)
     File "/usr/local/lib/python3.11/site-packages/pyiceberg/table/__init__.py", line 335, in <genexpr>
       yield from (FileScanTask(file) for file in matching_partition_files)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.11/site-packages/pyiceberg/manifest.py", line 149, in <genexpr>
       return (entry.data_file for entry in live_entries(input_file))
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.11/site-packages/pyiceberg/manifest.py", line 145, in <genexpr>
       return (entry for entry in read_manifest_entry(input_file) if entry.status != ManifestEntryStatus.DELETED)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.11/site-packages/pyiceberg/manifest.py", line 139, in read_manifest_entry
       for record in reader:
     File "/usr/local/lib/python3.11/site-packages/pyiceberg/avro/file.py", line 178, in __next__
       return self.__next__()
              ^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.11/site-packages/pyiceberg/avro/file.py", line 170, in __next__
       return next(self.block)
              ^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.11/site-packages/pyiceberg/avro/file.py", line 106, in __next__
       return self.reader.read(self.block_decoder)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.11/site-packages/pyiceberg/avro/reader.py", line 283, in read
       result[pos] = field.read(decoder)
                     ^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.11/site-packages/pyiceberg/avro/reader.py", line 283, in read
       result[pos] = field.read(decoder)
                     ^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.11/site-packages/pyiceberg/avro/reader.py", line 267, in read
       return self.option.read(decoder)
              ^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.11/site-packages/pyiceberg/avro/reader.py", line 329, in read
       read_items[key] = self.value.read(decoder)
                         ^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.11/site-packages/pyiceberg/avro/reader.py", line 234, in read
       return decoder.read_bytes()
              ^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.11/site-packages/pyiceberg/avro/decoder.py", line 122, in read_bytes
       return self.read(self.read_int())
              ^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.11/site-packages/pyiceberg/avro/decoder.py", line 58, in read
       raise EOFError
   EOFError
   ```
   
   The table can be read successfully with Trino.
   
   For reference, the schema of my table is:
   ```
   table {
     1: time: optional long
     2: icao24: optional string
     3: callsign: optional string
     4: origin_country: optional string
     5: time_position: optional double
     6: last_contact: optional long
     7: longitude: optional double
     8: latitude: optional double
     9: baro_altitude: optional double
     10: on_ground: optional boolean
     11: velocity: optional double
     12: true_track: optional double
     13: vertical_rate: optional double
     14: geo_altitude: optional double
     15: squawk: optional string
     16: spi: optional boolean
     17: position_source: optional long
   }
   ```
   
   ### Query engine
   
   None


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on issue #6435: PyIceberg: Avro decode EOF error

Posted by GitBox <gi...@apache.org>.
Fokko commented on issue #6435:
URL: https://github.com/apache/iceberg/issues/6435#issuecomment-1373416695

   I can reproduce this locally:
   ```
   >>> from pyiceberg.manifest import read_manifest_entry
   >>> list(read_manifest_entry(io.new_input('/Users/fokkodriesprong/Desktop/iceberg_6435_metadata/c8562c72-8cf0-4474-b75d-399938b6e335-m0.avro')))
   
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "/Users/fokkodriesprong/Desktop/iceberg/python/pyiceberg/manifest.py", line 139, in read_manifest_entry
       for record in reader:
     File "/Users/fokkodriesprong/Desktop/iceberg/python/pyiceberg/avro/file.py", line 194, in __next__
       return self.__next__()
     File "/Users/fokkodriesprong/Desktop/iceberg/python/pyiceberg/avro/file.py", line 186, in __next__
       return next(self.block)
     File "/Users/fokkodriesprong/Desktop/iceberg/python/pyiceberg/avro/file.py", line 113, in __next__
       return self.reader.read(self.block_decoder)
     File "/Users/fokkodriesprong/Desktop/iceberg/python/pyiceberg/avro/reader.py", line 260, in read
       struct.set(pos, field.read(decoder))  # later: pass reuse in here
     File "/Users/fokkodriesprong/Desktop/iceberg/python/pyiceberg/avro/reader.py", line 260, in read
       struct.set(pos, field.read(decoder))  # later: pass reuse in here
     File "/Users/fokkodriesprong/Desktop/iceberg/python/pyiceberg/avro/reader.py", line 233, in read
       return self.option.read(decoder)
     File "/Users/fokkodriesprong/Desktop/iceberg/python/pyiceberg/avro/reader.py", line 313, in read
       read_items[key] = self.value.read(decoder)
     File "/Users/fokkodriesprong/Desktop/iceberg/python/pyiceberg/avro/reader.py", line 200, in read
       return decoder.read_bytes()
     File "/Users/fokkodriesprong/Desktop/iceberg/python/pyiceberg/avro/decoder.py", line 118, in read_bytes
       return self.read(self.read_int())
     File "/Users/fokkodriesprong/Desktop/iceberg/python/pyiceberg/avro/decoder.py", line 54, in read
       raise EOFError
   EOFError
   ```
   Thanks for sharing the Avro files.
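
Both tracebacks end in `read_bytes`, which calls `self.read(self.read_int())`. For context, the Avro specification encodes a `bytes` value as a zig-zag varint length followed by that many raw bytes. The sketch below is a minimal standalone illustration of that encoding and of how a truncated or misaligned stream can surface as an `EOFError`. The function names are hypothetical and are not PyIceberg's implementation, and the sketch does not claim to identify the root cause of this issue.

```python
import io


def read_varint(stream):
    """Read one Avro int/long: base-128 varint, then zig-zag decode.

    Hypothetical helper for illustration; not part of PyIceberg.
    """
    shift = 0
    accum = 0
    while True:
        raw = stream.read(1)
        if len(raw) != 1:
            raise EOFError("stream ended inside a varint")
        byte = raw[0]
        accum |= (byte & 0x7F) << shift  # low 7 bits carry the payload
        if not byte & 0x80:              # high bit clear: last byte
            break
        shift += 7
    # Zig-zag decode: even values are non-negative, odd values negative.
    return (accum >> 1) ^ -(accum & 1)


def read_bytes(stream):
    """Read an Avro `bytes` value: a length prefix, then that many bytes."""
    length = read_varint(stream)
    if length < 0:
        raise ValueError(f"negative length {length}")
    data = stream.read(length)
    if len(data) != length:
        raise EOFError(f"wanted {length} bytes, got {len(data)}")
    return data


# A well-formed value: length 3 is zig-zag encoded as 0x06.
print(read_bytes(io.BytesIO(b"\x06abc")))  # b'abc'

# A decoder positioned on the wrong byte reads a bogus length (100 here)
# and runs off the end of the buffer.
try:
    read_bytes(io.BytesIO(b"\xc8\x01ab"))
except EOFError as exc:
    print("EOFError:", exc)  # EOFError: wanted 100 bytes, got 2
```

Because the length prefix is just another varint, a reader that has lost alignment earlier in the record (for example by decoding a preceding field with the wrong type) can fail only much later, at the first `bytes` value whose bogus length exceeds the remaining block.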




[GitHub] [iceberg] joshuarobinson commented on issue #6435: PyIceberg: Avro decode EOF error

Posted by GitBox <gi...@apache.org>.
joshuarobinson commented on issue #6435:
URL: https://github.com/apache/iceberg/issues/6435#issuecomment-1353182160

   The table in question has only one snapshot. I'm attaching the JSON and Avro metadata files for this table.
   [iceberg_6435_metadata.zip](https://github.com/apache/iceberg/files/10237824/iceberg_6435_metadata.zip)
   




[GitHub] [iceberg] Fokko closed issue #6435: PyIceberg: Avro decode EOF error

Posted by GitBox <gi...@apache.org>.
Fokko closed issue #6435: PyIceberg: Avro decode EOF error
URL: https://github.com/apache/iceberg/issues/6435

