Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2023/01/10 10:01:09 UTC

[GitHub] [iceberg] a-agmon opened a new issue, #6553: PyIceberg fails reading Avro file

a-agmon opened a new issue, #6553:
URL: https://github.com/apache/iceberg/issues/6553

   ### Apache Iceberg version
   
   1.1.0 (latest release)
   
   ### Query engine
   
   Other
   
   ### Please describe the bug 🐞
   
   Running a simple scan against an AWS Glue catalog fails with `EOFError`.
   Using the following `.yaml` config:
   ```yaml
   catalog:
    default:
     type: glue
   ```
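   and the following code:
   ```python
   from pyiceberg.catalog import load_catalog

   # "default" refers to the catalog entry in the .yaml config above (type: glue)
   catalog = load_catalog("default")
   table = catalog.load_table("db.table")
   print(table.location())  # prints the table's S3 location

   # Restrict the scan to three columns
   scan = table.scan(
       selected_fields=("A", "B", "C")
   )
   # Planning reads the table's Avro manifests; this loop raises EOFError
   for f in scan.plan_files():
       print(f)
   ```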
   The script prints the table's S3 location, so S3 access itself does not appear to be the problem.
   It then fails while iterating over `scan.plan_files()` with:
   
   ```
     File "....../pyiceberg/avro/decoder.py", line 54, in read
       raise EOFError
   EOFError
   ```
   PyIceberg was installed using `pip3 install pyiceberg[glue,duckdb]`.
   
   Full stack trace:
   
   ```
   Traceback (most recent call last):
     File "/work/pyiceberg/main.py", line 13, in <module>
       for f in scan.plan_files():
     File "/work/pyiceberg/venv/lib/python3.10/site-packages/pyiceberg/table/__init__.py", line 334, in plan_files
       yield from (FileScanTask(file) for file in matching_partition_files)
     File "/work/pyiceberg/venv/lib/python3.10/site-packages/pyiceberg/table/__init__.py", line 334, in <genexpr>
       yield from (FileScanTask(file) for file in matching_partition_files)
     File "/work/pyiceberg/venv/lib/python3.10/site-packages/pyiceberg/manifest.py", line 149, in <genexpr>
       return (entry.data_file for entry in live_entries(input_file))
     File "/work/pyiceberg/venv/lib/python3.10/site-packages/pyiceberg/manifest.py", line 145, in <genexpr>
       return (entry for entry in read_manifest_entry(input_file) if entry.status != ManifestEntryStatus.DELETED)
     File "/work/pyiceberg/venv/lib/python3.10/site-packages/pyiceberg/manifest.py", line 139, in read_manifest_entry
       for record in reader:
     File "/work/pyiceberg/venv/lib/python3.10/site-packages/pyiceberg/avro/file.py", line 178, in __next__
       return self.__next__()
     File "/work/pyiceberg/venv/lib/python3.10/site-packages/pyiceberg/avro/file.py", line 170, in __next__
       return next(self.block)
     File "/work/pyiceberg/venv/lib/python3.10/site-packages/pyiceberg/avro/file.py", line 106, in __next__
       return self.reader.read(self.block_decoder)
     File "/work/pyiceberg/venv/lib/python3.10/site-packages/pyiceberg/avro/reader.py", line 284, in read
       result[pos] = field.read(decoder)
     File "/work/pyiceberg/venv/lib/python3.10/site-packages/pyiceberg/avro/reader.py", line 284, in read
       result[pos] = field.read(decoder)
     File "/work/pyiceberg/venv/lib/python3.10/site-packages/pyiceberg/avro/reader.py", line 268, in read
       return self.option.read(decoder)
     File "/work/pyiceberg/venv/lib/python3.10/site-packages/pyiceberg/avro/reader.py", line 330, in read
       read_items[key] = self.value.read(decoder)
     File "/work/pyiceberg/venv/lib/python3.10/site-packages/pyiceberg/avro/reader.py", line 235, in read
       return decoder.read_bytes()
     File "/work/pyiceberg/venv/lib/python3.10/site-packages/pyiceberg/avro/decoder.py", line 118, in read_bytes
       return self.read(self.read_int())
     File "/work/pyiceberg/venv/lib/python3.10/site-packages/pyiceberg/avro/decoder.py", line 54, in read
       raise EOFError
   EOFError
   
   ```
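The innermost frames show where the error surfaces: `read_bytes` reads a length prefix (`self.read(self.read_int())`) and `read` then raises `EOFError`, which suggests fewer bytes were available than the prefix declared. As a rough illustration, assuming only the Avro specification's encoding of `bytes` (a zigzag variable-length long giving the length, followed by that many raw bytes), the minimal decoder below reproduces that failure mode. The names `read_long`, `read_exact`, and `read_bytes` are hypothetical; this is a sketch, not PyIceberg's decoder, and it does not show why the data was short in this particular case.

```python
import io
from typing import BinaryIO


def read_long(stream: BinaryIO) -> int:
    """Decode one Avro int/long: a zigzag-encoded variable-length integer."""
    shift = 0
    accum = 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            # Data ended in the middle of the varint.
            raise EOFError("truncated varint")
        byte = chunk[0]
        accum |= (byte & 0x7F) << shift  # low 7 bits carry data
        if not byte & 0x80:  # high bit clear: last byte of the varint
            break
        shift += 7
    # Undo zigzag encoding: 0 -> 0, 1 -> -1, 2 -> 1, 3 -> -2, ...
    return (accum >> 1) ^ -(accum & 1)


def read_exact(stream: BinaryIO, n: int) -> bytes:
    """Read n bytes in one call, raising EOFError if fewer come back."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError(f"expected {n} bytes, got {len(data)}")
    return data


def read_bytes(stream: BinaryIO) -> bytes:
    """Decode Avro `bytes`: a long length prefix, then that many raw bytes."""
    length = read_long(stream)
    if length < 0:
        raise ValueError(f"negative length prefix: {length}")
    return read_exact(stream, length)


if __name__ == "__main__":
    # 0x0A is zigzag(5): the prefix promises 5 bytes.
    print(read_bytes(io.BytesIO(b"\x0ahello")))  # b'hello'
    try:
        read_bytes(io.BytesIO(b"\x0ahel"))  # only 3 of the 5 promised bytes
    except EOFError as exc:
        print("EOFError:", exc)  # EOFError: expected 5 bytes, got 3
```

Note that `read_exact` treats any short read as end of data. For streams whose `read(n)` may legitimately return fewer than `n` bytes before the real end, a loop that keeps reading until it has `n` bytes or gets an empty result would be the more defensive choice.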
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on issue #6553: PyIceberg fails reading Avro file with EOFError

Posted by GitBox <gi...@apache.org>.
nastra commented on issue #6553:
URL: https://github.com/apache/iceberg/issues/6553#issuecomment-1377026499

   @a-agmon, thanks for reporting. This appears to be a duplicate of https://github.com/apache/iceberg/issues/6435, which was fixed by https://github.com/apache/iceberg/pull/6532. The fix should be released with PyIceberg 0.3.0.
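Since the fix is expected in PyIceberg 0.3.0, a quick standard-library check of whether an installed version is at least that release might look like the sketch below. It assumes the distribution name `pyiceberg`; the helper names and the `FIXED_IN` constant are hypothetical, and the comparison deliberately ignores pre-release suffixes such as `rc1`.

```python
from __future__ import annotations

import re
from importlib.metadata import PackageNotFoundError, version

# Release expected to contain the fix, per the comment above (assumption).
FIXED_IN = (0, 3, 0)


def parse_release(v: str) -> tuple[int, ...]:
    """Return the leading numeric segment of a version, e.g. "0.3.0rc1" -> (0, 3, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", v)
    if match is None:
        raise ValueError(f"unrecognized version string: {v!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def includes_fix(v: str) -> bool:
    """True if the release segment of v is >= FIXED_IN (pre-release tags are ignored)."""
    release = parse_release(v)
    # Pad short versions such as "0.3" so they compare equal to (0, 3, 0).
    padded = release + (0,) * (len(FIXED_IN) - len(release))
    return padded >= FIXED_IN


def installed_pyiceberg_includes_fix() -> bool | None:
    """Check the installed pyiceberg distribution; None if it is not installed."""
    try:
        return includes_fix(version("pyiceberg"))
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_pyiceberg_includes_fix())
```

If it prints `False`, upgrading with `pip3 install --upgrade "pyiceberg[glue,duckdb]>=0.3.0"` (quoted so the shell does not expand the brackets) should pick up the fix once that release is available.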



[GitHub] [iceberg] nastra closed issue #6553: PyIceberg fails reading Avro file with EOFError

Posted by GitBox <gi...@apache.org>.
nastra closed issue #6553: PyIceberg fails reading Avro file with EOFError
URL: https://github.com/apache/iceberg/issues/6553

