Posted to issues@iceberg.apache.org by "Guillem96 (via GitHub)" <gi...@apache.org> on 2023/02/13 09:07:53 UTC

[GitHub] [iceberg] Guillem96 commented on issue #6820: Cannot create a PyArrow dataframe after an scan

Guillem96 commented on issue #6820:
URL: https://github.com/apache/iceberg/issues/6820#issuecomment-1427582965

   Thank you for the quick reply.
   
   If I update my catalog config to:
   
   ```
   catalog:
     default:
       type: glue
       py-io-impl: pyiceberg.io.pyarrow.PyArrowFileIO
   ```
   
   I'm getting this error when loading the table:
   
   ```
   Traceback (most recent call last):
     File "test_iceberg.py", line 6, in <module>
       table = catalog.load_table(("...", "..."))
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File ".../lib/python3.11/site-packages/pyiceberg/catalog/glue.py", line 278, in load_table
       return self._convert_glue_to_iceberg(load_table_response.get(PROP_GLUE_TABLE, {}))
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File ".../lib/python3.11/site-packages/pyiceberg/catalog/glue.py", line 180, in _convert_glue_to_iceberg
       metadata = FromInputFile.table_metadata(file)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File ".../lib/python3.11/site-packages/pyiceberg/serializers.py", line 56, in table_metadata
       with input_file.open() as input_stream:
            ^^^^^^^^^^^^^^^^^
     File ".../lib/python3.11/site-packages/pyiceberg/io/pyarrow.py", line 197, in open
       input_file = self._filesystem.open_input_file(self._path)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "pyarrow/_fs.pyx", line 770, in pyarrow._fs.FileSystem.open_input_file
     File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
     File "pyarrow/error.pxi", line 115, in pyarrow.lib.check_status
   OSError: When reading information for key 'iceberg_zstd_v3/metadata/00001-4c2dec16-77be-4a9d-8a47-00f74d2fd881.metadata.json' in bucket 'bucket-name': AWS Error UNKNOWN (HTTP status 301) during HeadObject operation: No response body.
   
   ```
   
   I have AWS admin rights, and I can confirm that the `HeadObject` API call works fine in the same environment:
   
   ```
   $ aws s3api head-object --bucket bucket-name --key iceberg_zstd_v3/metadata/00001-4c2dec16-77be-4a9d-8a47-00f74d2fd881.metadata.json
   {
       "AcceptRanges": "bytes",
       "LastModified": "2023-02-08T15:46:48+00:00",
       "ContentLength": 2974,
       "ContentType": "application/octet-stream",
       "ServerSideEncryption": "aws:kms",
       "Metadata": {},
       "SSEKMSKeyId": "..."
   }
   ```
   
   Do you know what's going on? Why can't I read the table metadata once I change the `py-io-impl`?
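
One way to narrow this down is to repeat the lookup through PyArrow's S3 filesystem directly, outside PyIceberg, since the traceback ends inside `pyarrow._fs`. The following is only a minimal sketch: the helper names `split_s3_uri` and `head_with_pyarrow` are hypothetical, and the optional `region` argument is there on the assumption that an explicit region might matter (an HTTP 301 from S3 is often associated with a request reaching a different regional endpoint than the bucket's, but whether that applies here is not confirmed).

```python
from typing import Optional, Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/key' into (bucket, key). Hypothetical helper."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("s3", "s3a", "s3n"):
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def head_with_pyarrow(uri: str, region: Optional[str] = None):
    """Look up the object with PyArrow's S3FileSystem, bypassing PyIceberg.

    Hypothetical helper; `region` is an assumption to experiment with,
    not a confirmed fix.
    """
    # Imported lazily so the URI helper works without pyarrow installed.
    from pyarrow import fs

    bucket, key = split_s3_uri(uri)
    s3 = fs.S3FileSystem(region=region) if region else fs.S3FileSystem()
    # PyArrow addresses S3 objects as "bucket/key", without the scheme.
    return s3.get_file_info(f"{bucket}/{key}")
```

Calling `head_with_pyarrow("s3://bucket-name/<key>")` with and without an explicit `region` should show whether the error reproduces in PyArrow alone, using the same credentials as the `aws s3api` call above.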


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

