Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/12/21 11:48:35 UTC

[GitHub] [iceberg] Fokko opened a new pull request, #6468: Python: Parse UUID as binary in PyArrow

Fokko opened a new pull request, #6468:
URL: https://github.com/apache/iceberg/pull/6468

   Returning the custom `UuidType` extension type for UUID columns causes `to_arrow()` to throw a cast exception:
   ```
   ---------------------------------------------------------------------------
   ArrowNotImplementedError                  Traceback (most recent call last)
   Input In [1], in <cell line: 7>()
         3 cat = load_catalog('rest')
         5 tbl = cat.load_table('fokko.uuid_partitioned')
   ----> 7 tbl.scan().to_arrow()
   
   File ~/Desktop/iceberg/python/pyiceberg/table/__init__.py:386, in DataScan.to_arrow(self)
       375 from pyarrow.dataset import dataset
       377 ds = dataset(
       378     source=locations,
       379     filesystem=fs,
      (...)
       383     schema=schema_to_pyarrow(self.table.schema()),
       384 )
   --> 386 return ds.to_table(filter=pyarrow_filter, columns=columns)
   
   File /opt/homebrew/lib/python3.9/site-packages/pyarrow/_dataset.pyx:304, in pyarrow._dataset.Dataset.to_table()
   
   File /opt/homebrew/lib/python3.9/site-packages/pyarrow/_dataset.pyx:2549, in pyarrow._dataset.Scanner.to_table()
   
   File /opt/homebrew/lib/python3.9/site-packages/pyarrow/error.pxi:144, in pyarrow.lib.pyarrow_internal_check_status()
   
   File /opt/homebrew/lib/python3.9/site-packages/pyarrow/error.pxi:121, in pyarrow.lib.check_status()
   
   ArrowNotImplementedError: Unsupported cast from fixed_size_binary[16] to extension<pyiceberg.uuid<UuidType>> (no available cast function for target type)
   ```
   
   I tried to deserialize the values into the custom type, but I couldn't get to the buffers. I decided to keep the column as a fixed[16] because:
   
   - Moving the conversion into Python would be a major performance penalty
   - Downstream consumers also have limited support for UUID
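
For consumers that do want `uuid.UUID` objects, here is a minimal sketch of decoding a fixed[16] column on the Python side. It is not part of this PR: `decode_uuid_column` and `UUID_WIDTH` are hypothetical names, the input is assumed to be a plain list of 16-byte values (or `None`), and the bytes are assumed to be in the big-endian order that `uuid.UUID.bytes` uses.

```python
import uuid

UUID_WIDTH = 16  # number of bytes in a fixed[16] value


def decode_uuid_column(values):
    """Convert an iterable of 16-byte values (or None) into uuid.UUID objects.

    Hypothetical helper for illustration; it runs per value in Python,
    which is exactly the cost the PR avoids paying inside the scan.
    """
    decoded = []
    for raw in values:
        if raw is None:
            # Preserve nulls as-is.
            decoded.append(None)
            continue
        if len(raw) != UUID_WIDTH:
            raise ValueError(f"expected {UUID_WIDTH} bytes, got {len(raw)}")
        decoded.append(uuid.UUID(bytes=raw))
    return decoded


# Round trip: a UUID serialized to 16 bytes decodes back to the same UUID.
original = uuid.UUID("f79c3e09-677c-4bbd-a479-3f349cb785e7")
column = [original.bytes, None]
result = decode_uuid_column(column)
```

Because the loop runs once per value, it is best applied only to the rows a caller actually needs rather than to a whole scan.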


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #6468: Python: Parse UUID as binary in PyArrow

Posted by GitBox <gi...@apache.org>.
nastra commented on code in PR #6468:
URL: https://github.com/apache/iceberg/pull/6468#discussion_r1054467944


##########
python/pyiceberg/io/pyarrow.py:
##########
@@ -366,7 +352,7 @@ def visit_string(self, _: StringType) -> pa.DataType:
         return pa.string()
 
     def visit_uuid(self, _: UUIDType) -> pa.DataType:
-        return UuidType()
+        return pa.binary(16)

Review Comment:
   This makes sense; we're doing the same in https://github.com/apache/iceberg/blob/c07f2aabc0a1d02f068ecf1514d2479c0fbdd3b0/arrow/src/main/java/org/apache/iceberg/arrow/vectorized/VectorizedArrowReader.java#L325-L327,
   which was introduced by #2739.





[GitHub] [iceberg] rdblue merged pull request #6468: Python: Parse UUID as binary in PyArrow

Posted by GitBox <gi...@apache.org>.
rdblue merged PR #6468:
URL: https://github.com/apache/iceberg/pull/6468

