Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/12/21 11:48:35 UTC
[GitHub] [iceberg] Fokko opened a new pull request, #6468: Python: Parse UUID as binary in PyArrow
Fokko opened a new pull request, #6468:
URL: https://github.com/apache/iceberg/pull/6468
Reading a UUID column through PyArrow currently throws a cast exception:
```
---------------------------------------------------------------------------
ArrowNotImplementedError Traceback (most recent call last)
Input In [1], in <cell line: 7>()
3 cat = load_catalog('rest')
5 tbl = cat.load_table('fokko.uuid_partitioned')
----> 7 tbl.scan().to_arrow()
File ~/Desktop/iceberg/python/pyiceberg/table/__init__.py:386, in DataScan.to_arrow(self)
375 from pyarrow.dataset import dataset
377 ds = dataset(
378 source=locations,
379 filesystem=fs,
(...)
383 schema=schema_to_pyarrow(self.table.schema()),
384 )
--> 386 return ds.to_table(filter=pyarrow_filter, columns=columns)
File /opt/homebrew/lib/python3.9/site-packages/pyarrow/_dataset.pyx:304, in pyarrow._dataset.Dataset.to_table()
File /opt/homebrew/lib/python3.9/site-packages/pyarrow/_dataset.pyx:2549, in pyarrow._dataset.Scanner.to_table()
File /opt/homebrew/lib/python3.9/site-packages/pyarrow/error.pxi:144, in pyarrow.lib.pyarrow_internal_check_status()
File /opt/homebrew/lib/python3.9/site-packages/pyarrow/error.pxi:121, in pyarrow.lib.check_status()
ArrowNotImplementedError: Unsupported cast from fixed_size_binary[16] to extension<pyiceberg.uuid<UuidType>> (no available cast function for target type)
```
I tried to deserialize the values into the custom type, but I couldn't get to the underlying buffers. I decided to keep it as a fixed[16] because:
- Moving the conversion into Python would be a major performance penalty
- Downstream consumers also have limited support for UUID
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] nastra commented on a diff in pull request #6468: Python: Parse UUID as binary in PyArrow
Posted by GitBox <gi...@apache.org>.
nastra commented on code in PR #6468:
URL: https://github.com/apache/iceberg/pull/6468#discussion_r1054467944
##########
python/pyiceberg/io/pyarrow.py:
##########
@@ -366,7 +352,7 @@ def visit_string(self, _: StringType) -> pa.DataType:
         return pa.string()
     def visit_uuid(self, _: UUIDType) -> pa.DataType:
-        return UuidType()
+        return pa.binary(16)
Review Comment:
This makes sense; we're doing the same in https://github.com/apache/iceberg/blob/c07f2aabc0a1d02f068ecf1514d2479c0fbdd3b0/arrow/src/main/java/org/apache/iceberg/arrow/vectorized/VectorizedArrowReader.java#L325-L327
which was introduced by #2739.
[GitHub] [iceberg] rdblue merged pull request #6468: Python: Parse UUID as binary in PyArrow
Posted by GitBox <gi...@apache.org>.
rdblue merged PR #6468:
URL: https://github.com/apache/iceberg/pull/6468