Posted to issues@iceberg.apache.org by "grobgl (via GitHub)" <gi...@apache.org> on 2023/05/18 13:16:13 UTC
[GitHub] [iceberg] grobgl opened a new issue, #7644: Python: Pickle support
grobgl opened a new issue, #7644:
URL: https://github.com/apache/iceberg/issues/7644
### Feature Request / Improvement
Right now, various classes in PyIceberg cannot be pickled. This is due to `__new__` lacking a corresponding `__getnewargs__` in several places ([pickle docs](https://docs.python.org/3/library/pickle.html#object.__getnewargs__)). Pickle support is required by, e.g., Dask to serialise tasks as they are distributed to workers.
Here is a simple example:
```python
import pickle
from pyiceberg.expressions import And, EqualTo
ex = And(EqualTo('name', 'test'), EqualTo('value', 0))
pickle.loads(pickle.dumps(ex))
```
This results in the following exception:
```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: And.__new__() missing 2 required positional arguments: 'left' and 'right'
```
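For illustration, the failure and one possible fix can be reproduced without PyIceberg. The sketch below uses two hypothetical stand-in classes (`BrokenAnd` and `PicklableAnd` are not PyIceberg names, and the actual fix in the library may look different). It assumes the only problem is that `__new__` takes required arguments that pickle does not know how to supply when it recreates the object.

```python
import pickle


class BrokenAnd:
    """Hypothetical stand-in for a class whose __new__ takes required arguments."""

    def __new__(cls, left, right):
        obj = super().__new__(cls)
        obj.left = left
        obj.right = right
        return obj


class PicklableAnd(BrokenAnd):
    """Same class, plus __getnewargs__ so pickle can call __new__ correctly."""

    def __getnewargs__(self):
        # Pickle passes this tuple to __new__ when the object is loaded.
        return (self.left, self.right)


# Dumping succeeds, but loading calls BrokenAnd.__new__(BrokenAnd) with no
# arguments, which raises a TypeError like the one in the traceback above.
broken_error = None
try:
    pickle.loads(pickle.dumps(BrokenAnd("a", "b")))
except TypeError as exc:
    broken_error = exc

# With __getnewargs__ defined, the round trip works.
restored = pickle.loads(pickle.dumps(PicklableAnd("a", "b")))
print(type(broken_error).__name__, restored.left, restored.right)
```

The same pattern would presumably need to be repeated for every class that overrides `__new__` with required arguments.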
I'm happy to provide a fix, but I'd like to hear your thoughts first.
### Query engine
Other
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] Fokko commented on issue #7644: Python: Pickle support
Posted by "Fokko (via GitHub)" <gi...@apache.org>.
Fokko commented on issue #7644:
URL: https://github.com/apache/iceberg/issues/7644#issuecomment-1590762684
@grobgl Sorry for the lack of attention. I was going through https://github.com/apache/iceberg/issues/5800#issuecomment-1553073273 and I think it makes sense to add this.
[GitHub] [iceberg] Fokko closed issue #7644: Python: Pickle support
Posted by "Fokko (via GitHub)" <gi...@apache.org>.
Fokko closed issue #7644: Python: Pickle support
URL: https://github.com/apache/iceberg/issues/7644
[GitHub] [iceberg] Fokko commented on issue #7644: Python: Pickle support
Posted by "Fokko (via GitHub)" <gi...@apache.org>.
Fokko commented on issue #7644:
URL: https://github.com/apache/iceberg/issues/7644#issuecomment-1553340015
I still have to look into your comment in #5800, but I don't want to distribute the planning process. A `DataScan` produces a set of tasks using the `.plan_files()` method. You probably want to fan out those tasks across the workers. The tasks do the heavy lifting; see the `project_table` method in `pyarrow.py`.
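As a rough sketch of that split (plan once, fan out the tasks), the example below uses only the standard library. Everything in it is hypothetical: `FileTask`, `plan_tasks`, and `run_task` are illustrative stand-ins rather than PyIceberg APIs, and a thread pool stands in for remote workers. Each task is pickled explicitly to mimic what a scheduler such as Dask would do when shipping it to a worker.

```python
import pickle
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class FileTask:
    """Hypothetical stand-in for one planned scan task (not a PyIceberg class)."""

    path: str
    row_count: int


def plan_tasks():
    """Hypothetical planner: runs once on the driver, in the role of `.plan_files()`."""
    return [FileTask(f"data/part-{i}.parquet", 100 * (i + 1)) for i in range(4)]


def run_task(payload: bytes) -> int:
    """Worker side: unpickle the task, then do the (simulated) heavy lifting."""
    task = pickle.loads(payload)
    return task.row_count


# Plan once, then hand each task to a worker in pickled form.
payloads = [pickle.dumps(task) for task in plan_tasks()]
with ThreadPoolExecutor(max_workers=2) as pool:
    row_counts = list(pool.map(run_task, payloads))

total_rows = sum(row_counts)
print(row_counts, total_rows)
```

In this arrangement only the task objects need to be picklable; the planning step itself stays on the driver.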