Posted to issues@iceberg.apache.org by "grobgl (via GitHub)" <gi...@apache.org> on 2023/05/18 13:16:13 UTC

[GitHub] [iceberg] grobgl opened a new issue, #7644: Python: Pickle support

grobgl opened a new issue, #7644:
URL: https://github.com/apache/iceberg/issues/7644

   ### Feature Request / Improvement
   
   Right now, various classes in PyIceberg cannot be pickled. This is due to `__new__` lacking a corresponding `__getnewargs__` in various places ([pickle docs](https://docs.python.org/3/library/pickle.html#object.__getnewargs__)). Pickle support is required by, e.g., Dask, which serialises tasks to distribute them to workers.
   
   Here is a simple example:
   ```python
   import pickle
   from pyiceberg.expressions import And, EqualTo
   
   ex = And(EqualTo('name', 'test'), EqualTo('value', 0))
   pickle.loads(pickle.dumps(ex))
   ```
   
   This results in the following exception:
   ```
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   TypeError: And.__new__() missing 2 required positional arguments: 'left' and 'right'
   ```
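A minimal sketch of the underlying mechanism, using two hypothetical toy classes (`BrokenAnd` and `PicklableAnd`, not PyIceberg's actual code). With pickle protocol 2 or higher, unpickling recreates an object by calling `cls.__new__(cls, *args)`. The `args` come from `__getnewargs__()` and default to an empty tuple when that method is missing. A `__new__` that requires arguments therefore fails on load unless `__getnewargs__` returns them.

```python
import pickle


class BrokenAnd:
    """Hypothetical expression whose __new__ requires two arguments."""

    def __new__(cls, left, right):
        obj = super().__new__(cls)
        obj.left = left
        obj.right = right
        return obj


class PicklableAnd(BrokenAnd):
    """Same class, plus __getnewargs__ so pickle can call __new__ again."""

    def __getnewargs__(self):
        # Pickle passes these values to __new__ when loading the object.
        return (self.left, self.right)


if __name__ == "__main__":
    try:
        pickle.loads(pickle.dumps(BrokenAnd("name", "value")))
    except TypeError as exc:
        # Same failure mode as the traceback above: __new__ gets no arguments.
        print(exc)
    restored = pickle.loads(pickle.dumps(PicklableAnd("name", "value")))
    print(restored.left, restored.right)
```

Only the return value of `__getnewargs__` changes between the two classes. After `__new__` runs, the instance `__dict__` is still restored from the pickled state as usual.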
   
   I'm happy to provide a fix, but I'd like to hear your thoughts first.
   
   ### Query engine
   
   Other


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on issue #7644: Python: Pickle support

Posted by "Fokko (via GitHub)" <gi...@apache.org>.
Fokko commented on issue #7644:
URL: https://github.com/apache/iceberg/issues/7644#issuecomment-1590762684

   @grobgl Sorry for the lack of attention. I was going through https://github.com/apache/iceberg/issues/5800#issuecomment-1553073273 and I think it makes sense to add this. 


[GitHub] [iceberg] Fokko closed issue #7644: Python: Pickle support

Posted by "Fokko (via GitHub)" <gi...@apache.org>.
Fokko closed issue #7644: Python: Pickle support
URL: https://github.com/apache/iceberg/issues/7644


[GitHub] [iceberg] Fokko commented on issue #7644: Python: Pickle support

Posted by "Fokko (via GitHub)" <gi...@apache.org>.
Fokko commented on issue #7644:
URL: https://github.com/apache/iceberg/issues/7644#issuecomment-1553340015

   I still have to look into your comment in #5800, but I don't want to distribute the planning process. A `DataScan` produces a set of tasks using the `.plan_files()` method, and you probably want to fan out those tasks across the workers. The tasks do the heavy lifting; see the `project_table` method in `pyarrow.py`.
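A minimal sketch of that split, under stated assumptions. `ScanTask`, `plan_files()`, `read_task()` and `run_scan()` below are hypothetical stand-ins, not PyIceberg's API. They represent the tasks a `DataScan` produces and the per-task reading that `project_table` performs. Planning runs once, locally. Only the resulting tasks are pickled and fanned out, with a thread pool standing in for Dask workers.

```python
import pickle
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List


@dataclass
class ScanTask:
    """Hypothetical stand-in for one task produced by scan planning."""
    file_path: str
    record_count: int


def plan_files() -> List[ScanTask]:
    """Hypothetical planner: runs once, locally, and is not distributed."""
    return [ScanTask(f"data/part-{i}.parquet", 100 * (i + 1)) for i in range(4)]


def read_task(payload: bytes) -> int:
    """Worker side: unpickle a task and do the heavy lifting (here, only count rows)."""
    task = pickle.loads(payload)
    return task.record_count


def run_scan() -> int:
    tasks = plan_files()
    # Each task is pickled before being shipped, as a distributed scheduler would do.
    payloads = [pickle.dumps(task) for task in tasks]
    with ThreadPoolExecutor(max_workers=4) as pool:
        return sum(pool.map(read_task, payloads))


if __name__ == "__main__":
    print(run_scan())
```

The explicit `pickle.dumps`/`pickle.loads` round trip mirrors what a scheduler such as Dask does when shipping work. For that reason, the task objects, and anything sent along with them, must be picklable.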

