Posted to issues@arrow.apache.org by "ARF (Jira)" <ji...@apache.org> on 2019/09/08 13:40:00 UTC
[jira] [Updated] (ARROW-6486) [Python] Allow subclassing & monkey-patching of Table
[ https://issues.apache.org/jira/browse/ARROW-6486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ARF updated ARROW-6486:
-----------------------
Description:
Currently, many classes in {{pyarrow}} behave unexpectedly for Python users: they are neither subclassable nor monkey-patchable.
{{>>> import pyarrow as pa}}
{{>>> class MyTable(pa.Table):}}
{{...     pass}}
{{...}}
{{>>> table = MyTable.from_arrays([], [])}}
{{>>> type(table)}}
{{<class 'pyarrow.lib.Table'>}}
The factory method did not return an instance of our subclass...
Never mind, let's monkey-patch {{Table}}:
{{>>> pa.TableOriginal = pa.Table}}
{{>>> pa.Table = MyTable}}
{{>>> table = pa.Table.from_arrays([], [])}}
{{>>> type(table)}}
{{<class 'pyarrow.lib.Table'>}}
OK, that did not work either.
Let's be sneaky:
{{>>> table.__class__ = MyTable}}
{{Traceback (most recent call last):}}
{{  File "<stdin>", line 1, in <module>}}
{{TypeError: __class__ assignment only supported for heap types or ModuleType subclasses}}
{{>>>}}
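The error above comes from a general CPython rule rather than from anything specific to {{pyarrow}}: {{__class__}} can be reassigned between ordinary Python-defined classes with compatible layouts, but not on instances of statically defined types. A minimal sketch using only built-ins (the names Base, Child and MyInt are made up for illustration; the exact wording of the TypeError differs between Python versions):

```python
class Base:
    pass


class Child(Base):
    pass


# Both classes are defined in Python and share the same instance layout,
# so reassigning __class__ is allowed.
obj = Base()
obj.__class__ = Child
reassigned_type = type(obj).__name__
print(reassigned_type)


class MyInt(int):
    pass


# int is a statically defined built-in type, so the same trick is rejected,
# just as it is for the Table instance in the session above.
number = 1
try:
    number.__class__ = MyInt
    builtin_error = None
except TypeError as exc:
    builtin_error = exc
print(type(builtin_error).__name__)
```

The built-in int here is only a stand-in for a type implemented outside Python; the Table session above fails for the same kind of reason.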
There is currently no way to modify or extend the behaviour of a {{Table}} instance. Users can use only what {{pyarrow}} provides out of the box. This is likely to be a source of frustration for many Python users.
The attached PR remedies this for the {{Table}} class:
{{>>> import pyarrow as pa}}
{{>>> class MyTable(pa.Table):}}
{{...     pass}}
{{...}}
{{>>> table = MyTable.from_arrays([], [])}}
{{>>> type(table)}}
{{<class '__main__.MyTable'>}}
{{>>>}}
{{>>> pa.TableOriginal = pa.Table}}
{{>>> pa.Table = MyTable}}
{{>>> table = pa.Table.from_arrays([], [])}}
{{>>> type(table)}}
{{<class '__main__.MyTable'>}}
{{>>>}}
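For readers unfamiliar with the pattern, the difference between the two sessions can be reproduced in pure Python. The sketch below is not the code from the PR and does not use {{pyarrow}}; HardcodedTable and FlexibleTable are hypothetical stand-ins that only contrast a factory that names its own class with one that builds through cls:

```python
class HardcodedTable:
    """Stand-in for a class whose factory always builds the base type."""

    def __init__(self, columns):
        self.columns = list(columns)

    @staticmethod
    def from_arrays(arrays, names):
        # The result type is fixed here, so subclasses are ignored.
        return HardcodedTable(zip(names, arrays))


class FlexibleTable:
    """Stand-in for a class whose factory respects the calling class."""

    def __init__(self, columns):
        self.columns = list(columns)

    @classmethod
    def from_arrays(cls, arrays, names):
        # cls is whichever class the method was called on.
        return cls(zip(names, arrays))


class MyHardcoded(HardcodedTable):
    pass


class MyFlexible(FlexibleTable):
    pass


old_style = MyHardcoded.from_arrays([], [])
new_style = MyFlexible.from_arrays([], [])
print(type(old_style).__name__)  # HardcodedTable
print(type(new_style).__name__)  # MyFlexible
```

With a cls-based factory, rebinding a module attribute to a subclass also has the expected effect, because the subclass is then the class the factory is called on.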
Ideally, these modifications would be extended to the other Cython-defined classes of {{pyarrow}}, but given that {{Table}} is likely to be the interface through which most users begin their interaction, I thought this would be a good start.
Keeping the changes limited to a single class should also keep merge conflicts manageable.
> [Python] Allow subclassing & monkey-patching of Table
> -----------------------------------------------------
>
> Key: ARROW-6486
> URL: https://issues.apache.org/jira/browse/ARROW-6486
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: ARF
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.3.2#803003)