Posted to dev@arrow.apache.org by Mitar <mm...@gmail.com> on 2018/03/04 09:08:11 UTC

How to properly serialize subclasses of supported classes

Hi!

I have a subclass of numpy.ndarray and another of a pandas class, each
adding a metadata attribute. I also have a subclass of typing.List (a
Python generic) with the same metadata attribute.

Now, it seems that if I serialize these to the plasma store and read
them back, I get a plain numpy array, pandas object, or list,
respectively.

My question is: how can I make it so that proper subclasses are
returned, including the custom metadata attribute?

I tried to use pyarrow_lib._default_serialization_context.register_type,
but it does not seem to work. Moreover, I worry that even if I write a
serializer for a custom class, anyone who subclasses it and stores an
instance in the plasma store will get back the custom class and not
their subclass.
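
To make the subclass concern concrete, here is a minimal sketch of a
serializer/deserializer pair that records which class was stored and
rebuilds that same class on the way back. It uses only the standard
library and does not touch pyarrow or plasma. All names in it
(MetadataList, Columns, to_plain, from_plain, the _registry dict) are
hypothetical and not taken from the linked repository; it also assumes
that subclasses keep the same constructor signature.

```python
class MetadataList(list):
    """Hypothetical list subclass with a metadata attribute."""

    # Maps a class name to the class, so deserialization can find subclasses.
    _registry = {}

    def __init__(self, iterable=(), metadata=None):
        super().__init__(iterable)
        self.metadata = metadata

    def __init_subclass__(cls, **kwargs):
        # Called automatically whenever someone subclasses MetadataList.
        super().__init_subclass__(**kwargs)
        MetadataList._registry[cls.__qualname__] = cls


# The base class itself is not seen by __init_subclass__, so add it by hand.
MetadataList._registry[MetadataList.__qualname__] = MetadataList


def to_plain(obj):
    """Custom serializer: reduce the object to built-in types only."""
    return {
        "cls": type(obj).__qualname__,
        "items": list(obj),
        "metadata": obj.metadata,
    }


def from_plain(data):
    """Custom deserializer: rebuild the exact class that was stored."""
    cls = MetadataList._registry[data["cls"]]
    return cls(data["items"], metadata=data["metadata"])


class Columns(MetadataList):
    """A subclass written later by someone else."""


original = Columns(["a", "b"], metadata={"unit": "cm"})
restored = from_plain(to_plain(original))
print(type(restored).__name__, restored, restored.metadata)
```

A pair of functions like these is the kind of thing one would hand to a
serialization context as the custom serializer and deserializer. Keying
the registry on the class name is only one possible design; it breaks
if two subclasses share a name.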

This is how I am testing:

https://gitlab.com/datadrivendiscovery/metadata/blob/plasma/tests/test_plasma.py#L50

And here is the code for the custom numpy class and my attempt at
registering a custom serializer for it:

https://gitlab.com/datadrivendiscovery/metadata/blob/plasma/d3m_metadata/container/numpy.py#L135

It looks like custom serialization is not called.


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m

Re: How to properly serialize subclasses of supported classes

Posted by Robert Nishihara <ro...@gmail.com>.
We just chatted offline. Should be fixed by
https://github.com/apache/arrow/pull/1704.

On Mon, Mar 5, 2018 at 3:42 AM Mitar <mm...@gmail.com> wrote:

> Hi!
>
> You mean, this explains why a subclass of list is not being matched? Maybe.
>
> But I do not get why my custom serialization for ndarray subclass is
> never called.
>
> Or how hard would it be to automatically serialize/deserialize into
> subclasses so that I would not have to have a custom serialization for
> ndarray but the existing ndarray serialization would work, casting it
> into a proper subclass.
>
>
> Mitar
>
> On Sun, Mar 4, 2018 at 2:39 PM, Robert Nishihara
> <ro...@gmail.com> wrote:
> > The issue is probably this line
> >
> > https://github.com/apache/arrow/blob/8b1c8118b017a941f0102709d72df7e5a9783aa4/cpp/src/arrow/python/python_to_arrow.cc#L504
> >
> > which uses PyList_Check instead of PyList_CheckExact. Changing it to the
> > exact form will cause it to use the custom serializer for subclasses of
> > list.
>
>
>
> --
> http://mitar.tnode.com/
> https://twitter.com/mitar_m
>

Re: How to properly serialize subclasses of supported classes

Posted by Mitar <mm...@gmail.com>.
Hi!

You mean, this explains why a subclass of list is not being matched? Maybe.

But I do not get why my custom serializer for the ndarray subclass is
never called.

Or how hard would it be to automatically serialize/deserialize into
subclasses, so that I would not need a custom serializer for ndarray
at all: the existing ndarray serialization would work and cast the
result into the proper subclass?


Mitar

On Sun, Mar 4, 2018 at 2:39 PM, Robert Nishihara
<ro...@gmail.com> wrote:
> The issue is probably this line
>
> https://github.com/apache/arrow/blob/8b1c8118b017a941f0102709d72df7e5a9783aa4/cpp/src/arrow/python/python_to_arrow.cc#L504
>
> which uses PyList_Check instead of PyList_CheckExact. Changing it to the
> exact form will cause it to use the custom serializer for subclasses of
> list.



-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m

Re: How to properly serialize subclasses of supported classes

Posted by Robert Nishihara <ro...@gmail.com>.
The issue is probably this line

https://github.com/apache/arrow/blob/8b1c8118b017a941f0102709d72df7e5a9783aa4/cpp/src/arrow/python/python_to_arrow.cc#L504

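which uses *PyList_Check* instead of *PyList_CheckExact*. PyList_Check
also returns true for instances of list subclasses, so they are handled
by the built-in list path. Changing it to the exact form will cause it
to use the custom serializer for subclasses of list.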
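
The difference between the two checks can be illustrated in pure
Python, where isinstance(obj, list) behaves like PyList_Check and
type(obj) is list behaves like PyList_CheckExact. The sketch below is
only an analogy: the route function and the MetadataList class are
hypothetical and are not the Arrow C++ code.

```python
class MetadataList(list):
    """Hypothetical list subclass carrying a metadata attribute."""

    def __init__(self, iterable=(), metadata=None):
        super().__init__(iterable)
        self.metadata = metadata


def route(obj, exact):
    """Return which serialization path a toy dispatcher would pick."""
    if exact:
        is_builtin_list = type(obj) is list      # like PyList_CheckExact
    else:
        is_builtin_list = isinstance(obj, list)  # like PyList_Check
    return "builtin-list" if is_builtin_list else "custom-serializer"


plain = [1, 2, 3]
tagged = MetadataList([1, 2, 3], metadata={"source": "demo"})

print(route(plain, exact=False))   # builtin-list
print(route(tagged, exact=False))  # builtin-list (metadata would be lost)
print(route(plain, exact=True))    # builtin-list
print(route(tagged, exact=True))   # custom-serializer
```

With the non-exact check the subclass instance never reaches a
registered serializer, which matches the behaviour reported for list
subclasses in this thread.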
