Posted to user@arrow.apache.org by Philip Moore via user <us...@arrow.apache.org> on 2023/05/11 17:35:39 UTC

PyArrow 12 serialization

Hello,

I’m attempting to use Apache Superset with PyArrow 12.0.0, and it has this section of code:


if use_msgpack:
    with stats_timing(
        "sqllab.query.results_backend_pa_serialization", stats_logger
    ):
        data = (
            pa.default_serialization_context()
            .serialize(result_set.pa_table)
            .to_buffer()
            .to_pybytes()
        )

    # expand when loading data from results backend
    all_columns, expanded_columns = (selected_columns, [])


That code worked fine in PyArrow 11.0.0, but it appears that “default_serialization_context()” was removed in PyArrow 12.

Can you advise on what this code should look like for use with PyArrow 12?

Thank you.

Phil

Re: PyArrow 12 serialization

Posted by Philip Moore via user <us...@arrow.apache.org>.
Thanks, Will – it works great!

From: Will Jones <wi...@gmail.com>
Date: Thursday, May 11, 2023 at 1:44 PM
To: user@arrow.apache.org <us...@arrow.apache.org>, Philip Moore <ph...@voltrondata.com>
Subject: Re: PyArrow 12 serialization

Re: PyArrow 12 serialization

Posted by Will Jones <wi...@gmail.com>.
Hi Phil,

It looks like you are trying to serialize a PyArrow table to Python bytes.

This function (from [1]) will give you a PyArrow Buffer object:

import pyarrow as pa

def write_ipc_buffer(table: pa.Table) -> pa.Buffer:
    # Write the table to an in-memory Arrow IPC stream.
    sink = pa.BufferOutputStream()

    with pa.ipc.new_stream(sink, table.schema) as writer:
        writer.write_table(table)

    return sink.getvalue()

Then you can call to_pybytes() on that buffer. You will later be able to read that with:

    reader = pa.BufferReader(buffer)
    table = pa.ipc.open_stream(reader).read_all()

Best,

Will Jones

[1] https://github.com/wjones127/arrow-ipc-bench/blob/89d68b4d7cfcb3f5d28b6000abdb801c93198bbf/share_arrow.py#L19-L25
