Posted to user@arrow.apache.org by Philip Moore via user <us...@arrow.apache.org> on 2023/05/11 17:35:39 UTC
PyArrow 12 serialization
Hello,
I’m attempting to use Apache Superset with PyArrow 12.0.0 – and it has this section of code:
if use_msgpack:
    with stats_timing(
        "sqllab.query.results_backend_pa_serialization", stats_logger
    ):
        data = (
            pa.default_serialization_context()
            .serialize(result_set.pa_table)
            .to_buffer()
            .to_pybytes()
        )

    # expand when loading data from results backend
    all_columns, expanded_columns = (selected_columns, [])
That code worked fine in PyArrow 11.0.0 – but it appears that “default_serialization_context()” was removed in PyArrow 12.
Can you advise on what this code should look like for use with PyArrow 12?
Thank you.
Phil
Re: PyArrow 12 serialization
Posted by Philip Moore via user <us...@arrow.apache.org>.
Thanks, Will – it works great!
From: Will Jones <wi...@gmail.com>
Date: Thursday, May 11, 2023 at 1:44 PM
To: user@arrow.apache.org <us...@arrow.apache.org>, Philip Moore <ph...@voltrondata.com>
Subject: Re: PyArrow 12 serialization
Hi Phil,
It looks like you are trying to serialize a PyArrow table to Python bytes.
This function (from [1]) will give you a PyArrow Buffer object:
def write_ipc_buffer(table: pa.Table) -> pa.Buffer:
    sink = pa.BufferOutputStream()
    with pa.ipc.new_stream(sink, table.schema) as writer:
        writer.write_table(table)
    return sink.getvalue()
Then you can call to_pybytes() on that buffer. You will later be able to read that with:
reader = pa.BufferReader(buffer)
table = pa.ipc.open_stream(reader).read_all()
Best,
Will Jones
[1] https://github.com/wjones127/arrow-ipc-bench/blob/89d68b4d7cfcb3f5d28b6000abdb801c93198bbf/share_arrow.py#L19-L25