Posted to dev@arrow.apache.org by Radu Teodorescu <ra...@yahoo.com.INVALID> on 2020/08/26 20:46:23 UTC

conversion between pyspark.DataFrame and pyarrow.Table

Hi,
I noticed that Arrow is mentioned as an optional intermediary format for converting between pandas DFs and Spark DFs. Is there a way to explicitly convert a pyarrow Table to a Spark DataFrame and the other way around?
Absent that, going pyspark->pandas->pyarrow and back works, but it’s obviously suboptimal.
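
Concretely, the indirect route looks roughly like this (a sketch, assuming an
existing SparkSession named spark and a pyspark DataFrame spark_df; on Spark
2.x the config key is spark.sql.execution.arrow.enabled instead):

import pyarrow as pa

# Let Spark use Arrow under the hood when converting to pandas.
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")

# Spark DataFrame -> pandas DataFrame -> Arrow table
pdf = spark_df.toPandas()
arrow_table = pa.Table.from_pandas(pdf)
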
Thank you
Radu

Re: conversion between pyspark.DataFrame and pyarrow.Table

Posted by Bryan Cutler <cu...@gmail.com>.
There isn't a direct conversion to/from Spark; I opened
https://issues.apache.org/jira/browse/SPARK-29040 a while ago for
conversion to Spark from an Arrow table. If possible, add a comment there
describing your use case, which might help get support for it.
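
In the meantime, the route from an Arrow table into Spark still goes through
pandas; a rough sketch, assuming a pyarrow Table named arrow_table and an
existing SparkSession named spark:

# Arrow table -> pandas DataFrame -> Spark DataFrame
# (createDataFrame can also use Arrow internally when
# spark.sql.execution.arrow.pyspark.enabled is set to true)
pdf = arrow_table.to_pandas()
spark_df = spark.createDataFrame(pdf)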

Bryan

On Mon, Aug 31, 2020, 9:12 PM Micah Kornfield <em...@gmail.com> wrote:

> Hi Radu,
> I'm not a spark expert, but I haven't seen any documentation on direct
> conversion.  You might be better off asking the user@spark or dev@spark
> mailing lists.
>
> Thanks,
> Micah
>
>
> On Wed, Aug 26, 2020 at 1:46 PM Radu Teodorescu
> <ra...@yahoo.com.invalid> wrote:
>
> > Hi,
> > I noticed that Arrow is mentioned as an optional intermediary format for
> > converting between pandas DFs and Spark DFs. Is there a way to explicitly
> > convert a pyarrow Table to a Spark DataFrame and the other way around?
> > Absent that, going pyspark->pandas->pyarrow and back works, but it’s
> > obviously suboptimal.
> > Thank you
> > Radu
>

Re: conversion between pyspark.DataFrame and pyarrow.Table

Posted by Micah Kornfield <em...@gmail.com>.
Hi Radu,
I'm not a spark expert, but I haven't seen any documentation on direct
conversion.  You might be better off asking the user@spark or dev@spark
mailing lists.

Thanks,
Micah


On Wed, Aug 26, 2020 at 1:46 PM Radu Teodorescu
<ra...@yahoo.com.invalid> wrote:

> Hi,
> I noticed that Arrow is mentioned as an optional intermediary format for
> converting between pandas DFs and Spark DFs. Is there a way to explicitly
> convert a pyarrow Table to a Spark DataFrame and the other way around?
> Absent that, going pyspark->pandas->pyarrow and back works, but it’s
> obviously suboptimal.
> Thank you
> Radu