Posted to dev@arrow.apache.org by Isaac Myers <is...@protonmail.com.INVALID> on 2019/10/29 14:08:33 UTC

PySpark read Arrow C++ tables with unsigned field types?

Fields with unsigned types written with Arrow C++ can't be read by PySpark, because Spark does not support unsigned types (per https://issues.apache.org/jira/browse/SPARK-10113). There is already an issue to address the same problem when writing tables with unsigned fields using PyArrow (https://issues.apache.org/jira/browse/ARROW-1988). Are there any plans to address this issue for Arrow C++?

Sent with [ProtonMail](https://protonmail.com) Secure Email.

Re: PySpark read Arrow C++ tables with unsigned field types?

Posted by Wes McKinney <we...@gmail.com>.
I don't think there is a JIRA issue to that effect, but you can
certainly create one describing what kind of C++ API you would be
looking for.

This is not a priority for me personally or for my colleagues, but it
might be of interest to others in the community. Apache projects are
fundamentally communities of volunteers.

On Tue, Oct 29, 2019 at 11:32 AM Isaac Myers
<is...@protonmail.com.invalid> wrote:
>
> Wes,
>
> We use Arrow C++ (not PyArrow) exclusively for writing and PySpark for manipulation and analysis. I'm wondering if there are any plans for Arrow C++ to implement something similar to flavor='spark' in PyArrow.
>
>
>
> Sent with ProtonMail Secure Email.
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Tuesday, October 29, 2019 8:47 AM, Wes McKinney <we...@gmail.com> wrote:
>
> > hi Isaac -- you are more than welcome to submit a PR to cause unsigned
> > types to be written as signed integers when using flavor='spark' from
> > pyarrow. The simplest thing would be to do the casting of unsigned
> > types to signed prior to writing the Parquet file
> >
> > -   Wes
> >
> >     On Tue, Oct 29, 2019 at 9:09 AM Isaac Myers
> >     isaacmyers@protonmail.com.invalid wrote:
> >
> >
> > > Fields with unsigned types written with Arrow C++ can't be read by PySpark, due to Spark's lack of support unsigned types (per https://issues.apache.org/jira/browse/SPARK-10113). There is already an issue to address the same problem when writing tables with unsigned fields using PyArrow (https://issues.apache.org/jira/browse/ARROW-1988). Are there any plans to address this issue for Arrow C++?
> > > Sent with ProtonMail Secure Email.

Re: PySpark read Arrow C++ tables with unsigned field types?

Posted by Isaac Myers <is...@protonmail.com.INVALID>.
Wes,

We use Arrow C++ (not PyArrow) exclusively for writing and PySpark for manipulation and analysis. I'm wondering if there are any plans for Arrow C++ to implement something similar to flavor='spark' in PyArrow.




‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Tuesday, October 29, 2019 8:47 AM, Wes McKinney <we...@gmail.com> wrote:

> hi Isaac -- you are more than welcome to submit a PR to cause unsigned
> types to be written as signed integers when using flavor='spark' from
> pyarrow. The simplest thing would be to do the casting of unsigned
> types to signed prior to writing the Parquet file
>
> -   Wes
>
>     On Tue, Oct 29, 2019 at 9:09 AM Isaac Myers
>     isaacmyers@protonmail.com.invalid wrote:
>
>
> > Fields with unsigned types written with Arrow C++ can't be read by PySpark, due to Spark's lack of support unsigned types (per https://issues.apache.org/jira/browse/SPARK-10113). There is already an issue to address the same problem when writing tables with unsigned fields using PyArrow (https://issues.apache.org/jira/browse/ARROW-1988). Are there any plans to address this issue for Arrow C++?
> > Sent with ProtonMail Secure Email.



Re: PySpark read Arrow C++ tables with unsigned field types?

Posted by Wes McKinney <we...@gmail.com>.
Hi Isaac -- you are more than welcome to submit a PR to cause unsigned
types to be written as signed integers when using flavor='spark' from
pyarrow. The simplest thing would be to cast unsigned types to signed
before writing the Parquet file.

- Wes

On Tue, Oct 29, 2019 at 9:09 AM Isaac Myers
<is...@protonmail.com.invalid> wrote:
>
> Fields with unsigned types written with Arrow C++ can't be read by PySpark, due to Spark's lack of support unsigned types (per https://issues.apache.org/jira/browse/SPARK-10113). There is already an issue to address the same problem when writing tables with unsigned fields using PyArrow (https://issues.apache.org/jira/browse/ARROW-1988). Are there any plans to address this issue for Arrow C++?
>
> Sent with [ProtonMail](https://protonmail.com) Secure Email.
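
The suggestion in this thread, casting unsigned types to signed before writing the Parquet file, can be illustrated with a small sketch. It is not Arrow code and does not reflect what flavor='spark' does in PyArrow: the widening policy (each unsigned type goes to the next wider signed type, and uint64 goes to int64 with a range check) is an assumption, and the names `SIGNED_FOR_UNSIGNED`, `spark_compatible_type`, and `cast_column` are hypothetical. Types are represented as plain strings so the example runs with the standard library only.

```python
# Assumed policy (not taken from Arrow or Spark): widen each unsigned type to
# the next larger signed type so every value still fits.
SIGNED_FOR_UNSIGNED = {
    "uint8": "int16",
    "uint16": "int32",
    "uint32": "int64",
    "uint64": "int64",  # no wider signed integer exists, so values need a range check
}

INT64_MAX = 2**63 - 1


def spark_compatible_type(type_name):
    """Return the signed replacement for an unsigned type; other types pass through."""
    return SIGNED_FOR_UNSIGNED.get(type_name, type_name)


def cast_column(type_name, values):
    """Return (new_type_name, values); raise if a uint64 value cannot fit in int64."""
    new_type = spark_compatible_type(type_name)
    if type_name == "uint64":
        for value in values:
            # None stands in for a null entry and is left untouched.
            if value is not None and value > INT64_MAX:
                raise OverflowError(f"{value} does not fit in int64")
    return new_type, list(values)


# Hypothetical schema: (field name, type name) pairs.
schema = [("id", "uint32"), ("flags", "uint8"), ("name", "string")]
converted = [(name, spark_compatible_type(type_name)) for name, type_name in schema]
print(converted)
```

With a policy like this, only uint64 columns can fail, and only when a value exceeds the int64 range. A same-width cast (for example uint32 to int32) would keep files smaller but would need a range check for every unsigned type.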