Posted to user@arrow.apache.org by Sandy Ryza <sa...@gmail.com> on 2020/06/01 15:45:37 UTC

Re: check whether pandas type is convertible to arrow type

Ah - I hadn't thought about how the object dtype complicates things.

What I'm trying to do at a higher level is maybe wacky:

   - I want a set of parquet files to be read/written by PySpark and Pandas
   interchangeably.
   - For each file, I want to specify, in code, the column types
   expected in the file.
   - Before writing out a Pandas DataFrame to a file, I want to check
   whether it matches the expected column types for the file.  I don't need to
   provably catch every violation, but the more I can catch, the better.
   - I'm considering using pyarrow types for expressing the expected column
   types for each file.

Does that make sense?  Is there a different way you'd advise accomplishing
this?

On 2020/05/30 15:07:05, Wes McKinney <w....@gmail.com> wrote:
> I don't think there is specifically (one could be added in theory). Is
> the goal to determine whether `pyarrow.array(pandas_object)` will
> succeed or not, or something else? Since a lot of pandas data is
> opaquely represented with object dtype it can be tricky unless you
> want to go to the expense of using `pandas.lib.infer_dtype` to
> determine the effective logical type of the values.
>
> On Fri, May 29, 2020 at 4:18 PM Sandy Ryza <sa...@gmail.com> wrote:
> >
> > Hi all,
> >
> > If I have a pandas dtype and an arrow type, is there a pyarrow API that
> > allows me to check whether the pandas dtype is convertible to the arrow
> > type?
> >
> > It seems like "arrow_type.to_pandas_dtype() == pandas_dtype" would work
> > in most cases, because pandas dtypes tend to be at least as wide as
> > equivalent arrow types, but I'm wondering whether there's something more
> > principled.
> >
> > Any help much appreciated,
> > Sandy
> >
>
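
To illustrate the object-dtype point in the quoted reply: a minimal sketch, assuming the inference function is reached through its public location in current pandas releases, `pandas.api.types.infer_dtype`. The helper name logical_kinds is hypothetical. It scans the values of each object-dtype column, which is the expense mentioned above.

```python
import pandas as pd
from pandas.api.types import infer_dtype


def logical_kinds(df):
    """Map each object-dtype column to the kind string infer_dtype reports."""
    return {
        name: infer_dtype(col, skipna=True)
        for name, col in df.items()
        if col.dtype == object
    }


df = pd.DataFrame({
    "name": pd.Series(["a", "b"], dtype=object),
    "mixed": pd.Series([1, "b"], dtype=object),
    "count": pd.Series([1, 2], dtype="int64"),
})
kinds = logical_kinds(df)  # only the two object columns are inspected
```

A column reported as a mixed kind is a likely candidate for failing conversion to a single Arrow type.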

Re: check whether pandas type is convertible to arrow type

Posted by Wes McKinney <we...@gmail.com>.
You can specify an explicit Arrow schema when converting a
pandas.DataFrame to pyarrow.Table or RecordBatch. So it might be
better to write out the schema you want (kind of like when you write
the schema in SQL with CREATE TABLE ...) and then ensure that pandas
objects are coerced into that?

On Mon, Jun 1, 2020 at 10:45 AM Sandy Ryza <sa...@gmail.com> wrote:
>
> Ah - I hadn't thought about how the object dtype complicates things:
>
> What I'm trying to do at a higher level is maybe wacky:
>
> - I want a set of parquet files to be read/written by PySpark and Pandas interchangeably.
> - For each file, I want to specify, in code, the column types expected in the file.
> - Before writing out a Pandas DataFrame to a file, I want to check whether it matches the expected column types for the file.  I don't need to provably catch every violation, but the more I can catch, the better.
> - I'm considering using pyarrow types for expressing the expected column types for each file.
>
> Does that make sense?  Is there a different way you'd advise accomplishing this?
>
> On 2020/05/30 15:07:05, Wes McKinney <w....@gmail.com> wrote:
> > I don't think there is specifically (one could be added in theory). Is
> > the goal to determine whether `pyarrow.array(pandas_object)` will
> > succeed or not, or something else? Since a lot of pandas data is
> > opaquely represented with object dtype it can be tricky unless you
> > want to go to the expense of using `pandas.lib.infer_dtype` to
> > determine the effective logical type of the values.
> >
> > On Fri, May 29, 2020 at 4:18 PM Sandy Ryza <sa...@gmail.com> wrote:
> > >
> > > Hi all,
> > >
> > > If I have a pandas dtype and an arrow type, is there a pyarrow API that allows me to check whether the pandas dtype is convertible to the arrow type?
> > >
> > > It seems like "arrow_type.to_pandas_dtype() == pandas_dtype" would work in most cases, because pandas dtypes tend to be at least as wide as equivalent arrow types, but I'm wondering whether there's something more principled.
> > >
> > > Any help much appreciated,
> > > Sandy
> > >
> >