Posted to dev@arrow.apache.org by Jerald Alex <vm...@gmail.com> on 2023/05/02 08:03:43 UTC

[Python] Casting struct to map

Hi Experts,

Can anyone please tell me whether it is possible to cast a struct to a map
type?

I tried the following, but it produces the error below.

pyarrow.lib.ArrowNotImplementedError: Unsupported cast from struct<first_name: string, last_name: string> to map using function cast_map

Note: The snippet is just an example to show the problem.

Code Snippet:

import pyarrow as pa

table_schema = pa.schema([
    pa.field("id", pa.int32()),
    pa.field("names", pa.map_(pa.string(), pa.string())),
])

table_data = [
    {"id": 1, "names": {"first_name": "Tyler", "last_name": "Brady"}},
    {"id": 2, "names": {"first_name": "Walsh", "last_name": "Weaver"}},
]

tbl = pa.Table.from_pylist(table_data)  # "names" is inferred as a struct
print(tbl)
print(tbl.cast(table_schema))  # raises ArrowNotImplementedError
print(tbl)

Output and error:

id: int64
names: struct<first_name: string, last_name: string>
  child 0, first_name: string
  child 1, last_name: string
----
id: [[1,2]]
names: [
  -- is_valid: all not null
  -- child 0 type: string
["Tyler","Walsh"]
  -- child 1 type: string
["Brady","Weaver"]]
Traceback (most recent call last):
  File "/Users/infant.alex@cognitedata.com/Documents/Github/HubOcean/demo/pyarrow_types.py", line 220, in <module>
    print(tbl.cast(table_schema))
  File "pyarrow/table.pxi", line 3489, in pyarrow.lib.Table.cast
  File "pyarrow/table.pxi", line 523, in pyarrow.lib.ChunkedArray.cast
  File "/Users/infant.alex@cognitedata.com/Library/Caches/pypoetry/virtualenvs/demo-LzMA3Hsd-py3.10/lib/python3.10/site-packages/pyarrow/compute.py", line 391, in cast
    return call_function("cast", [arr], options)
  File "pyarrow/_compute.pyx", line 560, in pyarrow._compute.call_function
  File "pyarrow/_compute.pyx", line 355, in pyarrow._compute.Function.call
  File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 121, in pyarrow.lib.check_status
pyarrow.lib.ArrowNotImplementedError: Unsupported cast from struct<first_name: string, last_name: string> to map using function cast_map

Regards,
Alex Vincent

Re: [Python] Casting struct to map

Posted by Weston Pace <we...@gmail.com>.
No, a struct array is not naturally castable to a map.  The conversion can't
be done zero-copy, and I don't think anyone has encountered this need
before.  Let me make sure I understand.

The goal is to go from a type of STRUCT<N1: T, N2: T, N3: T, ... NZ: T>,
where every field in the struct has the same type, to a MAP<STRING, T>, where
each record will have Z map entries?  This seems like it could be expressed
as a compute function.  I don't think it would be very natural as a cast,
since it has a pretty strict requirement that all fields in the struct have
the same type, and so it would be pretty limited.  I think you could also
have a compute function that went in the opposite direction.

I do agree with Alenka: if there is any way to create your original input
data as a map, then that will have better performance.


Re: [Python] Casting struct to map

Posted by Jerald Alex <vm...@gmail.com>.
Hi Alenka,

Great! Thank you so much for your input.

I have indeed tried passing a schema when creating the table from a pylist,
and it worked. In my use case, however, I wouldn't know the table schema
beforehand, especially for the other columns; I need to do transformations
before I can cast it to the expected schema. Please let me know if you have
any other thoughts.

Regards,
Infant Alex


Re: [Python] Casting struct to map

Posted by Alenka Frim <al...@voltrondata.com.INVALID>.
Hi Alex,

Passing the schema to the Table.from_pylist() method should work for
your example (though I'm not sure it solves your initial problem):

import pyarrow as pa

table_schema = pa.schema([
    pa.field("id", pa.int32()),
    pa.field("names", pa.map_(pa.string(), pa.string())),
])

table_data = [
    {"id": 1, "names": {"first_name": "Tyler", "last_name": "Brady"}},
    {"id": 2, "names": {"first_name": "Walsh", "last_name": "Weaver"}},
]

pa.Table.from_pylist(table_data, schema=table_schema)
# pyarrow.Table
# id: int32
# names: map<string, string>
# child 0, entries: struct<key: string not null, value: string> not null
# child 0, key: string not null
# child 1, value: string
# ----
# id: [[1,2]]
# names: [[keys:["first_name","last_name"]values:["Tyler","Brady"],keys:["first_name","last_name"]values:["Walsh","Weaver"]]]


Best, Alenka


Re: [Python] Casting struct to map

Posted by Jerald Alex <vm...@gmail.com>.
Any input on this, please?
