Posted to dev@arrow.apache.org by Micah Kornfield <em...@gmail.com> on 2020/04/02 04:54:23 UTC

Re: Join operation on attributes from arrow structs

Hi Hasara,
There is currently no functionality in C++/Python to do this
(https://issues.apache.org/jira/browse/ARROW-4630 is the issue tracking
this).

> Also, how are nested attributes in JSON format mapped into buffers once
> converted to Arrow format?

I'm not sure I understand this question.

Thanks,
Micah

On Sun, Mar 22, 2020 at 10:09 PM Hasara Maithree <
hasaramaithreedesilva@gmail.com> wrote:

> Hi all,
>
> Assume I have a json file named 'my_data.json' as below.
>
> {"a": [1, 2], "b": {"c": true, "d": "1991-02-03"}}
> {"a": [3, 4, 5], "b": {"c": false, "d": "2019-04-01"}}
>
> If I need to do a join operation based on attribute d, can I do it
> directly from Arrow structs? (Or are there any efficient alternatives?)
> Also, how are nested attributes in JSON format mapped into buffers once
> converted to Arrow format? (Example taken from the documentation.)
>
> >>> table = json.read_json("my_data.json")
> >>> table
> pyarrow.Table
> a: list<item: int64>
>   child 0, item: int64
> b: struct<c: bool, d: timestamp[s]>
>   child 0, c: bool
>   child 1, d: timestamp[s]
> >>> table.to_pandas()
>            a                                       b
> 0     [1, 2]   {'c': True, 'd': 1991-02-03 00:00:00}
> 1  [3, 4, 5]  {'c': False, 'd': 2019-04-01 00:00:00}
>
>
> Thank You
>

Re: Join operation on attributes from arrow structs

Posted by Francois Saint-Jacques <fs...@gmail.com>.

They're mapped with the StructType/StructArray, which is also a columnar
representation, e.g. one buffer per field in the sub-object. If you
have varying/incompatible types, a field will be promoted to a
UnionType.

François
