
Posted to dev@arrow.apache.org by Michael Lavina <mi...@factset.com> on 2021/07/21 17:09:22 UTC

How Pandas/Perspective represent table pivots in arrow

Hello Apache Arrow Team,

I am looking at ways my company can create an SDK that shares Apache Arrow data while preserving table pivots. I looked at how Pandas and Perspective do this, and it seems to work as follows.

For row_pivots

Pandas just sorts the data into a flat arrow structure

Perspective actually generates a rowPath for each row

Does Pandas generate a row_path per row that I can reference?

For column_pivots

Pandas and Perspective both seem to create new arrays whose names denote the column_path, i.e.

Share price per month
Ticker | January     | February    |
       | start | end | start | end |
Goog   | 200   | 244 | 246   | 260 |
F      | 35    | 35  | 35    | 50  |


Would be represented like

| ticker | (January, start) | (January, end) | (February, start) | (February, end)

Why don't Pandas and Perspective use Structs to denote that the start-of-month share price is a child of the column January, instead of hard-coding that information in the name of the column?

Is that practice documented anywhere, so that if we were to create an SDK for internal use, its output could easily be fed into Pandas?