Posted to dev@arrow.apache.org by Michael Lavina <mi...@factset.com> on 2021/07/21 17:09:22 UTC
How Pandas/Perspective represent table pivots in arrow
Hello Apache Arrow Team,
I am looking at ways my company can create an SDK that shares Apache Arrow data while preserving table pivots. I looked at how Pandas and Perspective do it, and it seems like:
For row_pivots:
- Pandas just sorts the data into a flat Arrow structure
- Perspective actually generates a rowPath for each row
Does Pandas generate a row_path per row that I can reference?
For column_pivots:
Pandas and Perspective both seem to create new arrays whose names denote the column_path, i.e.
Share price per month
Ticker | January     | February    |
       | start | end | start | end |
Goog   | 200   | 244 | 246   | 260 |
F      | 35.   | 35  | 35.   | 50. |
would be represented like:
| ticker | (January, start) | (January, end) | (February, start) | (February, end) |
Why don't Pandas, and Perspective for that matter, use Structs to denote that the start-of-month price is a child of the column January, instead of hard-coding that information in the name of the column?
Is that practice documented anywhere, so that if we were to create an SDK for internal use, its output could be easily fed into Pandas?