Posted to user@arrow.apache.org by Martin Percossi <ma...@percossi.com> on 2021/04/19 14:54:51 UTC
xbow - range-v3 views/actions for Arrow C++
Hi, I am developing a library called xbow [*] that aims to make Arrow C++
more ergonomic while, ideally, losing no performance. (If you're on Reddit,
an upvote would be appreciated too! [**]) With xbow, you can write code like
this:
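#include <cstdint>
#include <string>
#include <vector>
#include <boost/algorithm/string.hpp>  // boost::to_upper
#include <fmt/ranges.h>                // assumed: print below is fmt::print
#include <range/v3/all.hpp>            // views::transform/stride/cycle/take/join, to
// ...plus the xbow headers that provide def_record and xb::arrow
// (their names are not given in this post).

using std::string;
using std::vector;
using namespace std::string_literals;  // enables "..."s
namespace views = ranges::views;
using ranges::to;
using fmt::print;

def_record(suspect,
    (int32_t, id),
    (string, name),
    (double, salary)
);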
auto suspects = vector<suspect>{
    {1, "Keyser Söze"s, 1000.0}, {2, "Kobayashi"s, 500.0},   {3, "Fred Fenster"s, 500.0},
    {4, "Jack Baer"s, 100.0},    {5, "Dean Keaton"s, 800.0}, {6, "Michael McManus"s, 100.0},
};
print("input rows: {}\n", suspects);
// Below: traverse the rows, changing each name to upper case, skipping every
// other element, cycling over the rows so that they repeat, and taking exactly
// 20 of them; finally, this range-v3 range is converted to a regular Arrow table.
// This shows that we can take a bog-standard range-v3 pipeline and convert it
// to an Arrow object, which could later, for example, be written to a Parquet
// file (WIP).
const auto table = suspects
    | views::transform([](auto&& p) -> suspect& {
          boost::to_upper(p.name);
          return p;
      })
    | views::stride(2)
    | views::cycle
    | views::take(20)
    | xb::arrow::actions::to_table;
// Below: note that to_range<suspect>(table) returns a range of chunks, each of
// which is itself a range. These chunks correspond exactly to the actual
// low-level chunks in the Arrow table. We views::join this range to produce a
// single, collated range, which we then convert to a std::vector<suspect> for
// the sole purpose of printing. Note how easily we taped the chunks together!
// Normally this would be a two-level for loop involving laborious extraction
// of each field, type-casting, urgh!
print("round-tripped rows: {}\n",
      xb::arrow::views::to_range<suspect>(table) | views::join | to<vector<suspect>>());
It would be great to get feedback from other users of Arrow. I have a few
things planned, but want to see if there's interest before I invest more of
my time:
- zero-cost optional-like objects that read directly from the validity
  bitmask memory, to avoid allocating temporaries in traversal functions
- date support (done but not pushed)
- time support (WIP)
- indexes and more dataframe functionality
- integration with Python via PEP 484
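The first planned item could look roughly like the sketch below. It assumes Arrow's validity-bitmap convention (one bit per slot, least-significant-bit order within each byte, a set bit meaning non-null, and an absent bitmap meaning no nulls); `nullable_ref` is a hypothetical name, not xbow's API:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical optional-like handle: it stores two pointers and an index,
// allocates nothing, and reads validity straight from an Arrow-style bitmap
// instead of copying the value into a std::optional.
template <typename T>
class nullable_ref {
public:
    nullable_ref(const T* values, const std::uint8_t* validity, std::int64_t index)
        : values_(values), validity_(validity), index_(index) {
        assert(values != nullptr && index >= 0);
    }

    // Bit `index` of the bitmap, LSB-first within each byte; 1 means non-null.
    // A null bitmap pointer means the array has no nulls.
    bool has_value() const noexcept {
        return validity_ == nullptr ||
               ((validity_[index_ / 8] >> (index_ % 8)) & 1) != 0;
    }
    explicit operator bool() const noexcept { return has_value(); }

    // Unchecked access, like std::optional::operator*.
    const T& operator*() const noexcept { return values_[index_]; }

    // Checked access, like std::optional::value.
    const T& value() const {
        if (!has_value()) throw std::bad_optional_access{};
        return values_[index_];
    }

    T value_or(T fallback) const { return has_value() ? values_[index_] : fallback; }

private:
    const T* values_;
    const std::uint8_t* validity_;
    std::int64_t index_;
};
```

The design choice is to mimic the `std::optional` interface (`has_value`, `operator*`, `value`, `value_or`) so traversal code reads the same, while the handle only ever points into the column's existing buffers.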
Thanks in advance!
[*] https://github.com/seertaak/xbow
[**] https://www.reddit.com/r/cpp/comments/mswno0/xbow_rangev3_actions_and_ranges_for_arrow_c/
--
Martin Percossi