You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@arrow.apache.org by Wes McKinney <we...@gmail.com> on 2020/04/30 22:03:30 UTC
[C++] Heads up about breaking API change with Interval types
Hi folks,
In https://github.com/apache/arrow/pull/7060 I proposed an
(unavoidable) C++ API change related to the two types of intervals
that are in the Arrow columnar format.
As context, in the C++ library in almost all cases we use different
Type enum values for each "subtype" that has a different in-memory
representation. So we have
Flatbuffers "Date" -> Type::DATE32 and Type::DATE64
Flatbuffers "Time" -> Type::TIME32 and Type::TIME64
There are two flavors of Interval, YEAR_MONTH (which is represented as
4-byte values) and DAY_TIME (which is represented as 8-byte values).
This means that we generally have to branch on the interval type to
select code paths, so you end up with code like
case Type::INTERVAL: {
switch (checked_cast<const IntervalType&>(*type).interval_type()) {
case IntervalType::MONTHS:
res = std::make_shared<IntegerConverter<MonthIntervalType>>(type);
break;
case IntervalType::DAY_TIME:
res = std::make_shared<DayTimeIntervalConverter>(type);
break;
default:
return not_implemented();
}
This makes any kind of dynamic dispatch for intervals more complex
than other types (e.g. DATE32/64). My patch splits the enum into
INTERVAL_MONTHS and INTERVAL_DAY_TIME to make Interval work the same
as the other types which have different in-memory representations
based on their parameters (i.e. Date and Time).
Since this is a less traveled part of the codebase, the number of
downstream users impacted by the API change should not be large but
per discussion on the PR I wanted to make this change more visible in
case there was a concern.
Thanks
Wes