You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@arrow.apache.org by Wes McKinney <we...@gmail.com> on 2020/04/30 22:03:30 UTC

[C++] Heads up about breaking API change with Interval types

Hi folks,

In https://github.com/apache/arrow/pull/7060 I proposed an
(unavoidable) C++ API change related to the two types of intervals
that are in the Arrow columnar format.

As context, in the C++ library in almost all cases we use different
Type enum values for each "subtype" that has a different in-memory
representation. So we have

Flatbuffers "Date" -> Type::DATE32 and Type::DATE64
Flatbuffers "Time" -> Type::TIME32 and Type::TIME64

There are two flavors of Interval, YEAR_MONTH (which is represented as
4-byte values) and DAY_TIME (which is represented as 8-byte values).
This means that we generally have to branch on the interval type to
select code paths, so you end up with code like

case Type::INTERVAL: {
  switch (checked_cast<const IntervalType&>(*type).interval_type()) {
    case IntervalType::MONTHS:
      res = std::make_shared<IntegerConverter<MonthIntervalType>>(type);
      break;
    case IntervalType::DAY_TIME:
      res = std::make_shared<DayTimeIntervalConverter>(type);
      break;
    default:
  return not_implemented();
}

This makes any kind of dynamic dispatch for intervals more complex
than other types (e.g. DATE32/64). My patch splits the enum into
INTERVAL_MONTHS and INTERVAL_DAY_TIME to make Interval work the same
as the other types which have different in-memory representations
based on their parameters (i.e. Date and Time).

Since this is a less traveled part of the codebase, the number of
downstream users impacted by the API change should not be large but
per discussion on the PR I wanted to make this change more visible in
case there was a concern.

Thanks
Wes