Posted to dev@drill.apache.org by Paul Rogers <pa...@yahoo.com.INVALID> on 2018/07/16 01:50:14 UTC

Result set loader -- still needed?

Hi All,

Over the last six months I've been slowly trying to get the "result set loader" work committed to Drill. As a recap, this was supposed to provide a uniform way to optimally pack a record batch up to a prescribed memory limit. This technique is particularly useful in readers that do not have much information about incoming data sizes.
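
As a rough illustration of the packing idea only: the sketch below accepts rows until the next one would push the batch past a byte budget, then asks the caller to harvest the batch and retry. The class `BatchPacker` and its methods are hypothetical names chosen for this example, rows are modeled as plain strings, and none of this is the actual result set loader API.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: NOT Drill's result set loader API.
// Rows are modeled as strings; size is their UTF-8 byte length.
class BatchPacker {
    private final int limitBytes;
    private final List<String> rows = new ArrayList<>();
    private int usedBytes = 0;

    BatchPacker(int limitBytes) {
        this.limitBytes = limitBytes;
    }

    // Adds the row if it fits within the budget. Returns false when the
    // batch is full; the caller should harvest() and then retry the row.
    // An empty batch always accepts one row so oversized rows still make progress.
    boolean tryAdd(String row) {
        int size = row.getBytes(StandardCharsets.UTF_8).length;
        if (!rows.isEmpty() && usedBytes + size > limitBytes) {
            return false;
        }
        rows.add(row);
        usedBytes += size;
        return true;
    }

    // Returns the rows packed so far and resets the packer for the next batch.
    List<String> harvest() {
        List<String> out = new ArrayList<>(rows);
        rows.clear();
        usedBytes = 0;
        return out;
    }

    int usedBytes() {
        return usedBytes;
    }
}
```

The point of the design is that the writer never needs to know row sizes in advance: it measures as it writes and stops at the limit.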

In the meantime, the team has done a great job using the "sizer" approach to get a good-enough solution for all internal operators. The sizer simply uses statistics about incoming batches to predict outgoing batch size.
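
For comparison, a minimal sketch of the statistics-based prediction: track the average row width seen in incoming batches and divide the output memory budget by it. `BatchSizer`, `update`, and `predictRowCount` are hypothetical names for this example and are not Drill's actual sizer classes; the real implementation tracks considerably more detail.

```java
// Hypothetical sketch: NOT Drill's actual sizer implementation.
class BatchSizer {
    private long totalBytes = 0;
    private long totalRows = 0;

    // Records the observed size and row count of one incoming batch.
    void update(long batchBytes, int rowCount) {
        totalBytes += batchBytes;
        totalRows += rowCount;
    }

    // Predicts how many rows fit in the output budget, based on the
    // average row width observed so far. Falls back to maxRows when
    // no statistics are available yet.
    int predictRowCount(long outputBudgetBytes, int maxRows) {
        if (totalRows == 0) {
            return maxRows;
        }
        // Ceiling division, with a floor of 1 byte per row.
        long avgRowWidth = Math.max(1, (totalBytes + totalRows - 1) / totalRows);
        long predicted = outputBudgetBytes / avgRowWidth;
        return (int) Math.max(1, Math.min(maxRows, predicted));
    }
}
```

This works when an operator can inspect its input before producing output. A reader has no incoming batch to measure, which is why the prediction approach does not carry over to readers directly.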

At the same time, work has been done to create a one-off solution for Parquet. Since Parquet is, by far, Drill's most important data source, this means the reader problem is solved for the most critical use case.

As time goes on, I get less and less time to maintain the result set loader code. My knowledge of team priorities and of Drill code drifts out of date.

So, the question for the group is, is the result set loader work still needed? If not, we can wait to do the remaining commits until a compelling need presents itself. If it is needed, it would be good to know how the team plans to use it so that we stay in sync.

Thanks,

- Paul


Re: Result set loader -- still needed?

Posted by Parth Chandra <pa...@apache.org>.
+1 to keeping the result set loader.
Also, IMO, the Parquet effort should move to using the result set loader (I
believe Salim has a plan to do so).

On Sun, Jul 15, 2018 at 6:50 PM, Paul Rogers <pa...@yahoo.com.invalid>
wrote:

> Hi All,
>
> Over the last six months I've been slowly trying to get the "result set
> loader" work committed to Drill. As a recap, this was supposed to provide a
> uniform way to optimally pack a record batch up to a prescribed memory
> limit. This technique is particularly useful in readers which do not have
> much information about incoming data sizes.
>
> In the meantime, the team has done a great job using the "sizer" approach
> to get a good-enough solution for all internal operators. The sizer simply
> uses statistics about incoming batches to predict outgoing batch size.
>
> At the same time, work has been done to create a one-off solution for
> Parquet. Since Parquet is, by far, Drill's most important data source, this
> means the reader problem is solved for the most critical use case.
>
> As time goes on, I get less and less time to maintain the result set loader
> code. My knowledge of team priorities and of Drill code drifts out of date.
>
> So, the question for the group is, is the result set loader work still
> needed? If not, we can wait to do the remaining commits until a compelling
> need presents itself. If it is needed, it would be good to know how the
> team plans to use it so that we stay in sync.
>
> Thanks,
>
> - Paul
>
>