Posted to dev@drill.apache.org by Julian Hyde <ju...@gmail.com> on 2013/02/19 20:42:33 UTC

Query plans that write to in-memory queue rather than file

Hi drillers,

I made a small architectural change in my first iteration of the SQL parser/validator/optimizer. I changed the LogicalPlan specification to allow query output to go to a java.util.Queue data structure rather than a file.

It was convenient for my purposes because I didn't need to allocate a temporary file, wait until the process had finished, and then start returning results. I know that in the big data world, many queries will produce large results and the results will not pass through just one processing node. But still, many queries will produce small results.

A logical plan is a graph, and there is an argument that it should be symmetric. A logical plan has sources, so why shouldn't it declare sinks?

I hacked the change in [see my changes to ROPConverter in https://github.com/apache/incubator-drill/commit/7f294adb649064e26dd2f28864260b17b07315ef], creating a "write" operator with a special file name "socket:0", and I modified ROPConverter to create a QueueSinkROP rather than the usual JSONWriter.
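
To make this concrete, here is a rough, self-contained sketch of what a queue-backed sink amounts to. The class and method names are invented for illustration; they are not the actual ROPConverter/QueueSinkROP code in the commit above:

    // Illustrative only: a sink that pushes records onto an in-memory queue
    // instead of serializing them to a temporary JSON file.
    import java.util.Queue;
    import java.util.concurrent.ArrayBlockingQueue;

    class QueueSink {
      private final Queue<Object> results;

      QueueSink(Queue<Object> results) {
        this.results = results;
      }

      // Each output record goes straight onto the queue, so the caller can
      // begin consuming rows before the query has finished.
      void write(Object record) {
        results.add(record);
      }
    }

    class QueueSinkExample {
      public static void main(String[] args) {
        Queue<Object> results = new ArrayBlockingQueue<>(16);
        QueueSink sink = new QueueSink(results);
        sink.write("{\"deptId\": 31, \"count\": 3}");
        System.out.println(results.poll()); // the client reads rows as they arrive
      }
    }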

Let's discuss what is the correct architecture for this.

Julian

Re: Query plans that write to in-memory queue rather than file

Posted by Jacques Nadeau <ja...@gmail.com>.
Good questions...

The original idea of sources was more about storage engines than just data
sources. The idea wasn't fully fleshed out before. You can see my updates in
the latest master rev. Take a look at the package
org.apache.drill.exec.ref.rse (rse === reference storage engine) for how I
dealt with things, along with ref/test/resources/simple_plan.json for the
separation. Basically, a storage engine is responsible for generating record
readers and record writers based on a specification that is custom to that
storage engine (the opaque "selection" and "target" values in the scan and
store operators). Some RSEs may support only reads or only writes (for
example, the current ConsoleRSE and QueueRSE storage engines are write-only).
You can also look at the updated plan generated by the sqlparser module.
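
To make the separation concrete, here is a simplified, hypothetical sketch of the reader/writer split; the real interfaces in org.apache.drill.exec.ref.rse use different names and signatures:

    // Illustrative only: a storage engine turns the opaque "selection" value
    // from a scan into a record reader, and the opaque "target" value from a
    // store into a record writer.
    import java.util.Iterator;

    interface StorageEngineSketch {
      Iterator<Object> getRecordReader(Object selection); // scan side
      RecordWriterSketch getRecordWriter(Object target);  // store side
    }

    interface RecordWriterSketch {
      void write(Object record);
    }

    // A write-only engine (in the spirit of ConsoleRSE or QueueRSE) simply
    // refuses to produce readers.
    class WriteOnlyEngineSketch implements StorageEngineSketch {
      public Iterator<Object> getRecordReader(Object selection) {
        throw new UnsupportedOperationException("write-only storage engine");
      }
      public RecordWriterSketch getRecordWriter(Object target) {
        return record -> System.out.println(record); // console-style sink
      }
    }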

For the short term, I moved the Queue holder into the DrillConfig object.
Really, we're talking about an in-memory pipe... but I'm not sure exactly how
things should be bootstrapped or passed around.
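
As a rough illustration of the in-memory-pipe idea (not the actual DrillConfig API), the queue-on-the-config arrangement boils down to something like this:

    // Illustrative only: a config-like holder that hands out named queues,
    // so the executor's store operator and the client poll the same pipe.
    import java.util.Map;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.LinkedBlockingQueue;

    class ConfigWithQueues {
      private final Map<Integer, BlockingQueue<Object>> queues = new ConcurrentHashMap<>();

      // A store operator in the plan would reference a queue by number.
      BlockingQueue<Object> getQueue(int id) {
        return queues.computeIfAbsent(id, k -> new LinkedBlockingQueue<>());
      }
    }

    class PipeExample {
      public static void main(String[] args) throws InterruptedException {
        ConfigWithQueues config = new ConfigWithQueues();

        // Executor side: the store operator pushes records onto queue 0.
        config.getQueue(0).put("{\"name\": \"drill\"}");

        // Client side: the SQL layer takes results off the same queue.
        System.out.println(config.getQueue(0).take());
      }
    }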

Let me know what you think of my updates...

Thanks,
Jacques
