Posted to dev@calcite.apache.org by Albert <zi...@gmail.com> on 2016/05/27 10:15:19 UTC

whole stage code generation

I was reading an article (and its references) on the speedup gains in Spark 2:
apache-spark-as-a-compiler-joining-a-billion-rows-per-second-on-a-laptop.html
<https://databricks.com/blog/2016/05/23/apache-spark-as-a-compiler-joining-a-billion-rows-per-second-on-a-laptop.html>

The main idea is that the generated physical code should now be data-centric
instead of operator-centric, and should preserve data locality.

I am thinking maybe this applies to Calcite as well. In terms of switching
to the data-centric approach, what could Calcite do, and what would it gain?
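
To make the operator-centric side of that contrast concrete, here is a minimal,
purely illustrative Java sketch of the classic Volcano-style iterator model (the
Operator, Scan and Filter names are hypothetical, not Calcite or Spark classes):

import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

// Operator-centric (Volcano-style) execution: each operator pulls one row
// at a time from its child through a virtual next() call.
interface Operator {
  Object[] next();                       // next row, or null when exhausted
}

class Scan implements Operator {
  private final Iterator<Object[]> rows;
  Scan(List<Object[]> rows) { this.rows = rows.iterator(); }
  public Object[] next() { return rows.hasNext() ? rows.next() : null; }
}

class Filter implements Operator {
  private final Operator child;
  private final Predicate<Object[]> condition;
  Filter(Operator child, Predicate<Object[]> condition) {
    this.child = child;
    this.condition = condition;
  }
  public Object[] next() {
    Object[] row;
    while ((row = child.next()) != null) {
      if (condition.test(row)) {
        return row;                      // one virtual call per row
      }
    }
    return null;
  }
}

Every row crosses each operator boundary through a virtual next() call, which is
exactly the per-row overhead and loss of locality the data-centric approach tries
to eliminate.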




quote:

The Future: Whole-stage Code Generation

From the above observation, a natural next step for us was to explore the
possibility of automatically generating this *handwritten* code at runtime,
which we are calling “whole-stage code generation.” This idea is inspired
by Thomas Neumann’s seminal VLDB 2011 paper on *Efficiently Compiling
Efficient Query Plans for Modern Hardware
<http://www.vldb.org/pvldb/vol4/p539-neumann.pdf>*. For more details on the
paper, Adrian Colyer has coordinated with us to publish a review on The
Morning Paper blog
<http://blog.acolyer.org/2016/05/23/efficiently-compiling-efficient-query-plans-for-modern-hardware>
 today.

The goal is to leverage whole-stage code generation so the engine can
achieve the performance of hand-written code, yet provide the functionality
of a general purpose engine. Rather than relying on operators for
processing data at runtime, these operators together generate code at
runtime and collapse each fragment of the query, where possible, into a
single function and execute that generated code instead.
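
To illustrate roughly what “collapse each fragment into a single function” could
mean, here is a hand-written sketch of the shape of code a whole-stage generator
might emit for something like SELECT COUNT(*) FROM t WHERE x > 0; this is only an
illustration of the idea, not actual Spark- or Calcite-generated code:

// The whole scan -> filter -> count fragment becomes one tight loop, with no
// per-row virtual calls between operators; data stays in registers and cache.
final class GeneratedStage {
  static long execute(long[] x) {
    long count = 0;
    for (int i = 0; i < x.length; i++) {
      if (x[i] > 0) {                    // filter fused into the loop
        count++;                         // aggregate fused into the loop
      }
    }
    return count;
  }
}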

-- 
~~~~~~~~~~~~~~~
no mistakes
~~~~~~~~~~~~~~~~~~

Re: whole stage code generation

Posted by Ted Dunning <te...@gmail.com>.
Trimming somewhat ruthlessly ...

That approach is not only not very radical; it is hard to tell apart from what
Drill does. Reading that description of whole-stage code generation, I would
find it impossible to distinguish it from the way that Drill generates code.

On Tue, May 31, 2016 at 7:57 PM, Julian Hyde <jh...@apache.org> wrote:

> That approach makes a lot of sense. That said, it’s not as radical as they
> make it sound. ...



> * Drill makes extensive use of generated Java code, even for UDFs,
> carefully generated so that Hotspot can optimize it.
> * More and more of the Java-based engines are moving to off-heap memory.
> It has many benefits, but I hear that Hotspot is not as good at optimizing
> accesses to off-heap memory as it is at accessing, say, a Java long[].
>
> ...


Re: whole stage code generation

Posted by Julian Hyde <jh...@apache.org>.
That approach makes a lot of sense. That said, it’s not as radical as they make it sound. The Volcano execution model went out a long time ago. Here’s the history from my career:
* When I was at Oracle in ’95, they used an improved version of Volcano in which “next” took a callback method that was invoked with a few dozen rows at a time before “next” returned (a rough sketch of that shape follows after this list).
* At SQLstream (in 2004) and LucidDB, operators would work on ~64KB buffers of rows serialized into a cache-efficient format. We used neither the “pull” approach (driven by a consuming thread) nor the “push” approach (driven by a producer thread); instead we had a scheduler that invoked operators that had work to do, and tried to invoke operators in sequence so that the data was still in cache.
* Following MonetDB and X100, every DB engine moved to SIMD-friendly data structures. The ones initially written for the Java heap (Hive and, yes, Spark) eventually followed suit.
* Drill makes extensive use of generated Java code, even for UDFs, carefully generated so that Hotspot can optimize it.
* More and more of the Java-based engines are moving to off-heap memory. It has many benefits, but I hear that Hotspot is not as good at optimizing accesses to off-heap memory as it is at accessing, say, a Java long[].
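
As a rough sketch of the callback-flavoured “next” mentioned in the first bullet
above (the names and shape are guesses at the general idea, not actual Oracle
internals):

import java.util.List;
import java.util.function.Consumer;

// Hypothetical batch/callback variant of the Volcano "next" call: instead of
// returning one row per call, the operator hands a batch of a few dozen rows
// to a callback before next() returns, amortizing the per-row call overhead.
interface BatchOperator {
  // Delivers at most one batch to the callback; returns false when exhausted.
  boolean next(Consumer<List<Object[]>> callback);
}

class BatchScan implements BatchOperator {
  private final List<List<Object[]>> batches;
  private int cursor;
  BatchScan(List<List<Object[]>> batches) { this.batches = batches; }
  public boolean next(Consumer<List<Object[]>> callback) {
    if (cursor >= batches.size()) {
      return false;                      // no more batches
    }
    callback.accept(batches.get(cursor++));
    return true;
  }
}

Handing over a few dozen rows per call keeps the pull model while amortizing the
per-row cost.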

Following this trend, and looking to the future, these would be my architectural recommendations:
* Don’t write your own engine!
* Translate your queries to an algebra (e.g. Calcite; see the sketch after this list)
* Have that algebra translate to a high-performance engine (e.g. Drill, Spark, Hive or Flink)
* Use an efficient memory format (e.g. Arrow) so that engines can efficiently exchange data.
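
As a small sketch of the “translate your queries to an algebra” step using
Calcite’s RelBuilder; it assumes a schema containing an EMP table with SAL and
ENAME columns has already been registered (schema setup omitted):

import org.apache.calcite.rel.RelNode;
import org.apache.calcite.schema.SchemaPlus;
import org.apache.calcite.sql.fun.SqlStdOperatorTable;
import org.apache.calcite.tools.FrameworkConfig;
import org.apache.calcite.tools.Frameworks;
import org.apache.calcite.tools.RelBuilder;

final class AlgebraExample {
  // Builds the algebra for: SELECT ENAME FROM EMP WHERE SAL > 1000
  static RelNode buildPlan(SchemaPlus schemaWithEmp) {
    FrameworkConfig config = Frameworks.newConfigBuilder()
        .defaultSchema(schemaWithEmp)    // assumed to contain the EMP table
        .build();
    RelBuilder builder = RelBuilder.create(config);
    return builder
        .scan("EMP")
        .filter(
            builder.call(SqlStdOperatorTable.GREATER_THAN,
                builder.field("SAL"), builder.literal(1000)))
        .project(builder.field("ENAME"))
        .build();
  }
}

The resulting RelNode tree can then be optimized and handed to whichever engine
adapter actually executes it, with an efficient memory format such as Arrow
carrying the data between engines.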

Julian

