Posted to dev@drill.apache.org by Timothy Farkas <tf...@mapr.com> on 2017/09/14 20:41:05 UTC

Code Generation Question

Hi All,

As I've been looking at the TopN operator and code generation, I've been wondering why we have 2 forms of code generation:


  *   One is the method of stitching compiled methods into a template class with ASM.
  *   The other simply creates a class that extends the TemplateClass and compiles it without using custom ASM techniques. This is the PlainJava technique.

With my high-level understanding, it seems like the PlainJava approach would be the simpler of the two, and probably also the more performant, since we inherit all of the Java compiler's optimizations. Is there a specific reason why we still use our custom ASM technique? Would it be safe to start retiring the old ASM technique in favor of PlainJava?
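
To make the second approach concrete, here is a minimal sketch of what "a class that extends the template" could look like. All names (ComparatorTemplate, GeneratedComparator, doCompare, indexOfMax) are hypothetical and not taken from Drill; in a real system the subclass source would be emitted by the code generator rather than written by hand.

```java
// Hypothetical template: the fixed operator logic lives here, and the
// query-specific logic is left abstract for the generated subclass.
abstract class ComparatorTemplate {
    // Shared logic: returns the index of the largest row under doCompare().
    final int indexOfMax(int[] rows) {
        int best = 0;
        for (int i = 1; i < rows.length; i++) {
            if (doCompare(rows[i], rows[best]) > 0) {
                best = i;
            }
        }
        return best;
    }

    // Query-specific comparison, supplied by generated code.
    protected abstract int doCompare(int left, int right);
}

// Stand-in for generated source: an ordinary subclass that any Java
// compiler can compile, with no byte-code rewriting afterwards.
final class GeneratedComparator extends ComparatorTemplate {
    @Override
    protected int doCompare(int left, int right) {
        return Integer.compare(left, right); // ascending order on one int column
    }
}

class PlainJavaSketch {
    public static void main(String[] args) {
        ComparatorTemplate op = new GeneratedComparator();
        System.out.println(op.indexOfMax(new int[] {3, 9, 4})); // prints 1
    }
}
```

The point of the sketch is only that the generated class is plain source that extends the template, so nothing has to be stitched together after compilation.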

Thanks,
Tim

Re: Code Generation Question

Posted by Paul Rogers <pr...@mapr.com>.

We had a quick discussion. There is some doubt that Java can correctly optimize code that uses subclasses. The problem is that for one query the JIT wants to optimize the code one way, while for another query it wants to optimize it a different way. With separate copies of the byte codes, the JIT can optimize the code differently for different queries.
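
One way to picture this concern, as a sketch only: with subclassing, every generated class inherits the same loop from the template, so there is a single method body whose call site serves all queries. With copied byte codes, each generated class owns its own loop. The class names below (SharedTemplate, QueryA, QueryB, MergedQueryA) are hypothetical. The reflection helper only reports which class declares the loop; it does not measure what the JIT actually does.

```java
// Subclass style: the loop exists once, in the template's byte code.
abstract class SharedTemplate {
    long run(int[] values) {
        long total = 0;
        for (int v : values) {
            total += eval(v); // one call site serving every subclass
        }
        return total;
    }

    abstract int eval(int v);
}

final class QueryA extends SharedTemplate {
    @Override int eval(int v) { return v + 1; }
}

final class QueryB extends SharedTemplate {
    @Override int eval(int v) { return v * 2; }
}

// Copy style: the class carries its own copy of the loop, roughly the
// effect of merging template byte codes into each generated class.
final class MergedQueryA {
    long run(int[] values) {
        long total = 0;
        for (int v : values) {
            total += eval(v); // call site private to this class
        }
        return total;
    }

    int eval(int v) { return v + 1; }
}

class JitSharingSketch {
    // Walks up the hierarchy and names the class that declares run(int[]).
    static String ownerOfRun(Class<?> type) {
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            try {
                c.getDeclaredMethod("run", int[].class);
                return c.getSimpleName();
            } catch (NoSuchMethodException e) {
                // Not declared here; keep walking up.
            }
        }
        throw new IllegalArgumentException("no run(int[]) on " + type);
    }

    public static void main(String[] args) {
        System.out.println(ownerOfRun(QueryA.class));       // prints SharedTemplate
        System.out.println(ownerOfRun(QueryB.class));       // prints SharedTemplate
        System.out.println(ownerOfRun(MergedQueryA.class)); // prints MergedQueryA
    }
}
```

Whether the shared call site really costs anything in practice is exactly the open question; the sketch shows only where the code lives in each style.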

Of course, there is quite a bit of code that is common. Should we copy that code for each operator also?

Only testing will reveal the best path forward. For now, there is enough FUD (fear, uncertainty and doubt) that we should leave things as they are for production. (Feel free to use plain Java in development.)

Thanks,

- Paul

> On Sep 15, 2017, at 1:21 PM, Boaz Ben-Zvi <bb...@mapr.com> wrote:
> 
> Hi Tim,
> 
> The latest pull request for the Hash Aggr operator (#938) does turn “plain Java” on for the mainline code, because the new template code changes (in the Hash Table) caused the “byte twiddling” to break in some subtle way.
> This is the first attempt; if it works well (as we hope), we’ll continue with other operators.
> 
> Thanks,
> 
> Boaz

Re: Code Generation Question

Posted by Boaz Ben-Zvi <bb...@mapr.com>.

Hi Tim,

The latest pull request for the Hash Aggr operator (#938) does turn “plain Java” on for the mainline code, because the new template code changes (in the Hash Table) caused the “byte twiddling” to break in some subtle way.
This is the first attempt; if it works well (as we hope), we’ll continue with other operators.

Thanks,

Boaz

On 9/15/17, 11:34 AM, "Paul Rogers" <pr...@mapr.com> wrote:

    Hi Tim,
    
    This question has come up multiple times. The “plain Java” mechanism is very handy for developing code with code generation. It also seems to be faster, smaller and simpler than the byte-code-merge mechanism. (However, rewriting byte codes has the benefit of sounding much more sophisticated than simply invoking the Java compiler!)
    
    I suspect that there is a healthy concern that there may be subtle problems with letting Java compile code without our help in twiddling with the byte codes.
    
    The way to address this concern is to run a full set of functional and performance tests with the “plain Java” mechanism turned on. But, no one has had the time to do that…
    
    That said, feel free to turn “plain Java” on during development; we now have sufficient experience to show that Java does, at least in development, produce code at least as good as what we produce via our byte-code merge mechanisms, with the added benefit that you can debug the code. (By contrast, when using the byte-code merge approach, there is no matching source code for the debugger to step through… I believe that folks have, instead, used print statements to visualize the execution flow.)
    
    Thanks,
    
    - Paul
    

Re: Code Generation Question

Posted by Paul Rogers <pr...@mapr.com>.

Hi Tim,

This question has come up multiple times. The “plain Java” mechanism is very handy for developing code with code generation. It also seems to be faster, smaller and simpler than the byte-code-merge mechanism. (However, rewriting byte codes has the benefit of sounding much more sophisticated than simply invoking the Java compiler!)
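
Purely as an illustration of "simply invoking the Java compiler": the sketch below uses the JDK's standard javax.tools API to compile a source string in memory and load the result. It is a stand-in under stated assumptions, not Drill's actual compilation path. The class name Doubler and the helper compile are hypothetical, the generated class implements a JDK interface to keep the example self-contained, and a JDK (not a bare JRE) is required.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.URI;
import java.util.Collections;
import java.util.function.IntUnaryOperator;
import javax.tools.FileObject;
import javax.tools.ForwardingJavaFileManager;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileManager;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.StandardJavaFileManager;
import javax.tools.ToolProvider;

class CompileFromSourceSketch {
    // Generated source text, held in memory.
    static final class Source extends SimpleJavaFileObject {
        private final String code;
        Source(String className, String code) {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }
        @Override public CharSequence getCharContent(boolean ignoreEncodingErrors) { return code; }
    }

    // Receives the compiled byte code in memory (one class only, no inner classes).
    static final class Output extends SimpleJavaFileObject {
        final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        Output(String className) {
            super(URI.create("bytes:///" + className + ".class"), Kind.CLASS);
        }
        @Override public OutputStream openOutputStream() { return bytes; }
    }

    // Compiles a public class with a public no-arg constructor and instantiates it.
    static IntUnaryOperator compile(final String className, String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler; run on a JDK");
        }
        final Output output = new Output(className);
        try (StandardJavaFileManager std = compiler.getStandardFileManager(null, null, null)) {
            JavaFileManager fm = new ForwardingJavaFileManager<StandardJavaFileManager>(std) {
                @Override
                public JavaFileObject getJavaFileForOutput(JavaFileManager.Location location,
                        String name, JavaFileObject.Kind kind, FileObject sibling) {
                    return output; // capture the class file instead of writing to disk
                }
            };
            boolean ok = compiler.getTask(null, fm, null, null, null,
                    Collections.singletonList(new Source(className, source))).call();
            if (!ok) {
                throw new IllegalStateException("Compilation failed for " + className);
            }
            final byte[] code = output.bytes.toByteArray();
            ClassLoader loader = new ClassLoader(CompileFromSourceSketch.class.getClassLoader()) {
                @Override
                protected Class<?> findClass(String name) throws ClassNotFoundException {
                    if (!name.equals(className)) {
                        throw new ClassNotFoundException(name);
                    }
                    return defineClass(name, code, 0, code.length);
                }
            };
            return (IntUnaryOperator) loader.loadClass(className)
                    .getDeclaredConstructor().newInstance();
        } catch (IOException | ReflectiveOperationException e) {
            throw new IllegalStateException("Could not compile or load " + className, e);
        }
    }

    public static void main(String[] args) {
        String src = "public final class Doubler implements java.util.function.IntUnaryOperator {\n"
                + "  public int applyAsInt(int x) { return x * 2; }\n"
                + "}\n";
        System.out.println(compile("Doubler", src).applyAsInt(21)); // prints 42
    }
}
```

Because the input is ordinary source text, the same string could also be written to disk so that a debugger has matching source to step through.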

I suspect that there is a healthy concern that there may be subtle problems with letting Java compile code without our help in twiddling with the byte codes.

The way to address this concern is to run a full set of functional and performance tests with the “plain Java” mechanism turned on. But, no one has had the time to do that…

That said, feel free to turn “plain Java” on during development; we now have sufficient experience to show that Java does, at least in development, produce code at least as good as what we produce via our byte-code merge mechanisms, with the added benefit that you can debug the code. (By contrast, when using the byte-code merge approach, there is no matching source code for the debugger to step through… I believe that folks have, instead, used print statements to visualize the execution flow.)

Thanks,

- Paul
