Posted to dev@drill.apache.org by weijie tong <to...@gmail.com> on 2017/07/30 13:10:12 UTC

Which code compiler is better

Compilation takes a long time when a query has 20 SUM or AVG expressions and
the compiler is Janino. If we switch to the JDK compiler, compilation time
drops. It seems the JDK compiler is better. If that's true, why not make it
the default?

Re: Which code compiler is better

Posted by Paul Rogers <pr...@mapr.com>.
Hey all,

What does everyone think about Weijie’s suggestion? We’ve been turning the “plain Java” option on in an ad-hoc fashion for debugging now for about a year and have never seen an issue. Indeed, no issue would be expected since the “plain Java” technique just generates and compiles Java code; while the byte-code fixup method requires that we know how to muck about to rewrite byte codes.

Maybe, for 1.12, let’s turn it on early (that is, now) and run our many tests during the 1.12 cycle. That will give us plenty of time to catch any issues. And, it means that the generated Java code becomes readily accessible for debugging, rather than using the opaque methods that the team has long used.

If we agree, I can go ahead and switch the default to use the “plain Java” method rather than the byte-code manipulation method.

Thoughts?

- Paul

> On Aug 11, 2017, at 6:40 PM, weijie tong <to...@gmail.com> wrote:
> 
> @paul why not make plain Java the default code generation method? The
> current default ASM merge method can produce unpredictable scalar
> replacement errors, which we have run into before. That makes it hard for
> users to trust the system. The JIT will do what the merging does anyway if
> we choose plain Java to extend the template.


Re: Which code compiler is better

Posted by weijie tong <to...@gmail.com>.
@paul why not make plain Java the default code generation method? The
current default ASM merge method can produce unpredictable scalar
replacement errors, which we have run into before. That makes it hard for
users to trust the system. The JIT will do what the merging does anyway if
we choose plain Java to extend the template.


On Fri, 11 Aug 2017 at 5:52 PM weijie tong <to...@gmail.com> wrote:

> @chunhui we only changed the compiler option; the code generation
> strategy is not affected by it. So I think the difference in results
> just reflects the compilers' performance.

Re: Which code compiler is better

Posted by weijie tong <to...@gmail.com>.
@chunhui we only changed the compiler option; the code generation
strategy is not affected by it. So I think the difference in results
just reflects the compilers' performance.

On Wed, Aug 2, 2017 at 1:45 AM, Chunhui Shi <cs...@mapr.com> wrote:

>
> Correcting my previous response:
>
> In DRILL-4778, JDK was faster in compilation but generated slower code.
> Janino was slower in compilation and generated faster code. Your JIRA did
> not mention the performance of the generated code at run time. You may
> want to test this aspect as well.

Re: Which code compiler is better

Posted by Chunhui Shi <cs...@mapr.com>.
Correcting my previous response:

In DRILL-4778, JDK was faster in compilation but generated slower code. Janino was slower in compilation and generated faster code. Your JIRA did not mention the performance of the generated code at run time. You may want to test this aspect as well.
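
A minimal sketch, in Java, of how the two aspects mentioned above (setup or compile time versus execution time of the generated code) could be timed separately. The class and method names are hypothetical and this is not Drill's code; the setup step is a stand-in for code generation plus compilation, and there is no JIT warm-up, so the numbers are only a rough indication.

```java
import java.util.function.IntUnaryOperator;
import java.util.function.Supplier;

public class TwoPhaseTimer {

    /** Timings for one run; checksum keeps the loop from being optimized away. */
    static final class Result {
        final long setupNanos;
        final long runNanos;
        final long checksum;

        Result(long setupNanos, long runNanos, long checksum) {
            this.setupNanos = setupNanos;
            this.runNanos = runNanos;
            this.checksum = checksum;
        }
    }

    // Times the setup phase (stand-in for codegen + compile) and the
    // execution phase (applying the resulting operator) independently.
    static Result measure(Supplier<IntUnaryOperator> setup, int iterations) {
        long t0 = System.nanoTime();
        IntUnaryOperator op = setup.get();
        long t1 = System.nanoTime();
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += op.applyAsInt(i);
        }
        long t2 = System.nanoTime();
        return new Result(t1 - t0, t2 - t1, sum);
    }

    public static void main(String[] args) {
        Result r = measure(() -> x -> x + 1, 1_000_000);
        System.out.println("setup ns: " + r.setupNanos);
        System.out.println("run ns:   " + r.runNanos);
    }
}
```

Running the same harness once per compiler setting would show whether a faster compile is paid for with slower execution, which is the trade-off reported in DRILL-4778.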


From: weijie tong <to...@gmail.com>
Sent: Sunday, July 30, 2017 6:10:12 AM
To: dev@drill.apache.org
Subject: Which code compiler is better

Compilation takes a long time when a query has 20 SUM or AVG expressions and
the compiler is Janino. If we switch to the JDK compiler, compilation time
drops. It seems the JDK compiler is better. If that's true, why not make it
the default?

Re: Which code compiler is better

Posted by Chunhui Shi <cs...@mapr.com>.
A while ago my experiments ( https://issues.apache.org/jira/browse/DRILL-4778 ) showed that the JDK compiler was more favorable in terms of the efficiency of the generated code; as for compilation time, Janino performed better at that time. If this is no longer the case with JDK 8, I don't see any reason we need Janino at all.

________________________________
From: weijie tong <to...@gmail.com>
Sent: Monday, July 31, 2017 8:31:41 AM
To: dev@drill.apache.org
Subject: Re: Which code compiler is better

Here is the JIRA link: https://issues.apache.org/jira/browse/DRILL-5696

Our production environment uses JDK 8 with transformed code generation (not
plain Java). Paul's experiments match what we see in production.

The project operator's setup time is high when using the Janino compiler.


Re: Which code compiler is better

Posted by weijie tong <to...@gmail.com>.
Here is the JIRA link: https://issues.apache.org/jira/browse/DRILL-5696

Our production environment uses JDK 8 with transformed code generation (not
plain Java). Paul's experiments match what we see in production.

The SQL looks like this: "select
   (d.trade_cnt - d2.trade_cnt)/CAST(d2.trade_cnt AS DECIMAL(28,4)) as trade_cnt_wr
  ,(d.trade_amt - d2.trade_amt)/CAST(d2.trade_amt AS DECIMAL(28,4)) as trade_amt_wr
  ,(d.trade_shop_cnt - d2.trade_shop_cnt)/CAST(d2.trade_shop_cnt AS DECIMAL(28,4)) as trade_shop_cnt_wr
  ,(d.online_shop_cnt - d2.online_shop_cnt)/CAST(d2.online_shop_cnt AS DECIMAL(28,4)) as online_shop_cnt_wr
  ,CAST((d.trade_shop_rate - d2.trade_shop_rate) AS DECIMAL(28,4))/CAST(d2.trade_shop_rate AS DECIMAL(28,4)) as trade_shop_rate_wr
  ,(d.offline_item_cnt - d2.offline_item_cnt)/CAST(d2.offline_item_cnt AS DECIMAL(28,4)) as offline_item_cnt_wr
  ,(d.business_amt_per_cnt - d2.business_amt_per_cnt)/CAST(d2.business_amt_per_cnt AS DECIMAL(28,4)) as business_amt_per_cnt_wr
  ,(d.order_amt_per_cnt - d2.order_amt_per_cnt)/CAST(d2.order_amt_per_cnt AS DECIMAL(28,4)) as order_amt_per_cnt_wr
  ,(d.new_shop_cnt - d2.new_shop_cnt)/CAST(d2.new_shop_cnt AS DECIMAL(28,4)) as new_shop_cnt_wr
  ,(d.offline_shop_cnt - d2.offline_shop_cnt)/CAST(d2.offline_shop_cnt AS DECIMAL(28,4)) as offline_shop_cnt_wr
  ,(d.item_use_cnt - d2.item_use_cnt)/CAST(d2.item_use_cnt AS DECIMAL(28,4)) as item_use_cnt_wr
  ,(d.item_shop_rate - d2.item_shop_rate)/CAST(d2.item_shop_rate AS DECIMAL(28,4)) as item_shop_rate_wr
  ,(d.discount_trd_cnt - d2.discount_trd_cnt)/CAST(d2.discount_trd_cnt AS DECIMAL(28,4)) as discount_trd_cnt_wr
  ,(d.discount_shop_cnt - d2.discount_shop_cnt)/CAST(d2.discount_shop_cnt AS DECIMAL(28,4)) as discount_shop_cnt_wr
  ,(d.crm_shop_cnt - d2.crm_shop_cnt)/CAST(d2.crm_shop_cnt AS DECIMAL(28,4)) as crm_shop_cnt_wr
  ,(d.crm_shop_rate - d2.crm_shop_rate)/CAST(d2.crm_shop_rate AS DECIMAL(28,4)) as crm_shop_rate_wr
  ,(d.trade_cnt_voucher - d2.trade_cnt_voucher)/CAST(d2.trade_cnt_voucher AS DECIMAL(28,4)) as trade_cnt_voucher_wr
  ,(d.trade_amt_voucher - d2.trade_amt_voucher)/CAST(d2.trade_amt_voucher AS DECIMAL(28,4)) as trade_amt_voucher_wr
  ,(d.trade_cnt_per_shop - d2.trade_cnt_per_shop)/CAST(d2.trade_cnt_per_shop AS DECIMAL(28,4)) as trade_cnt_per_shop_wr

  from
    xxxxx   "

The project operator's setup time is high when using the Janino compiler.

On Mon, Jul 31, 2017 at 10:54 AM, Paul Rogers <pr...@mapr.com> wrote:

> A while back I did some experiments with JDK 8. The Java 8 compiler
> appears to be faster in general than Janino, if I remember correctly. (Not
> surprising: many people focus on optimizing the Java compiler, a smaller
> team maintains Janino...)
>
>
> Another experiment was to do "plain Java" code generation and compile
> rather than the compile & byte-code merge we do now. The compilation was
> faster as was code execution. The main reason for the speed-up is that
> "plain Java" does fewer steps: it just compiles and loads. However,
> "traditional" Drill code generation compiles, does a byte code copy and
> merge and then loads. Some "templates" are rather large. By using "plain
> Java" subclassing, we need not copy the base class code as we do when doing
> the byte-code merge.
>
>
> Also, because each generated class (with plain Java) uses the same base
> class code, the JVM can reuse its JIT optimizations; it does not have to
> rediscover them for each new generated class.
>
>
> We've not had time to do full testing, so we conservatively stick with
> what we know works. Still,  preliminary testing did show that "plain Java"
> is both faster and more convenient. You can experiment with this option.
> Find the commented out line like the following in each operator (record
> batch):
>
>
>       // Uncomment out this line to debug the generated code.
>
> //    cg.saveCodeForDebugging(true);
>
> Uncomment the line. You'll get debuggable plain Java code and compilation.
> The generated source code goes into /tmp/drill/codegen by default.
>
> Or, if you want to try for performance, and avoid the step of writing code
> to disk, use the following instead:
>
>     cg.preferPlainJava(true);
>
> More details appear in [1].
>
> Thanks,
>
> - Paul
>
> [1] https://github.com/paul-rogers/drill/wiki/Code-Generation-and-%22Short%2C-Fat%22-Queries

Re: Which code compiler is better

Posted by Paul Rogers <pr...@mapr.com>.
A while back I did some experiments with JDK 8. The Java 8 compiler appears to be faster in general than Janino, if I remember correctly. (Not surprising: many people focus on optimizing the Java compiler, a smaller team maintains Janino...)


Another experiment was to do "plain Java" code generation and compilation rather than the compile and byte-code merge we do now. Both compilation and code execution were faster. The main reason for the speed-up is that "plain Java" takes fewer steps: it just compiles and loads. "Traditional" Drill code generation, by contrast, compiles, does a byte-code copy and merge, and then loads. Some "templates" are rather large. By using "plain Java" subclassing, we need not copy the base class code as we do in the byte-code merge.


Also, because each generated class (with plain Java) uses the same base class code, the JVM can reuse its JIT optimizations; it does not have to rediscover them for each new generated class.
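
To make the "plain Java" idea concrete, here is a minimal, self-contained sketch in Java. It is not Drill's code: the template class `ProjectTemplate`, the generated classes `Gen1` and `Gen2`, and all helper names are hypothetical. It only assumes a JDK (so that `javax.tools.ToolProvider.getSystemJavaCompiler()` returns a compiler). The template is compiled once, each "generated" class is a small subclass compiled against it, and the classes are simply loaded, with no byte-code copy or merge.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.IntUnaryOperator;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class PlainJavaSketch {

    // Hypothetical template: logic shared by every generated class lives here.
    static final String TEMPLATE =
        "public abstract class ProjectTemplate\n"
      + "    implements java.util.function.IntUnaryOperator {\n"
      + "  protected abstract int eval(int x);\n"
      + "  public int applyAsInt(int x) { return eval(x); }\n"
      + "}\n";

    // "Generated" code: a small subclass that only fills in the expression.
    static String generate(String name, String expr) {
        return "public class " + name + " extends ProjectTemplate {\n"
             + "  protected int eval(int x) { return " + expr + "; }\n"
             + "}\n";
    }

    // Writes one source file into dir and compiles it with the JDK compiler.
    static void compile(Path dir, String name, String source) throws Exception {
        Path file = dir.resolve(name + ".java");
        Files.write(file, source.getBytes(StandardCharsets.UTF_8));
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            throw new IllegalStateException("no JDK compiler available (running on a JRE?)");
        }
        int rc = javac.run(null, null, null,
            "-classpath", dir.toString(), "-d", dir.toString(), file.toString());
        if (rc != 0) {
            throw new IllegalStateException("compilation failed: " + name);
        }
    }

    // Compiles the template once, then two generated subclasses, and loads them.
    static IntUnaryOperator[] build() {
        try {
            Path dir = Files.createTempDirectory("codegen");
            compile(dir, "ProjectTemplate", TEMPLATE);
            compile(dir, "Gen1", generate("Gen1", "x + 1"));
            compile(dir, "Gen2", generate("Gen2", "x * 2"));
            // The loader is deliberately left open while the classes are in use.
            URLClassLoader loader =
                new URLClassLoader(new URL[] { dir.toUri().toURL() });
            return new IntUnaryOperator[] {
                (IntUnaryOperator) loader.loadClass("Gen1")
                    .getDeclaredConstructor().newInstance(),
                (IntUnaryOperator) loader.loadClass("Gen2")
                    .getDeclaredConstructor().newInstance()
            };
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        IntUnaryOperator[] ops = build();
        System.out.println(ops[0].applyAsInt(3));
        System.out.println(ops[1].applyAsInt(3));
        // Both generated classes share the very same base class object.
        System.out.println(
            ops[0].getClass().getSuperclass() == ops[1].getClass().getSuperclass());
    }
}
```

Because both generated classes extend the same loaded base class, the shared template code exists once in the JVM rather than once per generated class, which is the property the paragraph above relies on.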


We've not had time to do full testing, so we conservatively stick with what we know works. Still, preliminary testing did show that "plain Java" is both faster and more convenient. You can experiment with this option. Look for a commented-out line like the following in each operator (record batch):


      // Uncomment out this line to debug the generated code.

//    cg.saveCodeForDebugging(true);

Uncomment the line. You'll get debuggable plain Java code and compilation. The generated source code goes into /tmp/drill/codegen by default.

Or, if you want to try for performance, and avoid the step of writing code to disk, use the following instead:

    cg.preferPlainJava(true);

More details appear in [1].

Thanks,

- Paul

[1] https://github.com/paul-rogers/drill/wiki/Code-Generation-and-%22Short%2C-Fat%22-Queries
________________________________
From: Aman Sinha <am...@apache.org>
Sent: Sunday, July 30, 2017 9:16:09 AM
To: dev@drill.apache.org
Subject: Re: Which code compiler is better

Weijie,
what is the size (in KB) of your generated code for the aggregate operator
that is doing the 20 SUM/AVG ?  Also, what JDK version are you using ?

From what I recall, Janino was faster than JDK 1.7 for source files up to
about 256 KB. That's the current threshold in Drill; if the size is
greater than that, Drill automatically switches to the JDK compiler. A newer
JDK could potentially be faster, so we would need to do the comparison
again. Perhaps you should file a JIRA with your observations.

There is also the complexity of the expressions.  For simple expressions,
my understanding is Janino is typically better.  Can you provide the query
pattern you used ?


Re: Which code compiler is better

Posted by Aman Sinha <am...@apache.org>.
Weijie,
what is the size (in KB) of your generated code for the aggregate operator
that is doing the 20 SUM/AVG ?  Also, what JDK version are you using ?

From what I recall, Janino was faster than JDK 1.7 for source files up to
about 256 KB. That's the current threshold in Drill; if the size is
greater than that, Drill automatically switches to the JDK compiler. A newer
JDK could potentially be faster, so we would need to do the comparison
again. Perhaps you should file a JIRA with your observations.
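
The threshold rule described above can be sketched in a few lines of Java. This is only an illustration: the class, enum, and method names are hypothetical and are not Drill's actual code, and it assumes the size is measured in UTF-8 bytes of the generated source, which this thread does not specify.

```java
import java.nio.charset.StandardCharsets;

public class CompilerSelector {

    /** Hypothetical enum; the names do not come from Drill's code base. */
    enum Compiler { JANINO, JDK }

    // 256 KB, the threshold mentioned in this thread.
    static final int THRESHOLD_BYTES = 256 * 1024;

    // Picks Janino for small sources and the JDK compiler for larger ones.
    // Assumption: size is the UTF-8 byte length of the generated source.
    static Compiler choose(String source) {
        int size = source.getBytes(StandardCharsets.UTF_8).length;
        return size > THRESHOLD_BYTES ? Compiler.JDK : Compiler.JANINO;
    }

    public static void main(String[] args) {
        String small = "class A {}";
        StringBuilder big = new StringBuilder();
        while (big.length() <= THRESHOLD_BYTES) {
            big.append("// padding line\n");
        }
        System.out.println(choose(small));
        System.out.println(choose(big.toString()));
    }
}
```

If the comparison is redone on a newer JDK, only the constant (or the rule itself) would need to change.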

There is also the complexity of the expressions.  For simple expressions,
my understanding is Janino is typically better.  Can you provide the query
pattern you used ?

On Sun, Jul 30, 2017 at 6:10 AM, weijie tong <to...@gmail.com>
wrote:

> Compilation takes a long time when a query has 20 SUM or AVG expressions and
> the compiler is Janino. If we switch to the JDK compiler, compilation time
> drops. It seems the JDK compiler is better. If that's true, why not make it
> the default?
>