You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-dev@db.apache.org by RPost <rp...@pacbell.net> on 2004/12/22 23:45:04 UTC

Two poorly optimized functions in SQLParser.java

Does anyone have any experience or knowledge about the JavaCC process used to produce SQLParser.java?

There are two functions, jj_3R_315() and jj_3R_338(), in SQLParser.java that are very poorly optimized.

jj_3R_315 contains around 115 nested IF statements and jj_3R_338 contains over 200 nested IF statements.

The statement:    PreparedStatement pstmt = con.prepareStatement("insert into table1 values (?)");

winds up calling jj_3R_315 three times. For each call the current token position is at the "(" and the function executes the entire chain of 115 calls:

    if (jj_scan_token(208)) {
    jj_scanpos = xsp;
    if (jj_scan_token(209)) {
    jj_scanpos = xsp;
    ... over 100 more nested if statements before finishing.

A stack trace shows that the actual call to jj_3R_315 is about 15 layers deep:

jj_3_9
  jj_3R_51
    jj_3R_91
      jj_3R_167, which calls 204, 261, 289, 310, 322, 333, 337, 343, jj_3R_315

Most of the calling functions do no more than call one function and returns its boolean result.

I have not been able to create code that calls the jj_3R_338 function but it is similar in structure with a depth of over 200 nested 'if' statements. Each 'if' calls the jj_scan_token() function.

The sysinfo for this is:


D:\cloudscape_derby\test>d:\jdk1.3.1\bin\java -classpath .;..\lib\derby.jar;..\lib\derbynet.jar;..\lib\derbytools.jar org.apache.derby.tools.sysinfo 
------------------ Java Information ------------------
Java Version:    1.3.1
Java Vendor:     Sun Microsystems Inc.
Java home:       d:\jdk1.3.1\jre
Java classpath:  .;..\lib\derby.jar;..\lib\derbynet.jar;..\lib\derbytools.jar
OS name:         Windows 2000
OS architecture: x86
OS version:      5.0
Java user name:  Administrator
Java user home:  C:\Documents and Settings\Administrator
Java user dir:   D:\cloudscape_derby\test
--------- Derby Information --------
[D:\cloudscape_derby\lib\derby.jar] 10.0.2.1 - (56458)
[D:\cloudscape_derby\lib\derbynet.jar] 10.0.2.1 - (56458)
[D:\cloudscape_derby\lib\derbytools.jar] 10.0.2.1 - (56458)
------------------------------------------------------
----------------- Locale Information -----------------
------------------------------------------------------

Each of the three executions appear to take less than 1ms so I don't know how long it is really taking to execute the functions but these functions should probably be produced with 'switch' statements like many of the other functions in this file are.

Preparation time for the 'insert' statement averages about 250ms. Inserting 100 records with:

    for (int i = 0; i < 100; i++)
    {
      pstmt.setInt(1, i);
      lStart = System.currentTimeMillis();
      pstmt.execute();
      lEnd   = System.currentTimeMillis();
      System.out.println(i + " " + (lEnd - lStart));
    }

averages about 16ms for each of the 1st 30 records and then about 266ms for each of the remaining 70 records.


Re: Two poorly optimized functions in SQLParser.java

Posted by RPost <rp...@pacbell.net>.
Thanks Jeffrey. That's the info I was looking for.

I will supply a call stack from Derby. I will also try to find the spot in
JavaCC that is spitting out the JJ_3R_315 method to see what part of
sqlgrammer.jj it is interpreting.

----- Original Message ----- 
From: "Jeffrey Lichtman" <sw...@rcn.com>
To: "Derby Development" <de...@db.apache.org>
Sent: Monday, December 27, 2004 11:45 AM
Subject: Re: Two poorly optimized functions in SQLParser.java


>
> >I am merely suggesting that it would be more efficient if JavaCC produced
> >'switch' statements for these rather than the nested 'if' statements.
> >
> >So far, I haven't been able to determine how to modify the sqlgrammar.jj
> >file to achieve this.
>
> It's time for me to out myself. I wrote most of sqlgrammar.jj, along with
a
> lot of the other Cloudscape/Derby language code. I no longer work on
> Cloudscape or Derby, but I'm still interested in it.
>
> The code in SQLParser.java consists mostly of rules from sqlgrammar.jj and
> lookahead code. I suspect the massive nested if statements are for
lookahead.
>
> sqlgrammar.jj uses a single token of lookahead by default. The first
> implementation of sqlgrammar.jj used numeric lookahead to disambiguate the
> grammar. That is, where javacc would report that more tokens of lookahead
> were needed, I would put in LOOKAHEAD(2) (or 3 or 4, or whatever javacc
> suggested). This led to a very large parser.
>
> The javacc development team told me that the size of the parser would be
> reduced significantly if, in those places where more than one token of
> lookahead is needed, we used semantic lookahead rather than numeric
> lookahead. What that means is that, where the grammar is ambiguous with a
> single token of lookahead, we use a hand-written method for lookahead
> rather than tell javacc to look ahead two or more tokens. When I made this
> change, the number of generated methods starting with "jj_" shrank
> significantly.
>
> I suspect the ugly massive nested if statements in SQLParser.java have to
> do with lookahead. I thought that maybe someone had put in some numeric
> lookahead, but I didn't find any such thing when I looked through
> sqlgrammar.jj. It's possible that I missed something, though.
>
> In any case, it would help to have the complete call stack for the methods
> in question. I am particularly interested in which grammar rules are
> calling these methods. Given that, I might be able to figure out what's
> going on by looking at sqlgrammar.jj
>
>
>                         -        Jeff Lichtman
>                                  swazoo@rcn.com
>                                  Check out Swazoo Koolak's Web Jukebox at
>                                  http://swazoo.com/
>


Re: Two poorly optimized functions in SQLParser.java

Posted by Jeffrey Lichtman <sw...@rcn.com>.
>I am merely suggesting that it would be more efficient if JavaCC produced
>'switch' statements for these rather than the nested 'if' statements.
>
>So far, I haven't been able to determine how to modify the sqlgrammar.jj
>file to achieve this.

It's time for me to out myself. I wrote most of sqlgrammar.jj, along with a 
lot of the other Cloudscape/Derby language code. I no longer work on 
Cloudscape or Derby, but I'm still interested in it.

The code in SQLParser.java consists mostly of rules from sqlgrammar.jj and 
lookahead code. I suspect the massive nested if statements are for lookahead.

sqlgrammar.jj uses a single token of lookahead by default. The first 
implementation of sqlgrammar.jj used numeric lookahead to disambiguate the 
grammar. That is, where javacc would report that more tokens of lookahead 
were needed, I would put in LOOKAHEAD(2) (or 3 or 4, or whatever javacc 
suggested). This led to a very large parser.

The javacc development team told me that the size of the parser would be 
reduced significantly if, in those places where more than one token of 
lookahead is needed, we used semantic lookahead rather than numeric 
lookahead. What that means is that, where the grammar is ambiguous with a 
single token of lookahead, we use a hand-written method for lookahead 
rather than tell javacc to look ahead two or more tokens. When I made this 
change, the number of generated methods starting with "jj_" shrank 
significantly.

I suspect the ugly massive nested if statements in SQLParser.java have to 
do with lookahead. I thought that maybe someone had put in some numeric 
lookahead, but I didn't find any such thing when I looked through 
sqlgrammar.jj. It's possible that I missed something, though.

In any case, it would help to have the complete call stack for the methods 
in question. I am particularly interested in which grammar rules are 
calling these methods. Given that, I might be able to figure out what's 
going on by looking at sqlgrammar.jj


                        -        Jeff Lichtman
                                 swazoo@rcn.com
                                 Check out Swazoo Koolak's Web Jukebox at
                                 http://swazoo.com/ 


Re: Two poorly optimized functions in SQLParser.java

Posted by Dain Sundstrom <ds...@gluecode.com>.
On Dec 27, 2004, at 1:14 PM, RPost wrote:

> Now your the one that's kidding right?
>
> You gave a great answer. Too bad it wasn't related to the question.
>
> Which was:
>
>> Does anyone have any experience or knowledge about the JavaCC process
>> used to produce SQLParser.java?
>
> I will assume your answer to that one is 'no'.

Actually, I have extensive experience with javacc, but as Daniel 
pointed out it is a waste of time and effort to optimize non-critical 
code paths.  Basically, you have a trade off between fast code and 
complex code (i.e. more expensive to maintain and improve), so unless 
it affects real world performance, I personally will always choose the 
simple code.

-dain


Re: Two poorly optimized functions in SQLParser.java

Posted by RPost <rp...@pacbell.net>.
Now your the one that's kidding right?

You gave a great answer. Too bad it wasn't related to the question.

Which was:

> Does anyone have any experience or knowledge about the JavaCC process
> used to produce SQLParser.java?

I will assume your answer to that one is 'no'.

I'm trying to determine why JavaCC is spitting out code with 115 and 200
nested IFs rather than the (much) more efficient SWITCH statement it puts
out for most of the other code performing similar functionality.

I have generally found in the past that 115 executions of a function takes
longer than one execution-:)

----- Original Message ----- 
From: "Dain Sundstrom" <ds...@gluecode.com>
To: "Derby Development" <de...@db.apache.org>
Sent: Monday, December 27, 2004 11:41 AM
Subject: Re: Two poorly optimized functions in SQLParser.java


> On Dec 27, 2004, at 11:13 AM, RPost wrote:
>
> > I don't see how the presence or absence of a JIT compiler is relevant.
>
> Your kidding, right?  If not, please take some time to look at VM
> benchmarks over the last few years.  It shows that JITs have added
> several orders of magnitude to the performance of Java.
>
> > These extremely nested 'if' statements are part of source code
> > (SQLParser.java) that is being produced by JavaCC. The source file is
> > then
> > compiled during build to a class file.
>
> If you can convert it into a switch statement, so can the JIT compiler.
>
> > Depending on the JVM being used at run-time there may not even be a JIT
> > compiler.
>
> Every major VM has a JIT compiler, so there we should play to them.  If
> the vm doesn't have a jit, you have much bigger issues.
>
> > The decompiled class file, produced by either Sun's provided 'javac' or
> > Jikes, also shows a nested if with a separate function call as shown
> > in the
> > source code.
> >
> > I am merely suggesting that it would be more efficient if JavaCC
> > produced
> > 'switch' statements for these rather than the nested 'if' statements.
> >
> > So far, I haven't been able to determine how to modify the
> > sqlgrammar.jj
> > file to achieve this.
>
> Seriously, if you believe that this is a real problem with performance,
> prove it.  This is a policy we have had in the geronimo project since
> the beginning.  There have been several times where people believed
> that certain code was slow, tried to prove it, and it turned out it was
> another chunk of code.  Also, without data you don't know if your
> solution is optional (or possibly worse).  Anyways, I suggest you time
> the code in isolation as it stands, and then hand modify the generated
> code as you describe above and time it again.
>
> IMNOHO, only if it is a true provable performance problem should the
> code be changed to address the problem.  This is because normally
> addressing such issues introduces complexity, and this complexity adds
> a tax to all future development.  Also, complex code is more difficult
> for a JIT to understand so it can cut you off from future innovation in
> JIT technology (ask that guys that wrote the first XML parsers in
> java).
>
> -dain
>


Re: Two poorly optimized functions in SQLParser.java

Posted by Daniel John Debrunner <dj...@debrunners.com>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Dain Sundstrom wrote:


> Seriously, if you believe that this is a real problem with performance,
> prove it.  This is a policy we have had in the geronimo project since
> the beginning.  There have been several times where people believed that
> certain code was slow, tried to prove it, and it turned out it was
> another chunk of code.

Another way to look at it is to say what difference would it make to
real world applications if you made this piece of code, say, ten times
faster?

Hopefully any well written JDBC application spends less than 1% of its
time compiling SQL statements. The compile phase is made up of four
stages, parse, bind, optimise and generate. Parse time is probably the
quickest stage, let's say 15% of the compile time. So if your ten times
improvement made parsing twice as fast, you will only improve overall
performance by less than 0.1%.

Traditionally performance work on Cloudscape mainly focussed on making
execution time improvements, not compile time improvements.

Dan.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFB0G7YIv0S4qsbfuQRAjINAKCXGD1xnjiBHwut1cSUb4czjR/HKwCdHC9t
wAt5N7d7p8oRaOVh8s0kjeE=
=/ohv
-----END PGP SIGNATURE-----


Re: Two poorly optimized functions in SQLParser.java

Posted by Dain Sundstrom <ds...@gluecode.com>.
On Dec 27, 2004, at 11:13 AM, RPost wrote:

> I don't see how the presence or absence of a JIT compiler is relevant.

Your kidding, right?  If not, please take some time to look at VM 
benchmarks over the last few years.  It shows that JITs have added 
several orders of magnitude to the performance of Java.

> These extremely nested 'if' statements are part of source code
> (SQLParser.java) that is being produced by JavaCC. The source file is 
> then
> compiled during build to a class file.

If you can convert it into a switch statement, so can the JIT compiler.

> Depending on the JVM being used at run-time there may not even be a JIT
> compiler.

Every major VM has a JIT compiler, so there we should play to them.  If 
the vm doesn't have a jit, you have much bigger issues.

> The decompiled class file, produced by either Sun's provided 'javac' or
> Jikes, also shows a nested if with a separate function call as shown 
> in the
> source code.
>
> I am merely suggesting that it would be more efficient if JavaCC 
> produced
> 'switch' statements for these rather than the nested 'if' statements.
>
> So far, I haven't been able to determine how to modify the 
> sqlgrammar.jj
> file to achieve this.

Seriously, if you believe that this is a real problem with performance, 
prove it.  This is a policy we have had in the geronimo project since 
the beginning.  There have been several times where people believed 
that certain code was slow, tried to prove it, and it turned out it was 
another chunk of code.  Also, without data you don't know if your 
solution is optional (or possibly worse).  Anyways, I suggest you time 
the code in isolation as it stands, and then hand modify the generated 
code as you describe above and time it again.

IMNOHO, only if it is a true provable performance problem should the 
code be changed to address the problem.  This is because normally 
addressing such issues introduces complexity, and this complexity adds 
a tax to all future development.  Also, complex code is more difficult 
for a JIT to understand so it can cut you off from future innovation in 
JIT technology (ask that guys that wrote the first XML parsers in 
java).

-dain


Re: Two poorly optimized functions in SQLParser.java

Posted by RPost <rp...@pacbell.net>.
I don't see how the presence or absence of a JIT compiler is relevant.

These extremely nested 'if' statements are part of source code
(SQLParser.java) that is being produced by JavaCC. The source file is then
compiled during build to a class file.

Depending on the JVM being used at run-time there may not even be a JIT
compiler.

The decompiled class file, produced by either Sun's provided 'javac' or
Jikes, also shows a nested if with a separate function call as shown in the
source code.

I am merely suggesting that it would be more efficient if JavaCC produced
'switch' statements for these rather than the nested 'if' statements.

So far, I haven't been able to determine how to modify the sqlgrammar.jj
file to achieve this.

----- Original Message ----- 
From: "Dain Sundstrom" <ds...@gluecode.com>
To: "Derby Development" <de...@db.apache.org>
Sent: Monday, December 27, 2004 10:55 AM
Subject: Re: Two poorly optimized functions in SQLParser.java


You really never know what is going to happen to code once it is passes
through the JIT compiler.  The "if" statement may be converted to a
switch statement.  The "if" statement may be inlined.  The only way to
tell is to time the code in isolation.  Can you write a test case that
tests just the parser code in isolation?  If you can write a isolated
test, don't forget to run the test twice to force the JIT compiler to
kick in and then throw out the first result.  One final thing, a VM
can't really accurately measure a single period shorter then 500 ms.  I
normally do some work in a for loop with a few thousand or million
iterations depending on the thing being measured.  I time the entire
loop, and divid by the number of iterations to get to the time for a
single loop (normally in nanoseconds).

-dain

On Dec 22, 2004, at 2:45 PM, RPost wrote:

> Does anyone have any experience or knowledge about the JavaCC process
> used to produce SQLParser.java?
>
> There are two functions, jj_3R_315() and jj_3R_338(), in
> SQLParser.java that are very poorly optimized.
>
> jj_3R_315 contains around 115 nested IF statements and jj_3R_338
> contains over 200 nested IF statements.
>
> The statement: PreparedStatement pstmt =
> con.prepareStatement("insert into table1 values (?)");
> winds up calling jj_3R_315 three times. For each call the current
> token position is at the "(" and the function executes the entire
> chain of 115 calls:
>
> if (jj_scan_token(208)) {
> jj_scanpos = xsp;
> if (jj_scan_token(209)) {
> jj_scanpos = xsp;
> ... over 100 more nested if statements before finishing.
>
> A stack trace shows that the actual call to jj_3R_315 is about 15
> layers deep:
>
> jj_3_9
> jj_3R_51
> jj_3R_91
> jj_3R_167, which calls 204, 261, 289, 310, 322, 333, 337, 343,
> jj_3R_315
>
> Most of the calling functions do no more than call one function and
> returns its boolean result.
>
> I have not been able to create code that calls the jj_3R_338 function
> but it is similar in structure with a depth of over 200 nested 'if'
> statements. Each 'if' calls the jj_scan_token() function.
>
> The sysinfo for this is:
>
>
>
> D:\cloudscape_derby\test>d:\jdk1.3.1\bin\java -classpath
> .;..\lib\derby.jar;..\lib\derbynet.jar;..\lib\derbytools.jar
> org.apache.derby.tools.sysinfo
>  ------------------ Java Information ------------------
> Java Version: 1.3.1
> Java Vendor: Sun Microsystems Inc.
> Java home: d:\jdk1.3.1\jre
> Java classpath:
> .;..\lib\derby.jar;..\lib\derbynet.jar;..\lib\derbytools.jar
> OS name: Windows 2000
> OS architecture: x86
> OS version: 5.0
> Java user name: Administrator
> Java user home: C:\Documents and Settings\Administrator
> Java user dir: D:\cloudscape_derby\test
> --------- Derby Information --------
> [D:\cloudscape_derby\lib\derby.jar] 10.0.2.1 - (56458)
> [D:\cloudscape_derby\lib\derbynet.jar] 10.0.2.1 - (56458)
> [D:\cloudscape_derby\lib\derbytools.jar] 10.0.2.1 - (56458)
> ------------------------------------------------------
> ----------------- Locale Information -----------------
> ------------------------------------------------------
> Each of the three executions appear to take less than 1ms so I don't
> know how long it is really taking to execute the functions but these
> functions should probably be produced with 'switch' statements like
> many of the other functions in this file are.
>
> Preparation time for the 'insert' statement averages about 250ms.
> Inserting 100 records with:
>
> for (int i = 0; i < 100; i++)
> {
> pstmt.setInt(1, i);
> lStart = System.currentTimeMillis();
> pstmt.execute();
> lEnd = System.currentTimeMillis();
> System.out.println(i + " " + (lEnd - lStart));
> }
> averages about 16ms for each of the 1st 30 records and then about
> 266ms for each of the remaining 70 records.
>
>
>


Re: Two poorly optimized functions in SQLParser.java

Posted by Dain Sundstrom <ds...@gluecode.com>.
You really never know what is going to happen to code once it is passes 
through the JIT compiler.  The "if" statement may be converted to a 
switch statement.  The "if" statement may be inlined.  The only way to 
tell is to time the code in isolation.  Can you write a test case that 
tests just the parser code in isolation?  If you can write a isolated 
test, don't forget to run the test twice to force the JIT compiler to 
kick in and then throw out the first result.  One final thing, a VM 
can't really accurately measure a single period shorter then 500 ms.  I 
normally do some work in a for loop with a few thousand or million 
iterations depending on the thing being measured.  I time the entire 
loop, and divid by the number of iterations to get to the time for a 
single loop (normally in nanoseconds).

-dain

On Dec 22, 2004, at 2:45 PM, RPost wrote:

> Does anyone have any experience or knowledge about the JavaCC process 
> used to produce SQLParser.java?
>  
> There are two functions, jj_3R_315() and jj_3R_338(), in 
> SQLParser.java that are very poorly optimized.
>  
> jj_3R_315 contains around 115 nested IF statements and jj_3R_338 
> contains over 200 nested IF statements.
>  
> The statement:    PreparedStatement pstmt = 
> con.prepareStatement("insert into table1 values (?)");
> winds up calling jj_3R_315 three times. For each call the current 
> token position is at the "(" and the function executes the entire 
> chain of 115 calls:
>  
>     if (jj_scan_token(208)) {
>     jj_scanpos = xsp;
>     if (jj_scan_token(209)) {
>     jj_scanpos = xsp;
>     ... over 100 more nested if statements before finishing.
>  
> A stack trace shows that the actual call to jj_3R_315 is about 15 
> layers deep:
>  
> jj_3_9
>   jj_3R_51
>     jj_3R_91
>       jj_3R_167, which calls 204, 261, 289, 310, 322, 333, 337, 343, 
> jj_3R_315
>  
> Most of the calling functions do no more than call one function and 
> returns its boolean result.
>  
> I have not been able to create code that calls the jj_3R_338 function 
> but it is similar in structure with a depth of over 200 nested 'if' 
> statements. Each 'if' calls the jj_scan_token() function.
>  
> The sysinfo for this is:
>  
>
>
> D:\cloudscape_derby\test>d:\jdk1.3.1\bin\java -classpath 
> .;..\lib\derby.jar;..\lib\derbynet.jar;..\lib\derbytools.jar 
> org.apache.derby.tools.sysinfo
>  ------------------ Java Information ------------------
> Java Version:    1.3.1
> Java Vendor:     Sun Microsystems Inc.
> Java home:       d:\jdk1.3.1\jre
> Java classpath:  
> .;..\lib\derby.jar;..\lib\derbynet.jar;..\lib\derbytools.jar
> OS name:         Windows 2000
> OS architecture: x86
> OS version:      5.0
> Java user name:  Administrator
> Java user home:  C:\Documents and Settings\Administrator
> Java user dir:   D:\cloudscape_derby\test
> --------- Derby Information --------
> [D:\cloudscape_derby\lib\derby.jar] 10.0.2.1 - (56458)
> [D:\cloudscape_derby\lib\derbynet.jar] 10.0.2.1 - (56458)
> [D:\cloudscape_derby\lib\derbytools.jar] 10.0.2.1 - (56458)
> ------------------------------------------------------
> ----------------- Locale Information -----------------
> ------------------------------------------------------
> Each of the three executions appear to take less than 1ms so I don't 
> know how long it is really taking to execute the functions but these 
> functions should probably be produced with 'switch' statements like 
> many of the other functions in this file are.
>  
> Preparation time for the 'insert' statement averages about 250ms. 
> Inserting 100 records with:
>  
>     for (int i = 0; i < 100; i++)
>     {
>       pstmt.setInt(1, i);
>       lStart = System.currentTimeMillis();
>       pstmt.execute();
>       lEnd   = System.currentTimeMillis();
>       System.out.println(i + " " + (lEnd - lStart));
>     }
> averages about 16ms for each of the 1st 30 records and then about 
> 266ms for each of the remaining 70 records.
>  
>
>