Posted to derby-dev@db.apache.org by Kathey Marsden <km...@sbcglobal.net> on 2005/03/17 20:57:45 UTC

Any ideas for debugging a byte code generation problem

Hi,

I know this question is a bit vague but  I just have a little
information right now and am looking for clues on where to start.

I am working on a linkage error with a generated class.  I have the
class but don't have a reproduction for the problem right now.    I
would typically decompile with jad which will usually decompile what it
can even if there is a problem,  but I  get this.

$ jad *class
Parsing acf04340b7x0102xb1d7x6a62xffffd5573ece33.class...This class has a non-class tag 1
Super class has a non-class tag 1
ItemCollectionInvalidIndex: constants: requested 26636, limit 26635

The source of the trouble comes from a huge query with many UNION
ALL's.   The class is huge 2.4MB.  Anyone have any ideas about how to
get at the contents of a corrupted class like this?

Thanks

Kathey

Re: Any ideas for debugging a byte code generation problem

Posted by Kathey Marsden <km...@sbcglobal.net>.

RPost wrote:

>Kathey Marsden <km...@...> writes:
>
>>>Another thing I would try is paring down the query to make it as small
>>>as possible while still reproducing the problem.
>>>
>>Aye, there's the rub.  In general with these byte code generation
>>problems it is the size of the query that is the problem.  We are
>>dealing with strict JVM specification  limits on things like method
>>sizes.   So, we split methods into smaller methods and then we have too
>>many constants.  Robbing Peter to pay Paul only works for so long.
>>
>I create a java class with 30,000 constants of the form:
>
>int i1= 1,i2= 1,i3= 1,i4= 1,i5= 1,i6= 1,i7= 1,i8= 1,i9= 1,i10= 1;
>
>Java 1.3.1 compiles it ok but jad 1.5.7g gives an error when trying to 
>decompile it. A dialog box titled 'Program Error' with 'jad.exe has generated 
>errors and will be closed by Windows. You will need to restart the program. An 
>error log is being created.'
>
>Couldn't find anything in the JVM spec about a limit on constants
>
I actually should have said constant pool entries for which there is a
limit.
http://java.sun.com/docs/books/vmspec/2nd-edition/html/ClassFile.doc.html#88659.
I seem to be past that issue now, which was not due to too much method
splitting but to other factors, and now need to split up one of those
methods larger than 65535 bytes.   I am  actually working on an older release of
Cloudscape but the issues are relevant to Derby, so we'll have to
address them here too.    Currently in our architecture  there is  a
poorly defined upper bound on the complexity of queries due to  the
limits on the byte code generated.

Re: Any ideas for debugging a byte code generation problem

Posted by Kathey Marsden <km...@sbcglobal.net>.

Kathey Marsden wrote:

>    1) As I mentioned I am actually working  on an older release of
>Cloudscape so don't have the changes ported to Derby that I talk about
>in the notes so can't post them here right now, but wanted to get as
>much community input as possible as some sort of solution will need to
>come to Derby.
>  
>
Let me put this code here too, which would go in BCMethod. It is for the
automatic detection of zero-arg method overflows, which took care of the
constructor problem.  It would be really nice if we could find a
solution like this for all methods instead of having the checks in
the language layer.   The problem is that it doesn't work if the stackDepth
is not 0, and even though it could be tweaked for stack depth 1, that does
not appear to be the case for fillResultSet.


There are checks in callMethod and endStatement like this.
        if (stackDepth == 0)
                overflowMethodCheck();

    // check to see if the method is getting close to the
    // limit and if required overflow to another method.
    // Currently only called if the stackDepth is zero.
    private void overflowMethodCheck()
    {
        if (handlingOverflow)
            return;
       
        // don't sub method in the middle of a conditional
        if (condition != null)
            return;
       
        int currentCodeSize = myCode.getRelativePC();
        if (currentCodeSize < 55000)
            return;
       
        // only handle no-arg methods at the moment.
        if (parameters != null)
        {
            if (parameters.length != 0)
                return;
        }
       
        int modifiers = myEntry.getModifier();
       
        // too lazy to handle static methods.
        if ((modifiers & Modifier.STATIC) != 0)
            return;
       
        // System.out.println("NEED TO SPLIT " + myName + "  " + currentCodeSize);
       
        String subMethodName = myName + "_sub" + Integer.toString(subMethodCount++);
        BCMethod subMethod = (BCMethod) cb.newMethodBuilder(
                myEntry.getModifier(), myReturnType, subMethodName);
        subMethod.thrownExceptions = this.thrownExceptions;
       
        //subMethod.methodReturn();
        //subMethod.complete();
       
        // stop any recursion
        handlingOverflow = true;
       
        // in this method make a call to the sub method we will
        // be transferring control to.
        this.pushThis();
        this.callMethod(VMOpcode.INVOKEVIRTUAL, (String) null,
                subMethodName, myReturnType, 0);
   
        // and return its value, works just as well for a void method!
        this.methodReturn();
        this.complete();
       
        handlingOverflow = false;
       
        // now the tricky bit, make this object take over the
        // code etc. from the sub method. This is done so
        // that any code that has a reference to this MethodBuilder
        // will continue to work. They will be writing code into the
        // new sub method.
       
        this.myEntry = subMethod.myEntry;
        this.myCode = subMethod.myCode;
        this.currentVarNum = subMethod.currentVarNum;
        this.statementNum = subMethod.statementNum;
       
        // copy stack info
        this.stackTypes = subMethod.stackTypes;
        this.stackTypeOffset = subMethod.stackTypeOffset;
        this.maxStack = subMethod.maxStack;
        this.stackDepth = subMethod.stackDepth;
    }

Re: Any ideas for debugging a byte code generation problem

Posted by Daniel John Debrunner <dj...@debrunners.com>.

Kathey Marsden wrote:

> Jeremy Boynes wrote:
> 
> 
>>Kathey Marsden wrote:
>>
>>
>>>I could post my REALLY rough notes on my crash byte code generation
>>>course with Dan if anyone wants to see them,  filled with holes and
>>>probably as many untruths in the translation, but it is probably better
>>>for Dan to post when he gets back.
>>>
>>
>>Please post them, especially if Dan is going to be away for a few weeks.

[Kathey's notes deleted]

I'm going to migrate the changes Kathey & I made in the Cloudscape 5.1
codebase to the Derby trunk.

I'll add them with notes as comments to Derby-176

http://issues.apache.org/jira/browse/DERBY-176

Dan.

Re: Any ideas for debugging a byte code generation problem

Posted by Kathey Marsden <km...@sbcglobal.net>.

Jeremy Boynes wrote:

> Kathey Marsden wrote:
>
>>
>> I could post my REALLY rough notes on my crash byte code generation
>> course with Dan if anyone wants to see them,  filled with holes and
>> probably as many untruths in the translation, but it is probably better
>> for Dan to post when he gets back.
>>
>
> Please post them, especially if Dan is going to be away for a few weeks.
>
> -- 
> Jeremy
>

OK. Here they are, kind of a stream of consciousness thing with notes
cut and pasted.  A  few issues.  

    1) As I mentioned I am actually working  on an older release of
Cloudscape so don't have the changes ported to Derby that I talk about
in the notes so can't post them here right now, but wanted to get as
much community input as possible as some sort of solution will need to
come to Derby.
    2) I also don't have a reproduction on Derby, but make yourself a
super duper query with lots of subqueries and unions and you too
could get a java linkage error.
    3) My current outstanding issue is with fillResultSet being too
large.   Ideally it would be fixed centrally like Dan was able to do
with the constructor,  but it seems the state of the stack would prevent
that.

 I am thinking that my fix for fillResultSet will be to have a
smarter statementNumHitLimit use the code size and possibly be used in
other places and make continuation methods like Dan did for the
constructor.  It would be much better to have it centralized but I don't
see how to do that right now for fillResultSet.

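Essentially this goes at the top of statementNumHitLimit:

        int currentCodeSize = myCode.getRelativePC();

        // getting close to the method size limit: report the limit as hit
        if (currentCodeSize > 50000)
            return true;

        // small method or very few statements so far: just keep counting
        if (currentCodeSize < 5000 || statementNum < 5)
        {
            statementNum += noStatementsAdded;
            return false;
        }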

-----------------------------------------------------------------------------------------------------------------------
Notes:     Please forgive the loosie goosieness of it all...

VM Spec - The Java Virtual Machine Specification - Tim Lindholm - Frank
Yellin
    ISBN 0-201-63452-X

Constant Pool has 
    max 65535 entries
    12 types - The primitives + UTF8String
    long and double take up 2 slots.  Not clear why
    entries are given an index starting with 1 (0 reserved/not used?)
    class name is represented by UTF8 Constant
    index entry has pointers to constant pool entry
    this.class -> index class_info[] -> UTF8
    each entry is a tag with data

If you have field  int x, you have
    UTF8 - "x"    // name, let's say at index 77
    UTF8 - "I"    // type (integer) at index 83

You have
field_info
    access_flags  // e.g private/public
    name_index
    descriptor_index
    attributes

so there are 8 - 12 bytes for every field

Byte code for using variable
    getfield
    field_ref indexbyte1 ->index into class info index for class ??
              indexbyte2 -> Name and Type constant entry

    so the reference for x above points to 77 for the name
    and 83 for the type

There are at least 3 constant pool entries per variable
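
To make the "at least 3 entries per variable" count concrete, here is a minimal sketch of a de-duplicating constant pool. `ConstantPoolSketch` and its methods are hypothetical names made up for illustration, not Derby's ClassHolder, and the model ignores details such as long and double taking two slots.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical toy model of a class-file constant pool; not Derby code.
final class ConstantPoolSketch {
    // Maps a de-duplication key (tag plus payload) to its pool index.
    private final Map<String, Integer> entries = new LinkedHashMap<>();
    private int nextIndex = 1; // index 0 is reserved by the class-file format

    private int add(String key) {
        Integer existing = entries.get(key);
        if (existing != null) {
            return existing; // identical entries are shared
        }
        int index = nextIndex++;
        entries.put(key, index);
        return index;
    }

    int utf8(String text) {
        return add("Utf8:" + text);
    }

    int classInfo(String internalName) {
        return add("Class:" + utf8(internalName));
    }

    int nameAndType(String name, String descriptor) {
        return add("NameAndType:" + utf8(name) + ":" + utf8(descriptor));
    }

    // Fieldref = class index + NameAndType index, as used by getfield/putfield.
    int fieldRef(String owner, String name, String descriptor) {
        return add("Fieldref:" + classInfo(owner) + ":" + nameAndType(name, descriptor));
    }

    int size() {
        return entries.size();
    }

    public static void main(String[] args) {
        ConstantPoolSketch pool = new ConstantPoolSketch();
        pool.fieldRef("Generated", "x", "I");
        int afterFirst = pool.size();
        pool.fieldRef("Generated", "y", "I");
        System.out.println("entries after first field: " + afterFirst);
        System.out.println("entries added by second field: " + (pool.size() - afterFirst));
    }
}
```

In this model the class name and the "I" descriptor are shared, so every further int field still costs a Utf8 name, a NameAndType and a Fieldref, which matches the count in the notes.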

String constants point to string info, which creates 2 entries.
Method references use concatenated string names, so they take up a lot of space
(everybody gets an org/apache/derby)

---------------------------------
ClassHolder is the entry point to an active class
addEntry shows where constant entries are added.
You can do a new Exception().printStackTrace() to see where constant
entries are coming from.

we don't create byte code for local variables so everything gets added
as a field.

java compiler ClassBuilder  ??? Don't know what I was writing here

----------------------------------------------
On the first pass Dan saw a lot of constant pool entries getting
created by this optimization, where we call DataValueFactory to reuse
the DVD holder.
    x = DataValueFactory.getValue(expr, x)   // reuse holder

By changing the second argument x to null, a DataValueDescriptor for x would
get newed up each time, but he would get past the constant pool problem.
One issue is that even though x is really just a local variable never to
be used elsewhere, it is defined as a field and just takes up space.
Derby doesn't create local variables in its byte code.

The solution was to add a new method to ExpressionClassBuilder,
newFieldDeclarationOptional.  It is like newFieldDeclaration, but instead
of always returning a LocalField it returns null if we have passed some
reasonable limit (2000 entries).  See reusableBoolean in
BinaryLogicalOperatorNode for sample usage.  Dan ran nist and found that
none of the queries exceeded this limit.

-----------------------------------------------------------------

VM is stack based.
    words on stack
    operations on the stack
    pushes values on stack
    pops them off the stack.
    push
        creates constant pool entry
        byte, index into pool  ???

So if  we have
    this.x(foo(), bar())
we essentially create temporary variables for the foo() and bar() return
values that end up as constant pool entries.
Replace temp variables with swaps.  My notes make no sense here, need to
complete

    we would push this
    call foo
    push this
    call bar
    <finish>

Dan's comment was this

"Remove a use of a generated local field where manipulating values on
the stack is sufficient.
    Reduces number of constant pool entries, a local field requires at
least three constant pool entries.

    Generated code used to be equivalent to
    // instance field
    <type> right;

    right = <right expr>;
    right.method(<left expr>, right);

    Now generated code uses the stack to store two copies of right
    rather than an instance field
"
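
As an illustration of keeping two copies of a value on the stack instead of in a field, here is a small simulation of an operand stack. `OperandStackSketch` is a hypothetical toy, and the particular dup/swap sequence is only one way to get the operands in order; it is an assumption, not necessarily the sequence Derby emits.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy operand stack; dup and swap mirror the JVM instructions of the same name.
final class OperandStackSketch {
    private final Deque<String> stack = new ArrayDeque<>();

    void push(String value) {
        stack.push(value);
    }

    void dup() {
        stack.push(stack.peek());
    }

    void swap() {
        String top = stack.pop();
        String below = stack.pop();
        stack.push(top);
        stack.push(below);
    }

    // Pops two arguments and a receiver, the way an invoke instruction would.
    String invoke(String method) {
        String arg2 = stack.pop();
        String arg1 = stack.pop();
        String receiver = stack.pop();
        return receiver + "." + method + "(" + arg1 + ", " + arg2 + ")";
    }

    // Builds right.method(left, right) without storing right in a field.
    static String rightMethodCall() {
        OperandStackSketch s = new OperandStackSketch();
        s.push("right"); // evaluate <right expr> once
        s.dup();         // second copy on the stack instead of a field
        s.push("left");  // evaluate <left expr>
        s.swap();        // stack from bottom to top: right, left, right
        return s.invoke("method");
    }

    public static void main(String[] args) {
        System.out.println(rightMethodCall());
    }
}
```

No field means no Utf8 name, NameAndType or Fieldref entry for the temporary, which is where the constant pool saving comes from.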
----------------------------------------------------

Field and method names can safely overlap in Java so Dan's next step was
to share the namespace for expressions and fields.
We used to have
Expressions - take no args. return Object
    e#
    e0 - e9 are preset expressions
    e10 - en - other expressions
Fields
    f0 - fn

Other methods (argument methods)
    g0 - gn

Now the fields go in the e# name space.  The f# namespace is gone.
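
A tiny example of the language rule this relies on: a field and a method in the same class may share a name. The class name and the value below are made up for illustration.

```java
// Hypothetical class; only the overlap of the name e0 matters here.
final class SharedNamesSketch {
    private int e0 = 41; // field named e0

    // Method named e0: legal, because fields and methods are looked up separately.
    int e0() {
        return e0 + 1;
    }

    public static void main(String[] args) {
        System.out.println(new SharedNamesSketch().e0());
    }
}
```

In the class file both members refer to their name through a UTF8 constant, so identical names can share that one entry instead of needing one per namespace.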

-----------------------------------------

Ultimately it would be good to throw a StandardException when we hit
one of these limits.  There is not really a mechanism to do that during
the byte code generation.
    For now generate an IOException in the ClassFormatOutput stream with
    a terse JVM spec description of the limit problem.
    Ultimately catch it in language and throw a StandardException
    with query too complex and chain this one.
    Checks are in writeXXX methods of ClassFormatOutput.java
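
A minimal sketch of what such a check could look like, assuming a writer built on java.io.DataOutputStream. `LimitCheckingOutput` and `putU2` are hypothetical names and this is not the actual ClassFormatOutput code.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical stand-in for a class-file writer; not Derby's ClassFormatOutput.
final class LimitCheckingOutput extends DataOutputStream {
    static final int MAX_U2 = 65535;

    LimitCheckingOutput(OutputStream out) {
        super(out);
    }

    // Writes an unsigned 16-bit value, refusing anything the format cannot hold.
    void putU2(String what, int value) throws IOException {
        if (value < 0 || value > MAX_U2) {
            throw new IOException(what + " = " + value + " exceeds u2 limit of " + MAX_U2);
        }
        writeShort(value);
    }

    public static void main(String[] args) throws IOException {
        LimitCheckingOutput out = new LimitCheckingOutput(new ByteArrayOutputStream());
        out.putU2("constant_pool_count", 26636);
        try {
            out.putU2("constant_pool_count", 92171); // made-up oversized count
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing at write time turns a silently corrupt class into an error that the language layer can catch and wrap.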

---------------------------------------

Each class has execute and a constructor.  The constructor is called once
when the Activation is created.  The actual work is done by
postConstructor.

We want to create a continuation constructor, so we go from
public void postConstructor()
{
    // 238k worth of code
}

to

public void postConstructor()
{
    // 50k worth of code

    subConstructor0();
}

private void subConstructor0()
{
    // 50k worth of code

    subConstructor1();
}

private void subConstructor1()
{
    // 50k worth of code

    subConstructor2();
}

endStatement is a good time to check whether the stack is empty and call
our continuation constructor if we need it.
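
The chaining above can be modelled with a toy splitter that starts a new method at a statement boundary once the current one has reached a size threshold. `ContinuationSplitSketch` is hypothetical, the statement sizes are made up, and the cost of the call and return that link the methods is ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of splitting one long statement list into chained sub-methods.
final class ContinuationSplitSketch {
    // Returns one list of statement sizes per generated method. A new method is
    // started at a statement boundary once the current one reaches the threshold.
    static List<List<Integer>> split(List<Integer> statementSizes, int threshold) {
        List<List<Integer>> methods = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        int currentSize = 0;
        methods.add(current);
        for (int size : statementSizes) {
            if (currentSize >= threshold) {
                // A real generator would emit "call next method; return" here.
                current = new ArrayList<>();
                methods.add(current);
                currentSize = 0;
            }
            current.add(size);
            currentSize += size;
        }
        return methods;
    }

    public static void main(String[] args) {
        List<Integer> sizes = new ArrayList<>();
        for (int i = 0; i < 238; i++) {
            sizes.add(1000); // 238 statements of 1000 bytes, roughly 238k in total
        }
        List<List<Integer>> methods = split(sizes, 55000);
        System.out.println("methods generated: " + methods.size());
    }
}
```

Splitting only between statements is what makes the empty-stack requirement hold in this model; a method that never returns to an empty stack gives the splitter no safe boundary.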

About endStatement
    if we have x=3
    putField leaves a value on the stack.
    endStatement will pop it off

Note: There seemed to be an initial attempt to automatically break up the
constructors with the method statementNumHitLimit().  It needs to be
smarter about the code size instead of the number of statements and then
make continuation constructors based on that.

Tried to make a continuation constructor.
    Code in ExpressionClassBuilder
    if (big) create another MethodBuilder with same signature.
    Make existing method call the new method
    complete old constructor
    set it up so acb.getConstructor returns our new constructor from now
    on.  So from now on callers get the new method and start writing
    to it.
    Seems safe for the constructor.

This didn't work!  Because there were still some callers with references
to the constructor.  It was too risky to try to find them all.   Dan's
next solution was this...
"I think I have the generic split working (under the covers of
MethodBuilder)
for methods that take zero arguments and whose stack depth drops to zero
sometime after they reach 55000 code size.

This works for the constructor (postConstructor) but fillResultSet doesn't
trigger it, most likely as the stack is never 0.
Two changes

1) overflowMethodCheck method & calls in BCMethod.
     Checks to see if the method is getting too big and creates
a sub method as we did earlier today. But rather than being specific
it bases the method on the current method, modifiers and return type.

This method is called when the stack depth is 0, and only when the
stack depth can be zero, e.g. not called when pushing values onto the
stack as the depth cannot be zero, so only called when the stack depth
is being reduced.

It looks like what we did today, but once the current method has been
completed
the actual BCMethod takes on the identity of the new submethod. Thus leaving
the callers unaware of any change (same reference will now add code into
the sub method).

1a) putFieldPop() in MethodBuilder/BCMethod. Does not leave the value of
the field on the stack. Reduces code size by not duplicating the value
of the field on the stack only to pop it later with an endStatement by
the caller.
This was about 8% of the code in postConstructor.

2) Calls to putFieldPop(). basically replacing

putField();
endStatement();

sequences with

putFieldPop();
// endStatement();
"

--------------------------------------------------------------------
There is one 236K method left.

It comes from the fillResultSet method.
It returns a value so it might be tricky.

If you have 0 or 1 arg on the stack it is somewhat easy: 0 args as
before; 1 arg means you have to create
a new method with a param of that type
e.g.
ResultSet fillResultSet()
{
    // much code
    // escape to avoid limit
    return FRS_0(arg);
}
private ResultSet FRS_0(<type that happened to be on stack> arg)
{
    etc.
}
FRS_0 would return a ResultSet

---- One more change and note from Dan
"Has the postConstructor() split working automatically, actually
any method that gets big (55,000 byte code size), has no parameters
and whose stack depth drops to zero after reaching 55,000 bytes.

fillResultSet does not fall into this category.

It might be worth changing the stackDepth check on the calls to the
overflow method in BCMethod to be <= 1 rather than == 0. Then in the
method check that stack depth is 0 after the 55000 check.

Then if the stack depth is not zero and size > 55000 then print the
stack depth to see if fillResultSet ever gets to a stack depth of 1,
i.e. while running the query.

If it does then you need to modify the overflow method to:

get the declared type of the top (only) stack word, say <stype>

make the sub method have one parameter of <stype>

then just call the sub method as is currently done, BUT
perform a swap after the pushing of this, and then set 1 arg passed, not 0.
"

I performed the indicated checks and it doesn't meet the prerequisites.

Re: Any ideas for debugging a byte code generation problem

Posted by Jeremy Boynes <jb...@apache.org>.

Kathey Marsden wrote:
> 
> I could post my REALLY rough notes on my crash byte code generation
> course with Dan if anyone wants to see them,  filled with holes and
> probably as many untruths in the translation, but it is probably better
> for Dan to post when he gets back.
> 

Please post them, especially if Dan is going to be away for a few weeks.

--
Jeremy

Re: Any ideas for debugging a byte code generation problem

Posted by Kathey Marsden <km...@sbcglobal.net>.

Andrew McIntyre wrote:

>On Sun, 20 Mar 2005 02:25:17 +0000 (UTC), RPost <rp...@pacbell.net> wrote:
>
>>Couldn't find anything in the JVM spec about a limit on constants
>
>The limit is 65535 entries in the constant pool, as the index into the
>constant pool is an unsigned short. See:
>
>http://java.sun.com/docs/books/vmspec/2nd-edition/html/ClassFile.doc.html#88659
>
>Not sure if this would help with debugging the problem, but I modified
>the Unabstract class that was a part of my jdk142-only compile patch
>to dump the constant pool. From the errors in Kathey's original email,
>it appears that there may be too few constants in the constant pool
>and the indexes for the class info and superclass info in the constant
>pool are incorrect (tag 1, UTF8, instead of tag 7, classinfo). If the
>constant pool has too few entries, then in the output of
>DumpConstantPool, the last few entries in the constant pool will read
>"error!". If the constant pool has more entries than the index value,
>then the output for the access flags and class info will be obviously
>incorrect. If the constant pool actually has the correct number of
>entries, then maybe the class info was just written out with bad
>indexes.
>
>andrew
>
Dan helped me with the constant pool problem yesterday.  There were too
many entries in the constant pool.  The jad output seemed to be the
actual number % 65535.   He saw a few areas where we could improve in
this area.    I think I would feel more comfortable letting him comment
on them since I am really a newbie in this area and am bound to mess it
up.  He is away for a few weeks now but promises a post to Derby on the
topic when he gets back.
In general  a couple of the big reclamation items were:

Reuse single set of names for expression methods and fields in generated
code
Add new method that allows optional creation of a generated field for
holder purposes.
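
On the wrapped entry count mentioned above: the constant pool count is stored in 16 bits, so an oversized count written without a check keeps only its low 16 bits (that is, it wraps modulo 65536). The sketch below shows the effect with java.io streams; the count 92171 is a made-up value chosen so that it reads back as the 26635 limit jad reported, not a figure from the real class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

final class U2WrapSketch {
    // Writes count as a 16-bit field and reads it back the way a class-file parser would.
    static int roundTrip(int count) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeShort(count); // silently keeps only the low 16 bits
        }
        try (DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readUnsignedShort();
        }
    }

    public static void main(String[] args) throws IOException {
        int hypotheticalCount = 26635 + 65536; // assumed value, for illustration only
        System.out.println(hypotheticalCount + " is read back as " + roundTrip(hypotheticalCount));
    }
}
```

A reader of such a file sees a plausible but wrong count, so every index past the wrapped limit looks out of range.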

I could post my REALLY rough notes on my crash byte code generation
course with Dan if anyone wants to see them,  filled with holes and
probably as many untruths in the translation, but it is probably better
for Dan to post when he gets back.

With the query I am dealing with, there are also issues with method
sizes.   Dan suggested a generic split working (under the covers of
MethodBuilder) for methods that take zero arguments and whose stack
depth drops to zero sometime after they reach 55000 code size.

 The one I am grappling with is fillResultSet which is already a spin
off of execute because of a byte code generation problem and is 236K at
the moment.  It does not fall into the category above because the stack
is not empty.   I don't really have questions intelligent enough to ask
right now.  But if anyone has ideas and input I really welcome it. 

Thanks

Kathey

Re: Any ideas for debugging a byte code generation problem

Posted by Andrew McIntyre <mc...@gmail.com>.

On Sun, 20 Mar 2005 02:25:17 +0000 (UTC), RPost <rp...@pacbell.net> wrote:
> Couldn't find anything in the JVM spec about a limit on constants
 
The limit is 65535 entries in the constant pool, as the index into the
constant pool is an unsigned short. See:

http://java.sun.com/docs/books/vmspec/2nd-edition/html/ClassFile.doc.html#88659

Not sure if this would help with debugging the problem, but I modified
the Unabstract class that was a part of my jdk142-only compile patch
to dump the constant pool. From the errors in Kathey's original email,
it appears that there may be too few constants in the constant pool
and the indexes for the class info and superclass info in the constant
pool are incorrect (tag 1, UTF8, instead of tag 7, classinfo). If the
constant pool has too few entries, then in the output of
DumpConstantPool, the last few entries in the constant pool will read
"error!". If the constant pool has more entries than the index value,
then the output for the access flags and class info will be obviously
incorrect. If the constant pool actually has the correct number of
entries, then maybe the class info was just written out with bad
indexes.
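
For anyone who wants to run the same kind of check without that patch, here is a minimal sketch that walks the constant pool using only the class-file layout from the VM spec. `ConstantPoolWalker` and `inspect` are hypothetical names, not the DumpConstantPool code mentioned above; tags introduced by later class-file versions are handled only so the walker does not stop on newer classes.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper; this is not the DumpConstantPool tool mentioned above.
final class ConstantPoolWalker {
    // Returns {declared count, tag of this_class entry, tag of super_class entry}.
    // A tag of -1 means the index pointed outside the declared pool.
    static int[] inspect(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        if (in.readInt() != 0xCAFEBABE) {
            throw new IOException("bad magic number");
        }
        in.readUnsignedShort();                // minor_version
        in.readUnsignedShort();                // major_version
        int declared = in.readUnsignedShort(); // constant_pool_count
        int[] tags = new int[declared];        // slot 0 is unused by the format
        for (int i = 1; i < declared; i++) {
            int tag = in.readUnsignedByte();
            tags[i] = tag;
            switch (tag) {
                case 1:                        // Utf8: u2 length, then the bytes
                    in.readFully(new byte[in.readUnsignedShort()]);
                    break;
                case 7: case 8: case 16: case 19: case 20:
                    in.readFully(new byte[2]); // one u2 index
                    break;
                case 15:                       // MethodHandle: u1 kind + u2 index
                    in.readFully(new byte[3]);
                    break;
                case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                    in.readFully(new byte[4]);
                    break;
                case 5: case 6:                // long and double occupy two slots
                    in.readFully(new byte[8]);
                    i++;
                    break;
                default:
                    throw new IOException("unknown tag " + tag + " at index " + i);
            }
        }
        in.readUnsignedShort();                // access_flags
        int thisClass = in.readUnsignedShort();
        int superClass = in.readUnsignedShort();
        return new int[] { declared, tagAt(tags, thisClass), tagAt(tags, superClass) };
    }

    private static int tagAt(int[] tags, int index) {
        return index < tags.length ? tags[index] : -1;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java ConstantPoolWalker <file.class>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0])) {
            int[] r = inspect(in);
            System.out.println("constant_pool_count=" + r[0]
                    + " this_class tag=" + r[1] + " super_class tag=" + r[2]);
        } catch (EOFException e) {
            System.out.println("file ended inside the structure: too few entries?");
        }
    }
}
```

A healthy class reports tag 7 for this_class; tag 1 there corresponds to the "non-class tag 1" message from jad, and an unknown-tag error partway through suggests the declared count does not match what was actually written.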

andrew

Re: Any ideas for debugging a byte code generation problem

Posted by RPost <rp...@pacbell.net>.

Kathey Marsden <km...@...> writes:

> 
> >
> > Another thing I would try is paring down the query to make it as small
> > as possible while still reproducing the problem.
> >
> >
> Aye, there's the rub.  In general with these byte code generation
> problems it is the size of the query that is the problem.  We are
> dealing with strict JVM specification  limits on things like method
> sizes.   So, we split methods into smaller methods and then we have too
> many constants.  Robbing Peter to pay Paul only works for so long.
> 
> 

I create a java class with 30,000 constants of the form:

int i1= 1,i2= 1,i3= 1,i4= 1,i5= 1,i6= 1,i7= 1,i8= 1,i9= 1,i10= 1;

Java 1.3.1 compiles it ok but jad 1.5.7g gives an error when trying to 
decompile it. A dialog box titled 'Program Error' with 'jad.exe has generated 
errors and will be closed by Windows. You will need to restart the program. An 
error log is being created.'

Couldn't find anything in the JVM spec about a limit on constants

Re: Any ideas for debugging a byte code generation problem

Posted by Kathey Marsden <km...@sbcglobal.net>.

>
> Another thing I would try is paring down the query to make it as small
> as possible while still reproducing the problem.
>
>
Aye, there's the rub.  In general with these byte code generation
problems it is the size of the query that is the problem.  We are
dealing with strict JVM specification  limits on things like method
sizes.   So, we split methods into smaller methods and then we have too
many constants.  Robbing Peter to pay Paul only works for so long.

Re: Any ideas for debugging a byte code generation problem

Posted by Jeffrey Lichtman <sw...@rcn.com>.

>The source of the trouble comes from a huge query with many UNION
>ALL's.   The class is huge 2.4MB.  Anyone have any ideas about how to
>get at the contents of a corrupted class like this?

It looks like jad is complaining about too many constants in the class. 
This is the first thing I would look at - is the limit of 26635 constants a 
Java limit or a jad limit? If it's a Java limit, you may have found the 
problem. If it's a jad limit, you might be able to get jad to work by 
increasing the size of an internal table (assuming you have the source).

Another thing I would try is paring down the query to make it as small as 
possible while still reproducing the problem.


                        -        Jeff Lichtman
                                 swazoo@rcn.com
                                 Check out Swazoo Koolak's Web Jukebox at
                                 http://swazoo.com/