Posted to bcel-user@jakarta.apache.org by "Nicholson, Jonathan O H" <jo...@essex.ac.uk> on 2007/03/07 15:33:09 UTC

Invokation and return statement analysis with BCEL

Heya guys,

I'm doing a research project in formal methods, and I'm looking into
BCEL to provide me with certain information about a given class (there
are benefits, from our point of view, to class inspection over source
code inspection that I need not go into).

I have managed to program the vast majority of the features we require
pretty quickly, and I am more than glad to see the program's dependency
on CFParse disappear. The method that does the analysis is in this
format:

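// Uses org.apache.bcel.classfile.{JavaClass, Method} and
// org.apache.bcel.generic.{Instruction, InstructionList}
for (JavaClass c : somearray)
{
  // Inspect the class, one method at a time
  for (Method m : c.getMethods())
  {
    if (m.getCode() == null) continue; // abstract or native: no bytecode
    InstructionList list = new InstructionList(m.getCode().getCode());
    for (Instruction i : list.getInstructions())
    {
      switch (i.getOpcode())
      {
        // do something when certain instructions are found
      }
    }
  }
}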

There is no modification of the classes as they are processed, and
information is basically dumped into a database (currently to the screen
while debugging) as it's found.

Problems:
1) When an ARETURN instruction is found, I need to know the identifier
and type of the object being returned. The declared return type is not
enough, and from what I can see getType() on the ARETURN instruction
always reports java.lang.Object, so that doesn't help either.
2) When an InvokeInstruction is found, I need to know the identifiers of
each of the objects used as arguments for the invoked method.

I understand the Java VM is stack based, so at first I thought it would
be logical to maintain a stack of variables: by knowing how many
things are removed from the stack (for example 1 in the case of
ARETURN... I think...) I can find out what idents/types are being used.

However, I just can't get it to work. I'm not familiar enough with every
bytecode instruction to be able to do it. I have been looking at the
CodeHTML class to see how it works, but right now I can't get my head
around the logic.
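
The stack-tracking idea described above can be sketched in plain Java. This is only a toy model under stated assumptions, not BCEL code: the StackSketch class, the "MNEMONIC operand" string format and the origin labels are all made up for illustration, only a handful of reference instructions are modelled, and branches are ignored (the instructions are walked in a straight line).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy symbolic stack: each entry records where a reference came from. */
final class StackSketch {

    /**
     * Walks instructions written as "MNEMONIC operand" (a hypothetical
     * format) and returns a description of the value popped by each ARETURN.
     */
    static List<String> returnedValues(String... code) {
        Deque<String> stack = new ArrayDeque<>();
        List<String> returned = new ArrayList<>();
        for (String insn : code) {
            String[] parts = insn.split(" ", 2);
            String operand = parts.length > 1 ? parts[1] : "";
            switch (parts[0]) {
                case "ALOAD":        // push a local variable slot
                    stack.push("local " + operand);
                    break;
                case "ACONST_NULL":  // push the null constant
                    stack.push("null");
                    break;
                case "GETFIELD":     // pop the object reference, push the field value
                    stack.pop();
                    stack.push("field " + operand);
                    break;
                case "ASTORE":       // pop the top value into a local variable slot
                    stack.pop();
                    break;
                case "ARETURN":      // pops exactly one reference
                    returned.add(stack.pop());
                    break;
                default:
                    throw new IllegalArgumentException("not modelled: " + parts[0]);
            }
        }
        return returned;
    }

    public static void main(String[] args) {
        // Roughly "return o;" followed by "return null;".
        System.out.println(returnedValues(
            "ALOAD 0", "GETFIELD o", "ARETURN", "ACONST_NULL", "ARETURN"));
        // prints [field o, null]
    }
}
```

A real analysis would need an entry for every opcode that touches the operand stack and would have to follow branches, which is why reusing an existing stack simulator is attractive.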

If someone could help, give me example/pseudo code if you've done
something similar, direct me to a package/library that can provide me
with this information, etc, I would be very grateful.

Thanks all
Regards

Mac

---------------------------------------------------------------------
To unsubscribe, e-mail: bcel-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: bcel-user-help@jakarta.apache.org


RE: Possible bug in InstructionFinder class

Posted by "Nicholson, Jonathan O H" <jo...@essex.ac.uk>.
Ah fair enough, this was the first place I thought of to mention the
bug.

Thanks
Mac

-----Original Message-----
From: Dave Brosius [mailto:dbrosius@apache.org] 
Sent: 23 June 2007 19:57
To: BCEL Users List
Subject: Re: Possible bug in InstructionFinder class



Re: Possible bug in InstructionFinder class

Posted by Dave Brosius <db...@apache.org>.
This was reported here:
http://issues.apache.org/bugzilla/show_bug.cgi?id=40044

and fixed in svn head.


----- Original Message ----- 
From: "Nicholson, Jonathan O H" <jo...@essex.ac.uk>
To: "BCEL Users List" <bc...@jakarta.apache.org>
Sent: Friday, June 22, 2007 7:13 PM
Subject: Possible bug in InstructionFinder class




Possible bug in InstructionFinder class

Posted by "Nicholson, Jonathan O H" <jo...@essex.ac.uk>.
Hi there,

Been playing with the InstructionFinder class, and it seems that it always returns an extra instruction handle in the resulting array, causing an exception if the last instruction being looked for is a return.

I've looked in the source (downloaded today) and found the problem: there is a line in search(String, InstructionHandle, CodeConstraint) that reads:
int lenExpr = (endExpr - startExpr) + 1;

Not sure why that + 1 is there... removing it makes the class work correctly.
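
A minimal sketch of why a "+ 1" like this can overshoot. It assumes that endExpr is an exclusive end offset of the kind java.util.regex.Matcher.end() returns; the line quoted above does not confirm that, so treat it as an assumption. The MatchLengthSketch class and its method are hypothetical names, not part of BCEL.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MatchLengthSketch {

    /** Length of the first match of regex in text, or -1 if there is none. */
    static int firstMatchLength(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        if (!m.find()) {
            return -1;
        }
        // Matcher.end() is the offset just past the last matched character,
        // so the match length is end - start, with no "+ 1".
        return m.end() - m.start();
    }

    public static void main(String[] args) {
        // "cd" matches at offsets 2..3 of "abcd"; start() is 2 and end() is 4.
        System.out.println(firstMatchLength("cd", "abcd")); // prints 2
    }
}
```

With the extra 1, a match that ends on the last element would ask for one element past the end, which fits the symptom of an exception only when the pattern ends at the final return.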

Hope this helps others who've had similar issues
Mac



RE: Invokation and return statement analysis with BCEL

Posted by "Nicholson, Jonathan O H" <jo...@essex.ac.uk>.
Martin

Try this one instead:
http://jakarta.apache.org/bcel/apidocs/org/apache/bcel/verifier/structurals/package-summary.html

Direct off the main BCEL webpage.

Jon

-----Original Message-----
From: Martin Schoeberl [mailto:martin@jopdesign.com] 
Sent: 08 March 2007 11:56
To: BCEL Users List; arrin.daley@anu.edu.au
Subject: Re: Invokation and return statement analysis with BCEL



Re: Invokation and return statement analysis with BCEL

Posted by Martin Schoeberl <ma...@jopdesign.com>.
> There are some classes in BCEL's verifier project that will help you out 
> by doing stack simulation this will let you access the types on the 
> stack. Have a look at the org.apache.bcel.verifier.structurals 
> <http://djvm/internal/bcel-5.2/apidocs/org/apache/bcel/verifier/structurals/package-frame.html> 

Interesting, but the link is broken.

Martin



RE: Invokation and return statement analysis with BCEL

Posted by "Nicholson, Jonathan O H" <jo...@essex.ac.uk>.
Thanks Arrin,

I'll have a look at the classes you suggested. I wasn't sure if a visitor
was the way to go; so far I've been mimicking the structure of the program we
had based on CFParse... And as you can imagine, nested loops for my purposes
are very large and difficult to understand/maintain. Logically, using a good
design pattern such as Visitor is better than using an anti-pattern such as
what I have. [Before finishing this email I've been playing with making a
visitor and it's much simpler than I had originally thought: same problems
when inside methods, but a lot neater code generally and easier to see where
things go. Thanks for the suggestion!!!]

To answer your questions I think it's probably best to give some examples.
So, starting with return statements:

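public class c
{
	Object o = new Object();

	public Object someMethod(int i)
	{
		Integer myInt = new Integer(i);
		if(i < 5)
			return o;		// value held in a field
		if(i < 10)
			return new Integer(i);	// created and returned in one statement
		if(i < 15)
			return myInt;		// created earlier in the method
		return null;			// unconditional, so every path returns
	}
}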

I need to log the fact that this method returns Object, Integer and null.
I've managed to hack a way into getting this to work (under only certain
limited conditions, I'm sure), but I need to extend this further. For example,
if I were to return the value stored in a field of a class (see object o
above) then I would store that in one place in the database (for example
under the heading "Returned"). If the method returns a variable that has
been created during the scope of the method (either in a single line as
above or in separate instructions; see the two return statements relating to
Integer objects) then it's stored in a different part of the database (for
example under the heading "Produced"). Also, for many reasons we treat
primitive types, arrays, and the null keyword as classes, so the same has to
be done for these types too.

For methods, I need to distinguish between an invocation and a forward
operation. Say "invoker" is the method in which there is an invocation
instruction, and "invoked" is the method being called. A forward operation
is an invocation where invoker and invoked share the same signature (same
method name, number and type ...in the same order... of arguments), and the
idents defined in the invoker's argument list are the same idents passed to
the invocation of the invoked method. I realise my explanation is not that
well defined (I'm in the process of really nailing the terminology, doing
case studies, etc), so here is a very simple example:

public class A
{	public void s(int i)
	{	B.s(i);	// forward operation
		B.s(1234);	// Invoke operation
	}
}

public class B
{	public static void s(int j)
	{ }
}

For our purposes the keyword "static", along with other pieces of information,
is ignored. A.s(int) is the invoker, and B.s(int) is the invoked. I can get
the idents of the invoker's arguments; these idents do not have to match the
idents declared by the invoked method (as shown). However (as you spotted,
Arrin) one or more of the method arguments may not have idents, in which
case it must be an invoke operation rather than a forward one.
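
The forward/invoke distinction described above can be written down as a small predicate. This is a sketch under assumptions, not part of any existing tool: ForwardSketch and isForward are hypothetical names, types are compared as plain strings, and the idents are assumed to have to appear in the same order as the invoker's parameters. An argument with no identifier (a constant, for instance) is represented by null.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class ForwardSketch {

    /**
     * True when the call forwards the invoker's own parameters, in order,
     * to a method with the same name and parameter types.
     * argIdents holds one entry per call argument: the invoker parameter
     * name that was passed, or null when the argument has no identifier.
     */
    static boolean isForward(String invokerName, List<String> invokerTypes,
                             List<String> invokerParams, String invokedName,
                             List<String> invokedTypes, List<String> argIdents) {
        return invokerName.equals(invokedName)
            && invokerTypes.equals(invokedTypes)
            && invokerParams.equals(argIdents);
    }

    public static void main(String[] args) {
        List<String> types = Arrays.asList("int");
        List<String> params = Arrays.asList("i");
        // B.s(i) inside A.s(int i)
        System.out.println(isForward("s", types, params, "s", types, params));
        // B.s(1234) inside A.s(int i): the argument has no ident
        System.out.println(isForward("s", types, params, "s", types,
            Collections.singletonList((String) null)));
    }
}
```

The first call reports a forward operation and the second an ordinary invoke, matching the comments in the A/B example.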

I hope that clarifies things a bit

So, if I never modify the class in any way, can I trust the
LocalVariableTable? I can't seem to find a way to get the pc value from what
information I'm keeping track of, so I'm having to use the deprecated
method to access the LocalVariableTable as it is...

Can anyone help me simulate the stack? I'm a little bit at a loss as to
where to start. Or...and I know it's a long shot... does anyone have sample
code that can simulate the stack they would be willing to let me see?

Again, thanks for your help Arrin :-)

Mac

-----Original Message-----
From: Arrin Daley [mailto:arrin.daley@anu.edu.au] 
Sent: 07 March 2007 23:09
To: BCEL Users List
Subject: Re: Invokation and return statement analysis with BCEL



Re: Invokation and return statement analysis with BCEL

Posted by Arrin Daley <ar...@anu.edu.au>.
Hi Mac

1.
What type are you trying to find out from ARETURN? All the type(s) that 
could be returned? Or a superclass of all the types that could be returned?

If you are trying to track down all of the types that might be returned 
(the subclasses) then I'm guessing you will need to do stack simulation.

There are some classes in BCEL's verifier project that will help you out 
by doing stack simulation; this will let you access the types on the 
stack. Have a look at the org.apache.bcel.verifier.structurals 
<http://djvm/internal/bcel-5.2/apidocs/org/apache/bcel/verifier/structurals/package-frame.html> 
package in BCEL. Its real purpose is verification, but we have used it 
to find the types returned from AALOAD operations, so finding the types 
returned from ARETURN should be no harder.

Your problem, if your goal is to capture all types that could be returned, 
is that the current stack simulation does a merge where it finds the common 
superclass of the two or more subclasses found. If you extend the current 
ExecutionVisitor you could modify this behaviour and record all types.
The other thing that is a little annoying is that the ExecutionVisitor 
uses an InstConstraintVisitor. It seems that the JustIce verifier has a 
stricter notion of what is valid code than the original Java virtual 
machine, so when you modify ExecutionVisitor make it so that it doesn't 
use an InstConstraintVisitor; otherwise you will get JustIce telling you 
about errors in code the JVM passes.

2.
I'm not sure what you're after here. If you find an InvokeInstruction, the 
arguments passed to it may have come from the stack and never had any 
identifiers (that I can see; I might be wrong here). You could find out 
the argument names by loading and looking up the associated method. Is 
this what you mean?
Looking at the LocalVariableTable might look tempting, but the 
LocalVariableTable is an optional debugging structure so you can't count 
on it. More importantly, I'm not sure it is always correct if it is 
present; for instance, I can edit a Method in BCEL and never affect the 
contents of the LocalVariableTable.
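
On point 1, the difference between merging to a common superclass and recording every type can be sketched with plain java.lang.Class objects. This is an illustration only: MergeSketch and its methods are hypothetical, it is not the ExecutionVisitor code, and it assumes ordinary classes (interfaces simply fall back to Object here).

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class MergeSketch {

    /** Verifier-style merge: keep only the nearest common superclass. */
    static Class<?> commonSuperclass(Class<?> a, Class<?> b) {
        Class<?> c = a;
        // Climb a's superclass chain until a class that b also extends is found.
        while (c != null && !c.isAssignableFrom(b)) {
            c = c.getSuperclass();
        }
        return c != null ? c : Object.class;
    }

    /** Alternative merge: remember every type seen on any path. */
    static Set<Class<?>> recordAll(Set<Class<?>> a, Set<Class<?>> b) {
        Set<Class<?>> all = new LinkedHashSet<>(a);
        all.addAll(b);
        return all;
    }

    public static void main(String[] args) {
        // Two paths reach the same point, one with an Integer and one with a Long.
        System.out.println(commonSuperclass(Integer.class, Long.class));
        // prints class java.lang.Number
    }
}
```

The first merge loses the fact that Integer and Long were the concrete types; the second keeps both, which is the information wanted for logging what a method can return.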

I hope this helps, let me know if I can help some more.

Bye Arrin




-- 
Conventional wisdom says to know your limits. To know your limits you 
need to find them first. Finding your limits generally involves getting
in over your head and hoping you live long enough to benefit from the 
experience. That's the fun part.

