Posted to kato-spec@incubator.apache.org by Stuart Monteith <st...@stoo.me.uk> on 2009/11/24 13:53:45 UTC

Diagnosing JNI problems

Hello,
     I've just been looking at JNI and how we go about diagnosing
crashes within it.
There are a number of weaknesses in this area, both in the API and in
the implementation of the CJVMTI agent.

In the API we are unable to:
     1. Determine the call stack through VM, JNI and native stack 
frames, with the correct
         interleaving. For example:

native      pthread_cond_wait()
native      Java_java_lang_Object_wait()
JNI         Object.wait()
Java        TestArticle.halt()

     2. We have no information on local and global variables in native code.
         This would be very useful for diagnosing native problems. Of
course,
         if we could do that, we would essentially have gdb in Java.

     3. The JavaClass API does not easily allow us to determine the 
native function that implements
         a native method. While we can retrieve bytecode and compiled 
code sections, there is nothing
         to indicate where native code is held.
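
To illustrate item 3: when a native method is bound by symbol lookup
(rather than registered through RegisterNatives, in which case no naming
rule applies), the JNI specification derives the function name from the
class and method names. The following is a minimal sketch of that
short-name rule only. The class JniNames and its methods are hypothetical
helpers, not part of the Kato API, and the signature suffix used for
overloaded native methods is not handled.

```java
// Sketch: derive the JNI "short" symbol name for a native method,
// following the name-mangling rules in the JNI specification.
// JniNames is a hypothetical helper, not part of any existing API.
final class JniNames {

    // Escape one component (a class name or a method name).
    static String mangle(String name) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (c == '.' || c == '/') {
                out.append('_');                 // package separator
            } else if (c == '_') {
                out.append("_1");                // literal underscore
            } else if (c == ';') {
                out.append("_2");
            } else if (c == '[') {
                out.append("_3");
            } else if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9')) {
                out.append(c);                   // plain ASCII alphanumeric
            } else {
                // Any other character becomes _0xxxx (lower-case hex).
                out.append(String.format("_0%04x", (int) c));
            }
        }
        return out.toString();
    }

    // "Java_" + mangled class name + "_" + mangled method name.
    static String shortName(String className, String methodName) {
        return "Java_" + mangle(className) + "_" + mangle(methodName);
    }

    public static void main(String[] args) {
        // prints "Java_java_lang_Object_wait"
        System.out.println(shortName("java.lang.Object", "wait"));
    }
}
```

A reader of a dump could use a name computed this way to search the
symbols of the loaded native libraries, which is the link the JavaClass
API does not currently expose.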

The implementation has issues too:

     1. There is very little to no native information. Native threads 
are entirely missing, for instance.

     2. The CJVMTI agent is not invoked during a JVM crash, which means
that crashes cannot be diagnosed
         using our RI as it stands. There are the core file
readers, but we'd need to decide
         how they would operate.

Practically, unless we can address the implementation issues, the API
issues are moot.
Given that, I'd propose that we concentrate on the Java API and remove
the Image API parts that cannot be implemented. That will be
unavoidable if we are to have a credible RI for the TCK.


Regards,
     Stuart

-- 
Stuart Monteith
http://blog.stoo.me.uk/

Re: Diagnosing JNI problems

Posted by Steve Poole <sp...@googlemail.com>.

On Wed, Nov 25, 2009 at 2:27 PM, Bobrovsky, Konstantin S <
konstantin.s.bobrovsky@intel.com> wrote:

> Hi Stuart, all,
>
> >Given that, I'd propose that we concentrate on the Java API, and remove
> >the Image API parts that cannot
> >be implemented, which will be unavoidable if we are to have a credible
> >RI for the TCK.
>
> I did not actually see any API parts which are unimplementable in
> principle - maybe I overlooked something - could you please list them?
>
> Can the RI just throw "DataUnavailable" exceptions where it is hard to
> implement something quickly for the "non-coredump" mode (which is the
> primary focus now, as far as I can see)?
>
>

Yes, it's possible that an implementation can do that. My general concern,
though, is about having too many methods, interfaces, classes etc. that do
nothing in the RI. If we have a specification in which a few methods are
not implemented in the RI I'd be comfortable, but not if the number becomes
excessive. Historically for JSRs the RI is almost always the only
implementation. The RI then has to be a credible, useful implementation
running against the Sun JVM.

Building an implementation that uses core files requires an understanding of
the data structures inside the JVM. That information is available in
OpenJDK, but its licence is not compatible with Apache's and the amount of
work is substantial.

> BTW, I tried a simple JNI-involving application on hotspot/ia32, and I can
> see that it reports interleaving frames of native JNI methods correctly up
> to the last Java frame:
>
> Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
> j  ClassToLoad.nativeMethodWhichCrashes()V+0
> j  ClassToLoad.<clinit>()V+8
> v  ~StubRoutines::call_stub
> j  JNITest.nativeMethod1()V+0
> j  JNITest.JAVA_intermediateCall0()V+8
> j  JNITest.JAVAMethod()V+8
> v  ~StubRoutines::call_stub
> j  JNITest.nativeMethod2()V+0
> v  ~StubRoutines::call_stub
> j  JNITest.nativeMethod0()V+0
> j  JNITest.main([Ljava/lang/String;)V+0
> v  ~StubRoutines::call_stub
>
> (I can send you the test sources if you want). I did not try it on JRockit
> or any other JVM though.
>
> >     3. The JavaClass API does not easily allow us to determine the
> >native function that implements
> >         a native method. While we can retrieve bytecode and compiled
> >code sections, there is nothing
> >         to indicate where native code is held.
>
> Again, for HotSpot this is quite feasible: a pointer to the native
> method's code is kept within the structure corresponding to that method.
>
> As far as I can see, CJVMTI-agent-produced dumps cannot provide all the
> information necessary to implement every part of the current Kato API. But
> with core files much more data is available, so we should not drop API
> parts merely because they can't be implemented on top of one of the
> artifact producers.
>
> BTW, at the beginning of this JSR we spoke of the possibility of using Sun's
> Serviceability Agent implementation as another basis for the implementation.
> Does anyone know the status of this?
>

Nicholas Sterling worked hard for us with Sun to get the SA interface
amended so that it was covered under the Classpath exception
(http://openjdk.java.net/legal/gplv2+ce.html). Unfortunately this was not
possible. To be honest, though, we did think that the runtime restrictions
imposed by SA (i.e. the need to run with the same build of the JVM) were
going to cause us problems and reduce adoption.


> Thanks,
> Konst
>
> Intel Novosibirsk
> Closed Joint Stock Company Intel A/O
> Registered legal address: Krylatsky Hills Business Park,
> 17 Krylatskaya Str., Bldg 4, Moscow 121614,
> Russian Federation


-- 
Steve

RE: Diagnosing JNI problems

Posted by "Bobrovsky, Konstantin S" <ko...@intel.com>.

Hi Stuart, all,

>Given that, I'd propose that we concentrate on the Java API, and remove
>the Image API parts that cannot
>be implemented, which will be unavoidable if we are to have a credible
>RI for the TCK.

I did not actually see any API parts which are unimplementable in principle - maybe I overlooked something - could you please list them?

Can the RI just throw "DataUnavailable" exceptions where it is hard to implement something quickly for the "non-coredump" mode (which is the primary focus now, as far as I can see)?
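
As a rough illustration of that suggestion, the sketch below shows an implementation that throws for data it cannot supply and a caller that reports whatever is available. Everything in it is an assumption made for illustration: the DataUnavailable class is a local stand-in, and ThreadView and AgentDumpThread are hypothetical names, not the real Kato interfaces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class DataUnavailableSketch {

    // Local stand-in for a "DataUnavailable" checked exception (hypothetical shape).
    static class DataUnavailable extends Exception {
        DataUnavailable(String message) { super(message); }
    }

    // Hypothetical view of a thread; not the real Kato interface.
    interface ThreadView {
        List<String> javaFrames() throws DataUnavailable;
        List<String> nativeFrames() throws DataUnavailable;
    }

    // Models an implementation backed by an agent dump with no native data.
    static class AgentDumpThread implements ThreadView {
        @Override
        public List<String> javaFrames() {
            return Arrays.asList("TestArticle.halt()");
        }

        @Override
        public List<String> nativeFrames() throws DataUnavailable {
            throw new DataUnavailable("no native frames in this dump");
        }
    }

    // The caller reports what it can and notes what is missing.
    static List<String> describe(ThreadView thread) {
        List<String> out = new ArrayList<>();
        try {
            for (String f : thread.javaFrames()) out.add("Java    " + f);
        } catch (DataUnavailable e) {
            out.add("Java    <unavailable: " + e.getMessage() + ">");
        }
        try {
            for (String f : thread.nativeFrames()) out.add("native  " + f);
        } catch (DataUnavailable e) {
            out.add("native  <unavailable: " + e.getMessage() + ">");
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : describe(new AgentDumpThread())) {
            System.out.println(line);
        }
    }
}
```

The trade-off is that every caller has to handle the exception for each accessor, which becomes tedious if a large share of the API throws in a given implementation.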

BTW, I tried a simple JNI-involving application on hotspot/ia32, and I can see that it reports interleaving frames of native JNI methods correctly up to the last Java frame:

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j  ClassToLoad.nativeMethodWhichCrashes()V+0
j  ClassToLoad.<clinit>()V+8
v  ~StubRoutines::call_stub
j  JNITest.nativeMethod1()V+0
j  JNITest.JAVA_intermediateCall0()V+8
j  JNITest.JAVAMethod()V+8
v  ~StubRoutines::call_stub
j  JNITest.nativeMethod2()V+0
v  ~StubRoutines::call_stub
j  JNITest.nativeMethod0()V+0
j  JNITest.main([Ljava/lang/String;)V+0
v  ~StubRoutines::call_stub

(I can send you the test sources if you want). I did not try it on JRockit or any other JVM though.
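
For anyone wanting to process such a listing mechanically, here is a minimal sketch that classifies each line by its leading marker, using only the legend printed in the listing itself (J, j, V, v). The class HsErrFrames and the assumed line layout ("marker, spaces, description") are illustrative assumptions, not a documented format or an existing tool.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: classify lines from a "Java frames:" listing by their marker.
final class HsErrFrames {

    enum Kind { INTERPRETED, COMPILED, VM, UNKNOWN }

    static final class Frame {
        final Kind kind;
        final String text;

        Frame(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    // Marker meanings are taken from the legend line of the listing.
    static Kind kindOf(char marker) {
        switch (marker) {
            case 'j': return Kind.INTERPRETED;
            case 'J': return Kind.COMPILED;
            case 'v':
            case 'V': return Kind.VM;
            default:  return Kind.UNKNOWN;
        }
    }

    // Frame lines are assumed to look like "<marker>  <description>";
    // anything else (such as the legend line) is skipped.
    static List<Frame> parse(List<String> lines) {
        List<Frame> frames = new ArrayList<>();
        for (String line : lines) {
            if (line.length() < 3 || line.charAt(1) != ' ') {
                continue;
            }
            frames.add(new Frame(kindOf(line.charAt(0)), line.substring(1).trim()));
        }
        return frames;
    }

    // Number of parsed frames of the given kind.
    static int count(List<Frame> frames, Kind kind) {
        int n = 0;
        for (Frame f : frames) {
            if (f.kind == kind) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        List<Frame> frames = parse(Arrays.asList(
            "Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)",
            "j  ClassToLoad.nativeMethodWhichCrashes()V+0",
            "j  ClassToLoad.<clinit>()V+8",
            "v  ~StubRoutines::call_stub",
            "j  JNITest.nativeMethod1()V+0"));
        for (Frame f : frames) {
            System.out.println(f.kind + " " + f.text);
        }
    }
}
```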

>     3. The JavaClass API does not easily allow us to determine the
>native function that implements
>         a native method. While we can retrieve bytecode and compiled
>code sections, there is nothing
>         to indicate where native code is held.

Again, for HotSpot this is quite feasible: a pointer to the native method's code is kept within the structure corresponding to that method.

As far as I can see, CJVMTI-agent-produced dumps cannot provide all the information necessary to implement every part of the current Kato API. But with core files much more data is available, so we should not drop API parts merely because they can't be implemented on top of one of the artifact producers.

BTW, at the beginning of this JSR we spoke of the possibility of using Sun's Serviceability Agent implementation as another basis for the implementation. Does anyone know the status of this?

Thanks,
Konst
 
Intel Novosibirsk
Closed Joint Stock Company Intel A/O
Registered legal address: Krylatsky Hills Business Park, 
17 Krylatskaya Str., Bldg 4, Moscow 121614, 
Russian Federation
 
