Posted to pylucene-dev@lucene.apache.org by Ludovico Cavedon <lu...@gmail.com> on 2009/06/18 02:29:05 UTC

JCC on HtmlUnit

Hi,
I tried to run JCC on HtmlUnit [1].
I managed to get it working, but I had to hack the JCC code. Here are the
issues I had; I think it would be worth fixing them in the JCC codebase.

I am using the latest snapshot of JCC

* If I get an error
jcc.cpp.JavaError: java.lang.ExceptionInInitializerError
I actually added a "print className" to understand which class was
causing the error

* org.mozilla.javascript.SecureCaller and
com.gargoylesoftware.htmlunit.javascript.host.ActiveXObjectImpl cause a
jcc.cpp.JavaError: java.lang.ExceptionInInitializerError
If I use --exclude it does not work either, because findClass() is
called anyway and it will trigger the exception
My workaround was to skip the classes in the cpp.py code. I could
not understand why they throw that exception...

* org.mozilla.javascript.ScriptableObject defined constants like:
static int READONLY
However the file /usr/include/python2.6/structmember.h contains:
#define READONLY        1
which will replace "READONLY" with "1" when compiling the code generated
by JCC
How do you think it should be handled?
My workaround was to "#undef XXX" before every "static yyy XXX".

* org.mozilla.javascript.ScriptableObject defines a method "typeof()",
which conflicts with C++ "typeof" keyword. Similar problem as above...
I skipped methods named "typeof" in cpp.py and python.py.

Other than that, it seems to be working great; I think JCC is a very
interesting project!

Thanks,
Ludovico

[1] http://htmlunit.sourceforge.net/

Re: JCC on HtmlUnit

Posted by Andi Vajda <va...@apache.org>.
On Wed, 19 Aug 2009, Ludovico Cavedon wrote:

> On Thu, Jun 18, 2009 at 7:49 PM, Andi Vajda<va...@apache.org> wrote:
>> Instead, I fixed half of bug PYLUCENE-1 [1] by including the stacktrace of a
>> Java exception into the string representation of the Python error object
>> wrapping it. When a JavaError is reported, the corresponding java stacktrace
>> is now included in the error message.
>
> This is useful!
>
> What about not aborting in case of findClass exception, but just
> printing the class name and the exception? (see attachment).
>
> In HtmlUnit there are two classes that raise an exception, but I do
> not care about them working (and the --exclude parameter has no effect
> on findClass...)

Instead I changed the excludes set to no longer contain classes but their 
names. A class listed with --exclude is no longer loaded unless it's a 
dependency, of course. And in that case, skipping over the error is not an 
option anyway.

This is committed in rev 806057.
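
A minimal Python sketch of the idea described above: the exclude set holds class names, so an excluded class is filtered out before any loader is called and its static initializer never runs. The names load_classes and fake_find_class are hypothetical stand-ins for illustration, not the actual cpp.py code.

```python
def load_classes(class_names, excludes, find_class):
    """Load every class whose name is not in the exclude set.

    find_class stands in for whatever loads a Java class by name; it is
    only called for names that survive the exclude filter.
    """
    excluded = set(excludes)
    loaded = {}
    for name in class_names:
        if name in excluded:
            continue  # the class is never touched, so it cannot fail to load
        loaded[name] = find_class(name)
    return loaded


def fake_find_class(name):
    # Hypothetical loader: pretends one class fails in its static initializer.
    if name == "org.mozilla.javascript.SecureCaller":
        raise RuntimeError("java.lang.ExceptionInInitializerError")
    return "<class %s>" % name


names = [
    "org.mozilla.javascript.ScriptableObject",
    "org.mozilla.javascript.SecureCaller",
]
result = load_classes(
    names, ["org.mozilla.javascript.SecureCaller"], fake_find_class
)
```

If the excluded class were also a dependency of a class that does get loaded, the loader would still hit the error, which matches the caveat above.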

Andi..


Re: JCC on HtmlUnit

Posted by Ludovico Cavedon <lu...@gmail.com>.
On Thu, Jun 18, 2009 at 7:49 PM, Andi Vajda<va...@apache.org> wrote:
> Instead, I fixed half of bug PYLUCENE-1 [1] by including the stacktrace of a
> Java exception into the string representation of the Python error object
> wrapping it. When a JavaError is reported, the corresponding java stacktrace
> is now included in the error message.

This is useful!

What about not aborting in case of findClass exception, but just
printing the class name and the exception? (see attachment).

In HtmlUnit there are two classes that raise an exception, but I do
not care about them working (and the --exclude parameter has no effect
on findClass...)

Thanks,
Ludovico

Re: JCC on HtmlUnit

Posted by Andi Vajda <va...@apache.org>.
On Thu, 18 Jun 2009, Andi Vajda wrote:

> I think a better fix would be to wrap the error reported by findClass with 
> more context information. A plain class not found error already reports the 
> failed class's name:
>
>  File 
> "/Users/vajda/tmp/Python-2.6.2/install/Python.framework/Versions/2.6/lib/python2.6/site-packages/JCC-2.3-py2.6-macosx-10.5-i386.egg/jcc/cpp.py", 
> line 370, in jcc
>    for className in excludes])
> jcc.cpp.JavaError: java.lang.NoClassDefFoundError: 
> org/apache/lucene/queryParser/Token1
>
> I'm going to implement some findClass error wrapping now.

Instead, I fixed half of bug PYLUCENE-1 [1] by including the stacktrace of a 
Java exception into the string representation of the Python error object 
wrapping it. When a JavaError is reported, the corresponding java stacktrace 
is now included in the error message.

This is checked into rev 786355 and requires JCC and PyLucene rebuilds to 
take effect.
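
As a rough illustration of the approach (not the actual JCC implementation, which lives in its C++/Python glue), a Python error object can carry the Java stack trace and include it in its string form. The class name JavaErrorSketch and its attributes are assumptions made up for this sketch.

```python
class JavaErrorSketch(Exception):
    """Hypothetical stand-in for an error object wrapping a Java exception."""

    def __init__(self, message, stacktrace=None):
        super().__init__(message)
        self.message = message
        self.stacktrace = stacktrace  # text of the Java stack trace, if any

    def __str__(self):
        # Append the Java stack trace so it shows up wherever the error is printed.
        if self.stacktrace:
            return "%s\n    Java stacktrace:\n%s" % (self.message, self.stacktrace)
        return self.message


err = JavaErrorSketch(
    "java.lang.ExceptionInInitializerError",
    "    at SecureCaller.<clinit>(SecureCaller.java:57)",
)
```

Printing err then shows both the exception name and the Java frame that triggered it.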

Andi..

[1] https://issues.apache.org/jira/browse/PYLUCENE-1

Re: JCC on HtmlUnit

Posted by Andi Vajda <va...@apache.org>.
On Thu, 18 Jun 2009, Ludovico Cavedon wrote:

> Andi Vajda wrote:
>> On Wed, 17 Jun 2009, Ludovico Cavedon wrote:
>>>
>>> * If I get an error
>>> jcc.cpp.JavaError: java.lang.ExceptionInInitializerError
>>> I actually added a "print className" to understand which class was
>>> causing the error
>
> In JCCEnv.cpp I had to decomment the
> 	if (!env->handlers)
> in order to execute
> 	vm_env->ExceptionDescribe();
> and see the java backtrace. I think it would be useful to print it by
> default.

It's actually tricky to get right. Printing stack traces by default would
produce lots of output for errors that are properly handled, such as failed
method lookups in JCC itself.

I think a better fix would be to wrap the error reported by findClass with 
more context information. A plain class not found error already reports the 
failed class's name:

   File "/Users/vajda/tmp/Python-2.6.2/install/Python.framework/Versions/2.6/lib/python2.6/site-packages/JCC-2.3-py2.6-macosx-10.5-i386.egg/jcc/cpp.py", line 370, in jcc
     for className in excludes])
jcc.cpp.JavaError: java.lang.NoClassDefFoundError: org/apache/lucene/queryParser/Token1

I'm going to implement some findClass error wrapping now.

Andi..

Re: JCC on HtmlUnit

Posted by Andi Vajda <va...@apache.org>.
On Thu, 18 Jun 2009, Ludovico Cavedon wrote:

> There is a bunch of #define coming from
> /usr/include/python2.6/structmember.h:
>
> 'READONLY', 'T_SHORT', 'T_INT', 'T_LONG', 'T_FLOAT', 'T_DOUBLE',
> 'T_STRING', 'T_OBJECT', 'T_CHAR', 'T_BYTE', 'T_UBYTE', 'T_USHORT',
> 'T_UINT', 'T_ULONG', 'T_STRING_INPLACE', 'T_BOOL', 'T_OBJECT_EX',
> 'T_LONGLONG', 'T_ULONGLONG', 'T_PYSSIZET', 'READONLY', 'RO',
> 'READ_RESTRICTED', 'PY_WRITE_RESTRICTED', 'RESTRICTED'
>
> I think it would be useful to add them to the RESERVED list by default.
> I am getting conflicts with READONLY and all T_* at least

There are potentially thousands of things defined in header files included 
by a JCC compilation. These are only a problem when someone names a class, 
method or field using a similar name. Luckily, naming conventions in Java 
land differ from those in C/C++ land so clashes are usually rare. When a 
clash occurs, the --reserved command line arg is your friend.
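
A minimal sketch of the reserved-word idea, in Python: a Java name that clashes with a C/C++ keyword or macro is rewritten before it reaches the generated code. The RESERVED subset, the cpp_name helper and the "$" suffix are assumptions for illustration, not a copy of what cpp.py does.

```python
# Illustrative subset of names that would clash in generated C++ code:
# 'typeof' is a compiler keyword, 'READONLY' a macro from structmember.h.
RESERVED = {"typeof", "READONLY"}


def cpp_name(name, reserved=RESERVED, suffix="$"):
    """Return a name that is safe to emit in generated C++ code.

    The suffix is an assumption for this sketch; any scheme works as long
    as the result can no longer match a keyword or macro.
    """
    if name in reserved:
        return name + suffix
    return name


# Extra words passed on the command line would simply extend the set.
extra = {"T_INT", "T_LONG"}
all_reserved = RESERVED | extra
renamed = [cpp_name(n, all_reserved) for n in ("READONLY", "T_INT", "getName")]
```

Names that do not clash pass through unchanged, so only the rare conflicting identifiers are affected.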

Andi..


Re: JCC on HtmlUnit

Posted by Ludovico Cavedon <lu...@gmail.com>.
Andi Vajda wrote:
> On Wed, 17 Jun 2009, Ludovico Cavedon wrote:
>>
>> * If I get an error
>> jcc.cpp.JavaError: java.lang.ExceptionInInitializerError
>> I actually added a "print className" to understand which class was
>> causing the error

In JCCEnv.cpp I had to decomment the
	if (!env->handlers)
in order to execute
	vm_env->ExceptionDescribe();
and see the java backtrace. I think it would be useful to print it by
default.

It would also be useful to print the class name in case of exception,
see attached patch.

>> * org.mozilla.javascript.SecureCaller and
>> com.gargoylesoftware.htmlunit.javascript.host.ActiveXObjectImpl cause a
>> jcc.cpp.JavaError: java.lang.ExceptionInInitializerError
>> If I use --exclude it does not work either, because findClass() is
>> called anyway and it will trigger the exception
>> My workaround was to skip the classes in the cpp.py code. I could
>> not understand why they throw that exception...
> 
> It could be that the class could not be loaded because of some native
> code required by the class in a shared library that was not found. This
> would need to be fixed by adding yet another command line flag that adds
> to the initVM() call in cpp.py a java.library.path to use with the VM.

For com.jacob.activeX.ActiveXComponent you were right: it is failing
because of a java.lang.ClassNotFoundException.

About the net.sourceforge.htmlunit.corejs.javascript.SecureCaller class
something more weird is going on; I am posting the exception here, in
case someone has a clue of what is going on :)
<<<<<<<<<<<<
Exception in thread "main" java.lang.ExceptionInInitializerError
Caused by: java.lang.NullPointerException
	at
net.sourceforge.htmlunit.corejs.javascript.SecureCaller.loadBytecodePrivileged(SecureCaller.java:172)
	at
net.sourceforge.htmlunit.corejs.javascript.SecureCaller.access$100(SecureCaller.java:55)
	at
net.sourceforge.htmlunit.corejs.javascript.SecureCaller$3.run(SecureCaller.java:162)
	at java.security.AccessController.doPrivileged(Native Method)
	at
net.sourceforge.htmlunit.corejs.javascript.SecureCaller.loadBytecode(SecureCaller.java:158)
	at
net.sourceforge.htmlunit.corejs.javascript.SecureCaller.<clinit>(SecureCaller.java:57)
>>>>>>>>>>


>> * org.mozilla.javascript.ScriptableObject defined constants like:
>> static int READONLY
>> However the file /usr/include/python2.6/structmember.h contains:
>> #define READONLY        1
> 
> You can add that word as a reserved word using the --reserved flag. Or,
> you can edit the RESERVED list in cpp.py and add that word to it for good.

Ah, I missed that RESERVED list (and --reserved was also in the help) :(
Worked thanks!


There is a bunch of #define coming from
/usr/include/python2.6/structmember.h:

'READONLY', 'T_SHORT', 'T_INT', 'T_LONG', 'T_FLOAT', 'T_DOUBLE',
'T_STRING', 'T_OBJECT', 'T_CHAR', 'T_BYTE', 'T_UBYTE', 'T_USHORT',
'T_UINT', 'T_ULONG', 'T_STRING_INPLACE', 'T_BOOL', 'T_OBJECT_EX',
'T_LONGLONG', 'T_ULONGLONG', 'T_PYSSIZET', 'READONLY', 'RO',
'READ_RESTRICTED', 'PY_WRITE_RESTRICTED', 'RESTRICTED'

I think it would be useful to add them to the RESERVED list by default.
I am getting conflicts with READONLY and all T_* at least

I have another problem I forgot to mention; I get this gcc error:
<<<<<<<<<<<<
/var/lib/python-support/python2.6/jcc/sources/functions.h: In function
‘PyObject* get_iterator_next(T*) [with T = java::util::t_Iterator, U =
com::gargoylesoftware::htmlunit::html::t_HtmlTableCell, V =
com::gargoylesoftware::htmlunit::html::HtmlTableCell]’:
build/_htmlunit/__wrap__.cpp:112089:   instantiated from here
/var/lib/python-support/python2.6/jcc/sources/functions.h:116: error: no
match for ‘operator=’ in ‘next = java::util::Iterator::next() const()’
build/_htmlunit/com/gargoylesoftware/htmlunit/html/HtmlTableCell.h:27:
note: candidates are:
com::gargoylesoftware::htmlunit::html::HtmlTableCell&
com::gargoylesoftware::htmlunit::html::HtmlTableCell::operator=(const
com::gargoylesoftware::htmlunit::html::HtmlTableCell&)
>>>>>>>>>>>>

where build/_htmlunit/__wrap__.cpp:112089:
<<<<<<<
DECLARE_TYPE(HtmlTableRow$CellIterator, t_HtmlTableRow$CellIterator,
java::lang::Object, HtmlTableRow$CellIterator,
t_HtmlTableRow$CellIterator_init_, PyObject_SelfIter, ((PyObject
*(*)(java::util::t_Iterator *))
get_iterator_next<java::util::t_Iterator,com::gargoylesoftware::htmlunit::html::t_HtmlTableCell,com::gargoylesoftware::htmlunit::html::HtmlTableCell>),
0, 0, 0);
>>>>>>>>

Looks like something is going wrong with HtmlTableRow.CellIterator:
http://htmlunit.sourceforge.net/apidocs/com/gargoylesoftware/htmlunit/html/HtmlTableRow.CellIterator.html


Thank you for your help,
Ludovico


Re: JCC on HtmlUnit

Posted by Andi Vajda <va...@apache.org>.
On Wed, 17 Jun 2009, Ludovico Cavedon wrote:

> I tried to run JCC on HtmlUnit [1].
> I managed to get it working, but I had to hack the JCC code. Here are the
> issues I had; I think it would be worth fixing them in the JCC codebase.

Definitely. If you could send in a patch with your changes, that would be 
best. More comments inline.

> I am using the latest snapshot of JCC
>
> * If I get an error
> jcc.cpp.JavaError: java.lang.ExceptionInInitializerError
> I actually added a "print className" to understand which class was
> causing the error
>
> * org.mozilla.javascript.SecureCaller and
> com.gargoylesoftware.htmlunit.javascript.host.ActiveXObjectImpl cause a
> jcc.cpp.JavaError: java.lang.ExceptionInInitializerError
> If I use --exclude it does not work either, because findClass() is
> called anyway and it will trigger the exception
> My workaround was to skip the classes in the cpp.py code. I could
> not understand why they throw that exception...

It could be that the class could not be loaded because of some native code 
required by the class in a shared library that was not found. This would 
need to be fixed by adding yet another command line flag that adds to the 
initVM() call in cpp.py a java.library.path to use with the VM.

> * org.mozilla.javascript.ScriptableObject defined constants like:
> static int READONLY
> However the file /usr/include/python2.6/structmember.h contains:
> #define READONLY        1
> which will replace "READONLY" with "1" when compiling the code generated
> by JCC
> How do you think it should be handled?
> My workaround was to "#undef XXX" before every "static yyy XXX".

You can add that word as a reserved word using the --reserved flag. Or, you 
can edit the RESERVED list in cpp.py and add that word to it for good.

> * org.mozilla.javascript.ScriptableObject defines a method "typeof()",
> which conflicts with C++ "typeof" keyword. Similar problem as above...
> I skipped methods named "typeof" in cpp.py and python.py.

Ugh, that 'typeof' reserved word should definitely be in the RESERVED word 
list in cpp.py. I'll add it now.

> Other than that, it seems to be working great; I think JCC is a very
> interesting project!

Thanks !

Andi..