Posted to pylucene-dev@lucene.apache.org by Michael McCandless <lu...@mikemccandless.com> on 2009/03/13 11:34:55 UTC

a few issues with 2.4.1 RC3

I'm playing with 2.4.1 RC3 (on OS X 10.5.6) and found a few issues:

   * If I fail to call lucene.initVM, I get a rather unfriendly Bus
     Error.  Is it possible (desirable?) to detect this and throw a
     friendly exception instead?

   * When I hit an exception in Java, the carryover to Python fails to
     include the full stack trace (sources & line numbers) from Java,
     which makes debugging harder.  Is that normal?

   * I'm attempting to re-use a field, by changing its value, and then
     adding the document to an index:

         lucene.initVM(lucene.CLASSPATH)

         writer = lucene.IndexWriter(lucene.RAMDirectory(),
                                     lucene.StandardAnalyzer())
         doc = lucene.Document()
         field = lucene.Field('field', '', lucene.Field.Store.NO,
                              lucene.Field.Index.NOT_ANALYZED)
         field.setValue('abc')
         doc.add(field)
         writer.addDocument(doc)
         writer.close()

      However, unexpectedly I hit a Java NullPointerException in the
      writer.addDocument.  I hit a different exception if I use
      lucene.Field.Index.ANALYZED instead.  The corresponding code in
      Lucene should work fine I think.

Mike

Re: a few issues with 2.4.1 RC3

Posted by Michael McCandless <lu...@mikemccandless.com>.

Andi Vajda wrote:

>
> On Fri, 13 Mar 2009, Michael McCandless wrote:
>
>> OK it's great that I can .printStackTrace() to see it...
>>
>> But shouldn't we override JavaError.__str__ so by default an  
>> unhandled exception originating from Java would reveal its Java  
>> trace as well?  (And presumably vice/versa).
>
> Does that actually work or does it require some deeper messing with  
> Python so that the exception reporting explores the actual Java  
> exception and continues reporting the stacktrace ?

Good question -- I'm really not sure what's the best way to implement  
it.

>> I think on exception we should try to provide as much info as  
>> possible to aid in debugging.  Sometimes, it's a user who sees this  
>> exception, copies it into email and sends it off to you for remote  
>> debugging.
>
> Oh, in theory, I completely agree with you. Last time I looked at  
> implementing this it wasn't trivial. Maybe, I missed the obvious ?
>
> Even if overriding __str__ thus worked here, Java is not making it  
> trivial to get a stacktrace as a String (you have to have  
> PrintWriter and StringWriter available to you). Similarly, the JNI  
> ExceptionDescribe() C++ function prints to stderr, it doesn't give  
> you a string.
>
> So, what needs to be done instead is bend Python to do the right  
> thing when reporting the stacktrace of the JavaError python  
> exception and that looked non-trivial last time I looked into it.

Yeah it does not sound easy.  I'll at least open an issue for it...  
this certainly shouldn't block 2.4.1.

Mike

Re: a few issues with 2.4.1 RC3

Posted by Andi Vajda <va...@apache.org>.

On Fri, 13 Mar 2009, Michael McCandless wrote:

>
> Andi Vajda wrote:
>
>> 
>> On Fri, 13 Mar 2009, Michael McCandless wrote:
>> 
>>> 
>>> OK, I created PYLUCENE-1, yay!:
>>> 
>>> https://issues.apache.org/jira/browse/PYLUCENE-1
>>> 
>>> Andi can you go add some components to the Jira instance?
>> 
>> What do you mean ? Please, give me an example.
>
> If you go here:
>
>   https://issues.apache.org/jira/browse/PYLUCENE
>
> Do you see an "Administer Project" link under the "Create a new issue in 
> project.."?
>
> (I don't, because I don't have enough karma).

Same here, it seems.

> If you don't then I think Grant needs to give you karma.  Then you can create 
> components, versions, etc.
>
> (And after 2.4.1 is out, don't forget to add it as a version).

Grant, can you give some karma, please :) ?

Thanks !

Andi..

Re: a few issues with 2.4.1 RC3

Posted by Michael McCandless <lu...@mikemccandless.com>.

Andi Vajda wrote:

>
> On Fri, 13 Mar 2009, Michael McCandless wrote:
>
>>
>> OK, I created PYLUCENE-1, yay!:
>>
>>  https://issues.apache.org/jira/browse/PYLUCENE-1
>>
>> Andi can you go add some components to the Jira instance?
>
> What do you mean ? Please, give me an example.

If you go here:

     https://issues.apache.org/jira/browse/PYLUCENE

Do you see an "Administer Project" link under the "Create a new issue  
in project.."?

(I don't, because I don't have enough karma).

If you don't then I think Grant needs to give you karma.  Then you can  
create components, versions, etc.

(And after 2.4.1 is out, don't forget to add it as a version).

Mike

Re: a few issues with 2.4.1 RC3

Posted by Andi Vajda <va...@apache.org>.

On Fri, 13 Mar 2009, Michael McCandless wrote:

>
> OK, I created PYLUCENE-1, yay!:
>
>   https://issues.apache.org/jira/browse/PYLUCENE-1
>
> Andi can you go add some components to the Jira instance?

What do you mean ? Please, give me an example.

Andi..

>
> Mike
>
> Andi Vajda wrote:
>
>> 
>> On Fri, 13 Mar 2009, Michael McCandless wrote:
>> 
>>> OK it's great that I can .printStackTrace() to see it...
>>> 
>>> But shouldn't we override JavaError.__str__ so by default an unhandled 
>>> exception originating from Java would reveal its Java trace as well?  (And 
>>> presumably vice/versa).
>> 
>> Does that actually work or does it require some deeper messing with Python 
>> so that the exception reporting explores the actual Java exception and 
>> continues reporting the stacktrace ?
>> 
>>> I think on exception we should try to provide as much info as possible to 
>>> aid in debugging.  Sometimes, it's a user who sees this exception, copies 
>>> it into email and sends it off to you for remote debugging.
>> 
>> Oh, in theory, I completely agree with you. Last time I looked at 
>> implementing this it wasn't trivial. Maybe, I missed the obvious ?
>> 
>> Even if overriding __str__ thus worked here, Java is not making it trivial 
>> to get a stacktrace as a String (you have to have PrintWriter and 
>> StringWriter available to you). Similarly, the JNI ExceptionDescribe() C++ 
>> function prints to stderr, it doesn't give you a string.
>> 
>> So, what needs to be done instead is bend Python to do the right thing when 
>> reporting the stacktrace of the JavaError python exception and that looked 
>> non-trivial last time I looked into it.
>> 
>> Andi..
>> 
>>> 
>>> Mike
>>> 
>>> Andi Vajda wrote:
>>> 
>>>> On Fri, 13 Mar 2009, Michael McCandless wrote:
>>>>> * When I hit an exception in Java, the carryover to Python fails to
>>>>> include the full stack trace (sources & line numbers) from Java,
>>>>> which makes debugging harder.  Is that normal?
>>>> That's right and documented here [1].
>>>> Andi..
>>>> [1] 
>>>> http://lucene.apache.org/pylucene/jcc/documentation/readme.html#exceptions

Re: a few issues with 2.4.1 RC3

Posted by Michael McCandless <lu...@mikemccandless.com>.

OK, I created PYLUCENE-1, yay!:

     https://issues.apache.org/jira/browse/PYLUCENE-1

Andi can you go add some components to the Jira instance?

Mike

Andi Vajda wrote:

>
> On Fri, 13 Mar 2009, Michael McCandless wrote:
>
>> OK it's great that I can .printStackTrace() to see it...
>>
>> But shouldn't we override JavaError.__str__ so by default an  
>> unhandled exception originating from Java would reveal its Java  
>> trace as well?  (And presumably vice/versa).
>
> Does that actually work or does it require some deeper messing with  
> Python so that the exception reporting explores the actual Java  
> exception and continues reporting the stacktrace ?
>
>> I think on exception we should try to provide as much info as  
>> possible to aid in debugging.  Sometimes, it's a user who sees this  
>> exception, copies it into email and sends it off to you for remote  
>> debugging.
>
> Oh, in theory, I completely agree with you. Last time I looked at  
> implementing this it wasn't trivial. Maybe, I missed the obvious ?
>
> Even if overriding __str__ thus worked here, Java is not making it  
> trivial to get a stacktrace as a String (you have to have  
> PrintWriter and StringWriter available to you). Similarly, the JNI  
> ExceptionDescribe() C++ function prints to stderr, it doesn't give  
> you a string.
>
> So, what needs to be done instead is bend Python to do the right  
> thing when reporting the stacktrace of the JavaError python  
> exception and that looked non-trivial last time I looked into it.
>
> Andi..
>
>>
>> Mike
>>
>> Andi Vajda wrote:
>>
>>> On Fri, 13 Mar 2009, Michael McCandless wrote:
>>>> * When I hit an exception in Java, the carryover to Python fails to
>>>> include the full stack trace (sources & line numbers) from Java,
>>>> which makes debugging harder.  Is that normal?
>>> That's right and documented here [1].
>>> Andi..
>>> [1] http://lucene.apache.org/pylucene/jcc/documentation/readme.html#exceptions

Re: a few issues with 2.4.1 RC3

Posted by Andi Vajda <va...@apache.org>.

On Fri, 13 Mar 2009, Michael McCandless wrote:

> OK it's great that I can .printStackTrace() to see it...
>
> But shouldn't we override JavaError.__str__ so by default an unhandled 
> exception originating from Java would reveal its Java trace as well?  (And 
> presumably vice/versa).

Does that actually work or does it require some deeper messing with Python 
so that the exception reporting explores the actual Java exception and 
continues reporting the stacktrace ?

> I think on exception we should try to provide as much info as possible to aid 
> in debugging.  Sometimes, it's a user who sees this exception, copies it into 
> email and sends it off to you for remote debugging.

Oh, in theory, I completely agree with you. Last time I looked at 
implementing this it wasn't trivial. Maybe, I missed the obvious ?

Even if overriding __str__ thus worked here, Java is not making it trivial 
to get a stacktrace as a String (you have to have PrintWriter and 
StringWriter available to you). Similarly, the JNI ExceptionDescribe() C++ 
function prints to stderr, it doesn't give you a string.

So, what needs to be done instead is bend Python to do the right thing when 
reporting the stacktrace of the JavaError python exception and that looked 
non-trivial last time I looked into it.
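Andi's point is that the available describers write to a stream rather than returning a string. A minimal Python sketch of the general workaround is to hand such a printer a string-backed stream, roughly what wrapping a StringWriter in a PrintWriter does on the Java side. `describe_exception()` is a hypothetical stand-in for a function that, like ExceptionDescribe(), can only print to stderr. One caveat: `contextlib.redirect_stderr` only swaps Python's `sys.stderr`, so output that C code writes directly to file descriptor 2, as a JNI call would, is not captured this way.

```python
import contextlib
import io
import sys


def describe_exception(exc):
    """Hypothetical stand-in for a describer that can only print to stderr."""
    print(f"{type(exc).__name__}: {exc}", file=sys.stderr)


def describe_to_string(exc):
    """Capture whatever describe_exception() prints into a string."""
    buf = io.StringIO()
    # sys.stderr is looked up at call time, so the print above lands in buf.
    with contextlib.redirect_stderr(buf):
        describe_exception(exc)
    return buf.getvalue()
```

For example, `describe_to_string(ValueError('boom'))` returns the line `ValueError: boom` followed by a newline instead of printing it.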

Andi..

>
> Mike
>
> Andi Vajda wrote:
>
>> 
>> On Fri, 13 Mar 2009, Michael McCandless wrote:
>> 
>>> * When I hit an exception in Java, the carryover to Python fails to
>>> include the full stack trace (sources & line numbers) from Java,
>>> which makes debugging harder.  Is that normal?
>> 
>> That's right and documented here [1].
>> 
>> Andi..
>> 
>> [1] 
>> http://lucene.apache.org/pylucene/jcc/documentation/readme.html#exceptions

Re: a few issues with 2.4.1 RC3

Posted by Michael McCandless <lu...@mikemccandless.com>.

OK it's great that I can .printStackTrace() to see it...

But shouldn't we override JavaError.__str__ so by default an unhandled  
exception originating from Java would reveal its Java trace as well?   
(And presumably vice/versa).

I think on exception we should try to provide as much info as possible  
to aid in debugging.  Sometimes, it's a user who sees this exception,  
copies it into email and sends it off to you for remote debugging.
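A minimal Python sketch of the proposal, assuming the Java trace is already available as a string (the hard part, per Andi's reply in this thread). `JavaErrorSketch` and `format_last_line` are hypothetical names, not JCC's actual JavaError. In plain Python the default traceback printer formats the final line from `str()` of the exception, so a `__str__` override is picked up there; this does not settle Andi's question about how far that carries for JCC's JavaError.

```python
import traceback


class JavaErrorSketch(Exception):
    """Hypothetical stand-in for JCC's JavaError carrying a Java trace string."""

    def __init__(self, message, java_trace):
        super().__init__(message)
        self.java_trace = java_trace

    def __str__(self):
        # Append the Java-side trace to the usual Python message.
        return f"{self.args[0]}\n{self.java_trace}"


def format_last_line(exc):
    """Return what the default traceback printer shows for the exception itself."""
    return "".join(traceback.format_exception_only(type(exc), exc))
```

`format_last_line(JavaErrorSketch('addDocument failed', 'java.lang.NullPointerException'))` includes both the message and the Java exception name; the second argument there is placeholder text, not a real trace.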

Mike

Andi Vajda wrote:

>
> On Fri, 13 Mar 2009, Michael McCandless wrote:
>
>> * When I hit an exception in Java, the carryover to Python fails to
>>  include the full stack trace (sources & line numbers) from Java,
>>  which makes debugging harder.  Is that normal?
>
> That's right and documented here [1].
>
> Andi..
>
> [1] http://lucene.apache.org/pylucene/jcc/documentation/readme.html#exceptions

Re: a few issues with 2.4.1 RC3

Posted by Andi Vajda <va...@apache.org>.

On Fri, 13 Mar 2009, Michael McCandless wrote:

> * When I hit an exception in Java, the carryover to Python fails to
>   include the full stack trace (sources & line numbers) from Java,
>   which makes debugging harder.  Is that normal?

That's right and documented here [1].

Andi..

[1] http://lucene.apache.org/pylucene/jcc/documentation/readme.html#exceptions

Re: a few issues with 2.4.1 RC3

Posted by Andi Vajda <va...@apache.org>.

On Fri, 13 Mar 2009, Andi Vajda wrote:

> If, however, you replace the field.setValue('abc') call in the Python code 
> with passing 'abc' to the constructor, it works too.
> It looks like you found a bug with the setValue() wrapper.
>
> I'm looking into it...

It also works if one calls field.setValue(u'abc') instead of 
field.setValue('abc'). Clearly, a bug.

Andi..

Re: a few issues with 2.4.1 RC3

Posted by Andi Vajda <va...@apache.org>.

On Fri, 13 Mar 2009, Michael McCandless wrote:

> I'm playing with 2.4.1 RC3 (on OS X 10.5.6) and found a few issues:
>
> * I'm attempting to re-use a field, by changing its value, and then
>   adding the document to an index:
>
>       lucene.initVM(lucene.CLASSPATH)
>
>       writer = lucene.IndexWriter(lucene.RAMDirectory(),
>                                   lucene.StandardAnalyzer())
>       doc = lucene.Document()
>       field = lucene.Field('field', '', lucene.Field.Store.NO,
>                            lucene.Field.Index.NOT_ANALYZED)
>       field.setValue('abc')
>       doc.add(field)
>       writer.addDocument(doc)
>       writer.close()
>
>    However, unexpectedly I hit a Java NullPointerException in the
>    writer.addDocument.  I hit a different exception if I use
>    lucene.Field.Index.ANALYZED instead.  The corresponding code in
>    Lucene should work fine I think.

Indeed, this fails (see example of Java stacktrace reporting):

   from lucene import *
   initVM(CLASSPATH)

   writer = IndexWriter(RAMDirectory(), StandardAnalyzer())
   doc = Document()
   field = Field('field', '', Field.Store.NO, Field.Index.NOT_ANALYZED)
   field.setValue('abc')
   doc.add(field)

   try:
       writer.addDocument(doc)
   except JavaError as e:
       e.getJavaException().printStackTrace()
       raise

   writer.close()

But the equivalent Java version works:

   import java.io.IOException;

   import org.apache.lucene.index.IndexWriter;
   import org.apache.lucene.store.RAMDirectory;
   import org.apache.lucene.analysis.standard.StandardAnalyzer;
   import org.apache.lucene.document.Document;
   import org.apache.lucene.document.Field;


   public class t02 {
       public static void main(String[] args)
           throws IOException
       {
           IndexWriter writer = new IndexWriter(new RAMDirectory(),
                                                new StandardAnalyzer());
           Document doc = new Document();
           Field field = new Field("field", "",
                                   Field.Store.NO,
                                   Field.Index.NOT_ANALYZED);
           field.setValue("abc");
           doc.add(field);
           writer.addDocument(doc);
           writer.close();

           System.err.println("done");
       }
   }


If, however, you replace the field.setValue('abc') call in the Python code with passing 'abc' to the constructor, it works too.
It looks like you found a bug with the setValue() wrapper.

I'm looking into it...

Andi..

Re: a few issues with 2.4.1 RC3

Posted by Michael McCandless <lu...@mikemccandless.com>.

OK thanks Andi.  I'll switch to RC4.

I opened https://issues.apache.org/jira/browse/LUCENE-1564 for the  
underlying Lucene bug.

Mike

Andi Vajda wrote:

>
> On Fri, 13 Mar 2009, Andi Vajda wrote:
>
>> On Fri, 13 Mar 2009, Michael McCandless wrote:
>>
>>> * I'm attempting to re-use a field, by changing its value, and then
>>>  adding the document to an index:
>>>
>>>      lucene.initVM(lucene.CLASSPATH)
>>>
>>>      writer = lucene.IndexWriter(lucene.RAMDirectory(),
>>>                                  lucene.StandardAnalyzer())
>>>      doc = lucene.Document()
>>>      field = lucene.Field('field', '', lucene.Field.Store.NO,
>>>                           lucene.Field.Index.NOT_ANALYZED)
>>>      field.setValue('abc')
>>>      doc.add(field)
>>>      writer.addDocument(doc)
>>>      writer.close()
>>>
>>>   However, unexpectedly I hit a Java NullPointerException in the
>>>   writer.addDocument.  I hit a different exception if I use
>>>   lucene.Field.Index.ANALYZED instead.  The corresponding code in
>>>   Lucene should work fine I think.
>>
>> Two bugs at work here:
>>
>> 1. Because of the order in which the wrappers for setValue() were
>>    generated and because JCC lets you pass a Python string for a Java
>>    byte array (I should probably remove that since JArray('byte')([...])
>>    is the way to do this now), your setValue('abc') call gets passed to
>>    the Java Field.setValue(byte[]) overload.
>
> I fixed this by indeed removing support for passing byte[] or char[]  
> via strings. Before the addition of JArray(), this was necessary but  
> could cause the problem you found where a byte[] or char[] overload  
> would be picked up before a String overload, as was the case here.
>
> Your test case now works as originally written. The Java Lucene bug  
> we found is not hit anymore since the correct setValue() overload is  
> now invoked.
>
> I uploaded a new release candidate, rc4, with this fix to the
> staging area:
>    http://people.apache.org/~vajda/staging_area/
>
> Andi..
>
>>
>> 2. Because of a probable Java Lucene bug, the byte[] value set that way
>>    does not take effect and a NullPointerException is thrown during indexing.
>>
>> The following Java code illustrates the error:
>>
>>
>> import java.io.IOException;
>>
>> import org.apache.lucene.index.IndexWriter;
>> import org.apache.lucene.store.RAMDirectory;
>> import org.apache.lucene.analysis.standard.StandardAnalyzer;
>> import org.apache.lucene.document.Document;
>> import org.apache.lucene.document.Field;
>>
>>
>> public class t02 {
>>   public static void main(String[] args)
>>       throws IOException
>>   {
>>       IndexWriter writer = new IndexWriter(new RAMDirectory(),
>>                                            new StandardAnalyzer());
>>       Document doc = new Document();
>>       Field field = new Field("field", "",
>>                               Field.Store.NO,
>>                               Field.Index.NOT_ANALYZED);
>>       field.setValue(new byte[] { 'a', 'b', 'c' });
>>       doc.add(field);
>>       writer.addDocument(doc);
>>       writer.close();
>>
>>       System.err.println("done");
>>   }
>> }
>>
>>
>> Mike, what do you think ?
>>
>> Andi..
>>

Re: a few issues with 2.4.1 RC3

Posted by Andi Vajda <va...@apache.org>.

On Fri, 13 Mar 2009, Michael McCandless wrote:

> Andi Vajda wrote:
>
>> I uploaded a new release candidate, rc4, with this fix to the staging area:
>>   http://people.apache.org/~vajda/staging_area/
>
> +1 to release
>
> My previous example now works fine. I was able to build & search an index
> of the first 100K docs from Wikipedia.  Good job!

Thank you for your vote.

Two more PMC votes are needed !

Andi..

Re: a few issues with 2.4.1 RC3

Posted by Michael McCandless <lu...@mikemccandless.com>.

Andi Vajda wrote:

> I uploaded a new release candidate, rc4, with this fix to the  
> staging area:
>    http://people.apache.org/~vajda/staging_area/

+1 to release

My previous example now works fine. I was able to build & search an index
of the first 100K docs from Wikipedia.  Good job!

Mike

Re: a few issues with 2.4.1 RC3

Posted by Andi Vajda <va...@apache.org>.

On Fri, 13 Mar 2009, Andi Vajda wrote:

> On Fri, 13 Mar 2009, Michael McCandless wrote:
>
>> * I'm attempting to re-use a field, by changing its value, and then
>>   adding the document to an index:
>>
>>       lucene.initVM(lucene.CLASSPATH)
>>
>>       writer = lucene.IndexWriter(lucene.RAMDirectory(),
>>                                   lucene.StandardAnalyzer())
>>       doc = lucene.Document()
>>       field = lucene.Field('field', '', lucene.Field.Store.NO,
>>                            lucene.Field.Index.NOT_ANALYZED)
>>       field.setValue('abc')
>>       doc.add(field)
>>       writer.addDocument(doc)
>>       writer.close()
>>
>>    However, unexpectedly I hit a Java NullPointerException in the
>>    writer.addDocument.  I hit a different exception if I use
>>    lucene.Field.Index.ANALYZED instead.  The corresponding code in
>>    Lucene should work fine I think.
>
> Two bugs at work here:
>
>  1. Because of the order in which the wrappers for setValue() were
>     generated and because JCC lets you pass a Python string for a Java
>     byte array (I should probably remove that since JArray('byte')([...])
>     is the way to do this now), your setValue('abc') call gets passed to
>     the Java Field.setValue(byte[]) overload.

I fixed this by indeed removing support for passing byte[] or char[] via 
strings. Before the addition of JArray(), this was necessary but could cause 
the problem you found where a byte[] or char[] overload would be picked up 
before a String overload, as was the case here.

Your test case now works as originally written. The Java Lucene bug we found 
is not hit anymore since the correct setValue() overload is now invoked.

I uploaded a new release candidate, rc4, with this fix to the staging area:
     http://people.apache.org/~vajda/staging_area/

Andi..

>
>  2. Because of a probable Java Lucene bug, the byte[] value set that way
>     does not take effect and a NullPointerException is thrown during indexing.
>
> The following Java code illustrates the error:
>
>
> import java.io.IOException;
>
> import org.apache.lucene.index.IndexWriter;
> import org.apache.lucene.store.RAMDirectory;
> import org.apache.lucene.analysis.standard.StandardAnalyzer;
> import org.apache.lucene.document.Document;
> import org.apache.lucene.document.Field;
>
>
> public class t02 {
>    public static void main(String[] args)
>        throws IOException
>    {
>        IndexWriter writer = new IndexWriter(new RAMDirectory(),
>                                             new StandardAnalyzer());
>        Document doc = new Document();
>        Field field = new Field("field", "",
>                                Field.Store.NO,
>                                Field.Index.NOT_ANALYZED);
>        field.setValue(new byte[] { 'a', 'b', 'c' });
>        doc.add(field);
>        writer.addDocument(doc);
>        writer.close();
>
>        System.err.println("done");
>    }
> }
>
>
> Mike, what do you think ?
>
> Andi..
>

Re: a few issues with 2.4.1 RC3

Posted by Michael McCandless <lu...@mikemccandless.com>.

Andi Vajda wrote:

> Mike, what do you think ?

Indeed this looks like a Lucene bug!  Sneaky.  I'll take it.

Mike

Re: a few issues with 2.4.1 RC3

Posted by Andi Vajda <va...@apache.org>.

On Fri, 13 Mar 2009, Michael McCandless wrote:

> * I'm attempting to re-use a field, by changing its value, and then
>   adding the document to an index:
>
>       lucene.initVM(lucene.CLASSPATH)
>
>       writer = lucene.IndexWriter(lucene.RAMDirectory(),
>                                   lucene.StandardAnalyzer())
>       doc = lucene.Document()
>       field = lucene.Field('field', '', lucene.Field.Store.NO,
>                            lucene.Field.Index.NOT_ANALYZED)
>       field.setValue('abc')
>       doc.add(field)
>       writer.addDocument(doc)
>       writer.close()
>
>    However, unexpectedly I hit a Java NullPointerException in the
>    writer.addDocument.  I hit a different exception if I use
>    lucene.Field.Index.ANALYZED instead.  The corresponding code in
>    Lucene should work fine I think.

Two bugs at work here:

   1. Because of the order in which the wrappers for setValue() were
      generated and because JCC lets you pass a Python string for a Java
      byte array (I should probably remove that since JArray('byte')([...])
      is the way to do this now), your setValue('abc') call gets passed to
      the Java Field.setValue(byte[]) overload.

   2. Because of a probable Java Lucene bug, the byte[] value set that way
      does not take effect and a NullPointerException is thrown during indexing.
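A toy Python model of point 1, assuming the generated wrappers try overloads in a fixed order and take the first one whose parameter check accepts the argument. The names (`pick_overload`, `JArraySketch`, the `strict` flag) are hypothetical, and this is not JCC's actual dispatch code. With lenient conversion a plain string matches the byte[] overload first; with strict conversion, i.e. after removing the string-for-byte[] convenience mentioned in point 1, it falls through to the String overload and an explicit array wrapper is needed to reach byte[].

```python
class JArraySketch(list):
    """Hypothetical stand-in for an explicit Java array wrapper such as JArray('byte')."""


def _accepts_byte_array(value, strict):
    if isinstance(value, JArraySketch):
        return True
    # Lenient mode mimics the old convenience: a Python string also matches byte[].
    return not strict and isinstance(value, str)


def _accepts_string(value, strict):
    return isinstance(value, str)


# Assumed generation order: the byte[] variant happens to be tried first.
SET_VALUE_OVERLOADS = [
    ("setValue(byte[])", _accepts_byte_array),
    ("setValue(String)", _accepts_string),
]


def pick_overload(value, strict):
    """Return the first overload whose parameter check accepts value."""
    for signature, accepts in SET_VALUE_OVERLOADS:
        if accepts(value, strict):
            return signature
    raise TypeError(f"no setValue() overload accepts {value!r}")
```

`pick_overload('abc', strict=False)` returns `'setValue(byte[])'`, while `pick_overload('abc', strict=True)` returns `'setValue(String)'`.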

The following Java code illustrates the error:


import java.io.IOException;

import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;


public class t02 {
     public static void main(String[] args)
         throws IOException
     {
         IndexWriter writer = new IndexWriter(new RAMDirectory(),
                                              new StandardAnalyzer());
         Document doc = new Document();
         Field field = new Field("field", "",
                                 Field.Store.NO,
                                 Field.Index.NOT_ANALYZED);
         field.setValue(new byte[] { 'a', 'b', 'c' });
         doc.add(field);
         writer.addDocument(doc);
         writer.close();

         System.err.println("done");
     }
}


Mike, what do you think ?

Andi..

Re: a few issues with 2.4.1 RC3

Posted by Andi Vajda <va...@apache.org>.

On Fri, 13 Mar 2009, Bill Janssen wrote:

> Michael McCandless <lu...@mikemccandless.com> wrote:
>
>> I'm playing with 2.4.1 RC3 (on OS X 10.5.6) and found a few issues:
>>
>>   * If I fail to call lucene.initVM, I get a rather unfriendly Bus
>>     Error.  Is it possible (desirable?) to detect this and throw a
>>     friendly exception instead?
>
> I was thinking about this some more, and have an idea.  How about only
> having the dictionary of the module init'ed by calling initVM?  That is,
> it would be practically speaking impossible to call any other Java
> method until the VM has been initialized and the thread attached.  The
> overhead would go into "import", where it might be acceptable.  So
> instead of getting the bus error, you'd get an AttributeError.
>
> Or, keep the dictionary, but map the values associated with each
> variable in the module to an error function which raises
> JavaNotInitialized, and do the real mapping after the initialization
> step.

The following needs to work:

  >>> from lucene import Document, initVM, CLASSPATH
  >>> initVM(CLASSPATH)
  >>> Document()

So, messing with the module dictionary is not really an option.
But messing with all the C types mapping to Java classes could be done.

Again, as with attachCurrentThread(), we don't want to be in a situation 
where the init check is run for every call all the time. So that C type 
messing needs to be implemented in a way that catches the error before 
initVM() is called and just runs without any checks after it got called.

Maybe initializing the tp_getattro slot of the C type struct with a checking 
version before initVM() is called and replacing it with the real version 
during initVM() is the way to go. Would you like to contribute a patch ?

Hint: in macros.h, add a parameter to DECLARE_TYPE to take this checking 
tp_getattro version, have JCC generate code with that new parameter so that 
all C type mapping to Java classes are initialized with that checking 
tp_getattro, and then, during initVM(), as all of these types are visited, 
replace the tp_getattro with the python default, PyObject_GenericGetAttr().
You might also have to play the same trick with the tp_new C type struct 
slot to catch the constructor calls.
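The mechanism Andi describes can be modeled in pure Python: install a checking hook on every wrapper type at import time, then have initVM() visit each type once and put the real hook back, so no check runs on later calls. This sketch swaps `__init__` as a stand-in for the C-level tp_new/tp_getattro slots; `Document`, `WRAPPED_TYPES`, `JavaNotInitialized` and `init_vm` are hypothetical names, and the real change would live in JCC's generated C code and macros.h as described above.

```python
class JavaNotInitialized(RuntimeError):
    """Hypothetical error raised when a wrapper is used before initVM()."""


class Document:
    """Hypothetical stand-in for a JCC-generated wrapper type."""

    def __init__(self):
        self.fields = []

    def add(self, field):
        self.fields.append(field)


WRAPPED_TYPES = [Document]
_real_init = {}


def _checking_init(self, *args, **kwargs):
    raise JavaNotInitialized(f"call initVM() before using {type(self).__name__}")


# At import time: remember the real hook and install the checking one.
for _t in WRAPPED_TYPES:
    _real_init[_t] = _t.__dict__["__init__"]
    _t.__init__ = _checking_init


def init_vm():
    """Visit every wrapper type once and restore its real hook."""
    for t in WRAPPED_TYPES:
        t.__init__ = _real_init[t]
```

Calling `Document()` before `init_vm()` raises `JavaNotInitialized`; after `init_vm()` the original constructor runs with no extra check.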

Andi..

Re: a few issues with 2.4.1 RC3

Posted by Andi Vajda <va...@apache.org>.

On Fri, 13 Mar 2009, Bill Janssen wrote:

> Christian Heimes <li...@cheimes.de> wrote:
>
>> Bill Janssen wrote:
>>> I was thinking about this some more, and have an idea.  How about only
>>> having the dictionary of the module init'ed by calling initVM?  That is,
>>> it would be practically speaking impossible to call any other Java
>>> method until the VM has been initialized and the thread attached.  The
>>> overhead would go into "import", where it might be acceptable.  So
>>> instead of getting the bus error, you'd get an AttributeError.
>>
>> It's possible to have multiple packages that are wrapped with JCC. You
>> have to invoke initVM exactly once with the combined class path of every
>> package. You have to add a way to tell a package that it has been
>> initialized by initVM() of another package.
>
> Yes...  Of course, it's possible to call initVM multiple times, once
> with each CLASSPATH, for example, and perhaps that should be the
> standard way to do it.  But even without doing that, the check made by
> "import" could look to see if (1) the VM is initialized, and (2) the
> classpath for the module being imported is on the JVM classpath.
>
> "Attaching" a thread to the JVM could happen automatically this way,
> too.  Calling initVM would attach the current thread; import with the VM
> already init'ed would attach the importing thread.  And there would be
> no methods to call in an unattached thread to cause a bus error.

Sadly, initVM() _must_ be called from the main thread when the Java VM is 
actually getting initialized. That main thread is automatically attached.

Calling initVM() from another thread, when the Java VM was not yet 
initialized is a non-starter.
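A minimal sketch of a guard for that rule, assuming a hypothetical process-wide `VMState` record: only the call that would actually create the VM must come from the main thread, and later calls simply return the existing state. In the real library, other threads are expected to attach themselves instead (the attachCurrentThread() Andi mentions elsewhere in this thread); that part is not modeled here.

```python
import threading


class VMState:
    """Hypothetical process-wide record of whether the VM was created."""
    initialized = False
    classpath = None


def init_vm(classpath):
    """Sketch: the first, VM-creating call must come from the main thread."""
    if not VMState.initialized:
        if threading.current_thread() is not threading.main_thread():
            raise RuntimeError(
                "initVM() must be called from the main thread to create the VM")
        VMState.initialized = True
        VMState.classpath = classpath
    return VMState
```

Run from a worker thread before the VM exists, `init_vm('lucene.jar')` raises `RuntimeError`; the same call from the main thread marks the state as initialized.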

Andi..

Re: a few issues with 2.4.1 RC3

Posted by Bill Janssen <ja...@parc.com>.

Christian Heimes <li...@cheimes.de> wrote:

> Bill Janssen wrote:
> > I was thinking about this some more, and have an idea.  How about only
> > having the dictionary of the module init'ed by calling initVM?  That is,
> > it would be practically speaking impossible to call any other Java
> > method until the VM has been initialized and the thread attached.  The
> > overhead would go into "import", where it might be acceptable.  So
> > instead of getting the bus error, you'd get an AttributeError.
> 
> It's possible to have multiple packages that are wrapped with JCC. You
> have to invoke initVM exactly once with the combined class path of every
> package. You have to add a way to tell a package that it has been
> initialized by initVM() of another package.

Yes...  Of course, it's possible to call initVM multiple times, once
with each CLASSPATH, for example, and perhaps that should be the
standard way to do it.  But even without doing that, the check made by
"import" could look to see if (1) the VM is initialized, and (2) the
classpath for the module being imported is on the JVM classpath.

"Attaching" a thread to the JVM could happen automatically this way,
too.  Calling initVM would attach the current thread; import with the VM
already init'ed would attach the importing thread.  And there would be
no methods to call in an unattached thread to cause a bus error.
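Check (2) in Bill's proposal might look like the following sketch, assuming both class paths are plain strings whose entries are separated by `os.pathsep`. The function name `missing_classpath_entries` is hypothetical, and how an import hook would obtain the running VM's class path is left open.

```python
import os


def missing_classpath_entries(vm_classpath, module_classpath):
    """Return entries of module_classpath that are not on vm_classpath, in order."""
    on_vm = {entry for entry in vm_classpath.split(os.pathsep) if entry}
    return [entry for entry in module_classpath.split(os.pathsep)
            if entry and entry not in on_vm]
```

With a VM class path of `lucene-core.jar` and `jcc.jar` and a module needing `lucene-core.jar` and `extra.jar` (made-up jar names), it returns `['extra.jar']`.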

Bill

Re: a few issues with 2.4.1 RC3

Posted by Andi Vajda <va...@apache.org>.

On Fri, 13 Mar 2009, Christian Heimes wrote:

> Bill Janssen wrote:
>> I was thinking about this some more, and have an idea.  How about only
>> having the dictionary of the module init'ed by calling initVM?  That is,
>> it would be practically speaking impossible to call any other Java
>> method until the VM has been initialized and the thread attached.  The
>> overhead would go into "import", where it might be acceptable.  So
>> instead of getting the bus error, you'd get an AttributeError.
>
> It's possible to have multiple packages that are wrapped with JCC. You
> have to invoke initVM exactly once with the combined class path of every
> package. You have to add a way to tell a package that it has been
> initialized by initVM() of another package.

This is where shared mode comes in. The initVM() call is then implemented by 
the libjcc.so shared library and all global state it controls, such as the 
VM pointer, is shared among all JCC-built extensions in the process.
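A rough Python model of the shared-mode arrangement: each wrapped package delegates to one piece of process-wide state, so a package can tell that another package's initVM() already created the VM. `SharedJCCState` and `WrappedPackage` are hypothetical stand-ins for libjcc and a JCC-built extension; making a second initVM() call a no-op that returns the existing class path is a choice of this sketch, not documented JCC behavior.

```python
class SharedJCCState:
    """Hypothetical stand-in for the global state held once per process by libjcc."""
    vm_classpath = None


class WrappedPackage:
    """Hypothetical JCC-built extension module compiled in shared mode."""

    def __init__(self, name):
        self.name = name

    def initVM(self, classpath):
        # Every package consults the same shared state, so the VM is created once.
        if SharedJCCState.vm_classpath is None:
            SharedJCCState.vm_classpath = classpath
        return SharedJCCState.vm_classpath

    def vm_initialized(self):
        return SharedJCCState.vm_classpath is not None
```

After `WrappedPackage('lucene').initVM('a.jar')`, a separately created `WrappedPackage('other')` reports `vm_initialized()` as `True`.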

Andi..

Re: a few issues with 2.4.1 RC3

Posted by Christian Heimes <li...@cheimes.de>.

Bill Janssen wrote:
> I was thinking about this some more, and have an idea.  How about only
> having the dictionary of the module init'ed by calling initVM?  That is,
> it would be practically speaking impossible to call any other Java
> method until the VM has been initialized and the thread attached.  The
> overhead would go into "import", where it might be acceptable.  So
> instead of getting the bus error, you'd get an AttributeError.

It's possible to have multiple packages that are wrapped with JCC. You
have to invoke initVM exactly once with the combined class path of every
package. You have to add a way to tell a package that it has been
initialized by initVM() of another package.
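Building the combined class path for that single initVM() call can be sketched as follows, assuming each package exposes a CLASSPATH string whose entries are separated by `os.pathsep`. `combined_classpath` is a hypothetical helper; its result would be passed once, e.g. `initVM(combined_classpath(lucene.CLASSPATH, other.CLASSPATH))`, where `other` stands for some second JCC-wrapped package.

```python
import os


def combined_classpath(*classpaths):
    """Join several CLASSPATH strings, dropping empty and duplicate entries, keeping order."""
    seen = set()
    entries = []
    for cp in classpaths:
        for entry in cp.split(os.pathsep):
            if entry and entry not in seen:
                seen.add(entry)
                entries.append(entry)
    return os.pathsep.join(entries)
```

Joining `a.jar`/`shared.jar` with `shared.jar`/`b.jar` yields `a.jar`, `shared.jar`, `b.jar` in that order, separated by `os.pathsep`.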

Christian

Re: a few issues with 2.4.1 RC3

Posted by Bill Janssen <ja...@parc.com>.

Michael McCandless <lu...@mikemccandless.com> wrote:

> I'm playing with 2.4.1 RC3 (on OS X 10.5.6) and found a few issues:
> 
>   * If I fail to call lucene.initVM, I get a rather unfriendly Bus
>     Error.  Is it possible (desirable?) to detect this and throw a
>     friendly exception instead?

I was thinking about this some more, and have an idea.  How about only
having the dictionary of the module init'ed by calling initVM?  That is,
it would be practically speaking impossible to call any other Java
method until the VM has been initialized and the thread attached.  The
overhead would go into "import", where it might be acceptable.  So
instead of getting the bus error, you'd get an AttributeError.

Or, keep the dictionary, but map the values associated with each
variable in the module to an error function which raises
JavaNotInitialized, and do the real mapping after the initialization
step.

Bill