Posted to java-user@lucene.apache.org by Aditi Goyal <ad...@gmail.com> on 2008/08/19 11:09:18 UTC

java.lang.NullPointerException while indexing on Linux

Hi All,

I am using IndexWriter to add documents. To improve indexing speed, I am
re-using the Document as well as the Field instances, as suggested at
http://wiki.apache.org/lucene-java/ImproveIndexingSpeed.

So, for each doc, I first remove the field using doc.removeField(), then
call field.setValue() to change the value of the field, and finally call
doc.add(field) to add the field back to the document.

It works fine on Windows. On Linux, however, indexwriter.addDocument(doc)
throws (<class 'lucene.JavaError'>,
JavaError(<Throwable: java.lang.NullPointerException>,).

Can anyone please explain why this is happening?
I am using Lucene 2.3.1, JCC 1.8 and Python 2.5.

Thanks,
Aditi

Re: java.lang.NullPointerException while indexing on Linux

Posted by Aditi Goyal <ad...@gmail.com>.
On Wed, Aug 20, 2008 at 6:12 PM, Michael McCandless <
mail@mikemccandless.com> wrote:

> Aditi Goyal wrote:
>
>> Thanks Mike. I found the problem.
>> I was not converting the field values to utf-8, and hence the value was
>> getting stored as None when the field was added to the doc.
>> So, when I did doc.get('fieldA'), it returned None instead of a blank or
>> any other string.
>
> I don't really understand why failing to pre-convert to utf-8 would result
> in None being set -- is this a PyLucene (JCC) strangeness?
>
> It seems like if the incoming PyObject is a simple str, the C++ glue code
> generated by JCC should cast it to unicode before passing it to Java (and
> you shouldn't get null added on).
>
>> To overcome this, I first convert the string to utf-8, then call
>> field.setValue() and then doc.add(field). It seems to be working fine.
>>
>> However, I have one question. When I call field.setValue() and then
>> doc.add(), will it replace the value of the field in the doc, or add a new
>> field with the same name and the new value? I ask because I am reusing the
>> doc without reinitialising it anywhere, and you said that doc.removeField()
>> is an expensive operation.
>
> It replaces the value of that Field instance (not add a new field).

I checked this. In my index, the fields were showing up with multiple
duplicate values. So just calling field.setValue() is enough. Calling
doc.add() again creates another copy of the same field in the document, and
the number of copies keeps growing.
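
To illustrate why the repeated doc.add() makes the field count grow, here is
a minimal sketch in plain Python. The Field and Document classes below are
hypothetical stand-ins written for this example, not the PyLucene classes.
The sketch only assumes that add() appends to the document's field list and
that setValue() changes the value held by an existing Field instance, which
matches the behaviour reported above.

```python
class Field:
    """Hypothetical stand-in for a Lucene Field: a name plus a mutable value."""

    def __init__(self, name, value):
        self.name = name
        self.value = value

    def setValue(self, value):
        # Changes the value in place; the Field instance stays the same.
        self.value = value


class Document:
    """Hypothetical stand-in for a Lucene Document: an ordered list of fields."""

    def __init__(self):
        self.fields = []

    def add(self, field):
        # Appends unconditionally, so the same Field can be listed many times.
        self.fields.append(field)


def set_and_readd(doc, field, value):
    """Pattern that produced duplicates: set the value, then add again."""
    field.setValue(value)
    doc.add(field)


def set_in_place(field, value):
    """Pattern recommended in the thread: only change the value."""
    field.setValue(value)


# Reused document where the field is re-added for every new value.
doc_a = Document()
field_a = Field("fieldA", "")
doc_a.add(field_a)
for value in ("abc", "def", "ghi"):
    set_and_readd(doc_a, field_a, value)
readd_count = len(doc_a.fields)

# Reused document where the field is added once and only its value changes.
doc_b = Document()
field_b = Field("fieldA", "")
doc_b.add(field_b)
for value in ("abc", "def", "ghi"):
    set_in_place(field_b, value)
in_place_count = len(doc_b.fields)

print(readd_count, in_place_count)
```

In this model the first document ends up holding one extra entry per update,
while the second keeps a single entry whose value is the latest one set.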


> Mike
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Re: java.lang.NullPointerException while indexing on Linux

Posted by Michael McCandless <ma...@mikemccandless.com>.
Aditi Goyal wrote:

> Thanks Mike. I found the problem.
> I was not converting the field values to utf-8, and hence the value was
> getting stored as None when the field was added to the doc.
> So, when I did doc.get('fieldA'), it returned None instead of a blank or
> any other string.

I don't really understand why failing to pre-convert to utf-8 would  
result in None being set -- is this a PyLucene (JCC) strangeness?

It seems like if the incoming PyObject is a simple str, the C++ glue  
code generated by JCC should cast it to unicode before passing it to  
Java (and you shouldn't get null added on).

> To overcome this, I first convert the string to utf-8, then call
> field.setValue() and then doc.add(field). It seems to be working fine.
>
> However, I have one question. When I call field.setValue() and then
> doc.add(), will it replace the value of the field in the doc, or add a new
> field with the same name and the new value? I ask because I am reusing the
> doc without reinitialising it anywhere, and you said that doc.removeField()
> is an expensive operation.

It replaces the value of that Field instance (not add a new field).

Mike


Re: java.lang.NullPointerException while indexing on Linux

Posted by Aditi Goyal <ad...@gmail.com>.
Thanks Mike. I found the problem.
I was not converting the field values to utf-8, and hence the value was
getting stored as None when the field was added to the doc.
So, when I did doc.get('fieldA'), it returned None instead of a blank or
any other string.

To overcome this, I first convert the string to utf-8, then call
field.setValue() and then doc.add(field). It seems to be working fine.

However, I have one question. When I call field.setValue() and then
doc.add(), will it replace the value of the field in the doc, or add a new
field with the same name and the new value? I ask because I am reusing the
doc without reinitialising it anywhere, and you said that doc.removeField()
is an expensive operation.

Thanks,
Aditi
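
The thread does not show the exact conversion that was used, so the
following is only a sketch of the general idea: make sure every value is
text before it reaches field.setValue(). It is written for Python 3 rather
than the Python 2.5 used in this thread. The helper name to_text is
hypothetical, and the sketch assumes that any byte strings are UTF-8
encoded.

```python
def to_text(value, encoding="utf-8"):
    """Return value as text, decoding byte strings with the given encoding."""
    if value is None:
        # Fail early instead of handing a null value to the indexing layer.
        raise ValueError("field value must not be None")
    if isinstance(value, bytes):
        return value.decode(encoding)
    return str(value)


# Mixed input: raw bytes, text, and UTF-8 bytes containing a non-ASCII letter.
raw_values = [b"abc", "xyz", "caf\u00e9".encode("utf-8")]
clean_values = [to_text(v) for v in raw_values]
print(clean_values)
```

Rejecting None up front is a design choice for this sketch: it surfaces a
missing value at the point where it is prepared, before it reaches the
indexing call.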


On Tue, Aug 19, 2008 at 5:39 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

>
> On quick look that code looks fine, though removeField is an expensive
> operation and unnecessary for this.
>
> We really need the full traceback of the exception.
>
> Mike

Re: java.lang.NullPointerException while indexing on Linux

Posted by Michael McCandless <lu...@mikemccandless.com>.
On quick look that code looks fine, though removeField is an expensive  
operation and unnecessary for this.

We really need the full traceback of the exception.

Mike
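
For anyone wanting to supply the traceback requested above, here is a minimal
sketch of capturing the full Python traceback as text with the standard
traceback module. The function names run_and_capture and
failing_add_document are hypothetical, and the failing call is only a
stand-in for writer.addDocument(doc). How to obtain the Java-side stack
trace from lucene.JavaError is not shown in this thread and is not covered
here.

```python
import traceback


def run_and_capture(action):
    """Run action(); return None on success, or the traceback text on failure."""
    try:
        action()
    except Exception:
        # format_exc() returns the complete Python traceback as a string.
        return traceback.format_exc()
    return None


def failing_add_document():
    # Hypothetical stand-in for writer.addDocument(doc) raising an error.
    raise RuntimeError("java.lang.NullPointerException")


report = run_and_capture(failing_add_document)
print(report)
```

The captured text can be pasted into a reply as-is, which is usually more
useful than the one-line exception summary.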



Re: java.lang.NullPointerException while indexing on Linux

Posted by Aditi Goyal <ad...@gmail.com>.
Thanks Michael and Ian for your valuable responses.
I am attaching a small sample of the code. Please have a look and tell me
where I am going wrong.

import lucene
from lucene import Document, Field, initVM, CLASSPATH

# One Document and one Field per name, created once and reused.
doc = Document()
fieldA = Field('fieldA', "", Field.Store.YES, Field.Index.UN_TOKENIZED)
fieldB = Field('fieldB', "", Field.Store.YES, Field.Index.TOKENIZED)
fieldC = Field('fieldC', "", Field.Store.YES, Field.Index.TOKENIZED)

doc.add(fieldA)
doc.add(fieldB)
doc.add(fieldC)

def get_fields():
    # Remove each field, looked up by the name it was created with.
    if doc.getField('fieldA') is not None:
        doc.removeField('fieldA')
    if doc.getField('fieldB') is not None:
        doc.removeField('fieldB')
    if doc.getField('fieldC') is not None:
        doc.removeField('fieldC')

    # Set the new values and add the fields back to the document.
    fieldA.setValue("abc")
    doc.add(fieldA)
    fieldB.setValue("xyz")
    doc.add(fieldB)
    fieldC.setValue("123")
    doc.add(fieldC)

    return doc


def add_document():
    doc = get_fields()
    # index_directory, analyzer and create_path are defined elsewhere.
    writer = lucene.IndexWriter(index_directory, analyzer, create_path)
    writer.addDocument(doc)
    writer.close()

The writer.addDocument() call throws java.lang.NullPointerException.

Thanks,
Aditi

On Tue, Aug 19, 2008 at 3:25 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> Ian Lea wrote:
>
>> I don't think you need to remove the field and then add it again, but
>> I've no idea if that is relevant to your problem or not.
>
> That's right: just leave the Field there and change its value (assuming the
> doc you are changing to still uses that field).
>
>> A full stack trace would be more help, and maybe an upgrade to 2.3.2,
>> and maybe a snippet of your code, and what is JCC?
>
> JCC generates the necessary C/C++ glue code for Python to directly invoke
> Java code.  The Chandler project created this for PyLucene because they were
> having trouble with GCJ:
>
>    http://blog.chandlerproject.org/author/vajda/
>
> Mike

Re: java.lang.NullPointerException while indexing on Linux

Posted by Michael McCandless <lu...@mikemccandless.com>.
Ian Lea wrote:

> I don't think you need to remove the field and then add it again, but
> I've no idea if that is relevant to your problem or not.

That's right: just leave the Field there and change its value  
(assuming the doc you are changing to still uses that field).

> A full stack trace would be more help, and maybe an upgrade to 2.3.2,
> and maybe a snippet of your code, and what is JCC?

JCC generates the necessary C/C++ glue code for Python to directly  
invoke Java code.  The Chandler project created this for PyLucene  
because they were having trouble with GCJ:

     http://blog.chandlerproject.org/author/vajda/

Mike


Re: java.lang.NullPointerException while indexing on Linux

Posted by Ian Lea <ia...@gmail.com>.
Hi


I don't think you need to remove the field and then add it again, but
I've no idea if that is relevant to your problem or not.

A full stack trace would be more help, and maybe an upgrade to 2.3.2,
and maybe a snippet of your code, and what is JCC?


--
Ian.


