Posted to java-user@lucene.apache.org by Raymond Balmès <ra...@gmail.com> on 2009/03/28 18:36:21 UTC

Empty SinkTokenizer

Hi guys,

I'm using a SinkTokenizer to collect some terms from the documents while
doing the main document indexing. I attached it to a specific field
(tokenized, indexed).

writer = new IndexWriter(index, my_analyzer, create,
    new IndexWriter.MaxFieldLength(1000000));

doc.add(new Field("content", reader));
doc.add(new Field("myField", my_analyzer.sinkStream));

writer.addDocument(doc);

I have a set of documents which don't have those terms, so the sink is empty.

writer.addDocument works fine on the first document, but it always fails on
the second.

Any idea what I should look for? I'm kind of stuck, because understanding
what happens under addDocument is tough.
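
For context, the Tee/Sink wiring is roughly the following (a simplified
sketch; reader and writer are the same ones as above, and in my real code
the sink lives inside my_analyzer and only selected terms are added to it):

import java.io.Reader;
import org.apache.lucene.analysis.SinkTokenizer;
import org.apache.lucene.analysis.TeeTokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

// a fresh sink per document, so tokens (or an empty list) left over from a
// previous document are never carried along
SinkTokenizer sink = new SinkTokenizer();

// tee the main field: tokens are indexed under "content" and a copy is
// added to the sink while that stream is consumed
TokenStream content = new TeeTokenFilter(new StandardTokenizer(reader), sink);

Document doc = new Document();
doc.add(new Field("content", content));
// the sink replays what the tee collected; it is only filled once the
// "content" stream has actually been consumed, so field order matters here
doc.add(new Field("myField", sink));

writer.addDocument(doc);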

-Raymond-

Re: Empty SinkTokenizer

Posted by Raymond Balmès <ra...@gmail.com>.
Well, I wanted an order because during the first analysis I'm collecting terms
which I then put in a 2nd field. I can live with whatever order (creation or
alpha); I just needed to know, and I was also wondering why it is that way,
since it looks to me like an extra complication.

-Raymond-
On Tue, Mar 31, 2009 at 3:24 PM, Grant Ingersoll <gs...@apache.org> wrote:

> I might add that I don't know that we ever explicitly declare that they must
> be in order, but it has always been my understanding that they should be,
> and I can confirm this from several conversations in the past:
>
> http://www.lucidimagination.com/search/document/274ec8c1c56fdd54/order_of_field_objects_within_document#5ffce4509ed32511
>
>
> http://www.lucidimagination.com/search/document/d6b19ab1bd87e30a/order_of_fields_returned_by_document_getfields#d6b19ab1bd87e30a
>
>
> http://www.lucidimagination.com/search/document/deda4dd3f9041bee/the_order_of_fields_in_document_fields#bb26d84091aebcaa
>
> -Grant
>
>
> On Mar 31, 2009, at 8:44 AM, Grant Ingersoll wrote:
>
> I'm going to bring this over to java-dev.
>>
>> -Grant
>>
>> On Mar 30, 2009, at 11:34 AM, Raymond Balmès wrote:
>>
>> lucene 2.4.0
>>>
>>> On Mon, Mar 30, 2009 at 2:18 PM, Grant Ingersoll <gsingers@apache.org
>>> >wrote:
>>>
>>>
>>>> On Mar 30, 2009, at 4:42 AM, Raymond Balmès wrote:
>>>>
>>>>
>>>>>
>>>>> I found out that the fields are processed in alpha order... and not in
>>>>> creation order. Is there any reason for that?
>>>>>
>>>>>
>>>> Hmm, that doesn't sound right (in other words, something must have
>>>> changed).  What version of Lucene are you using?
>>>>
>>>> -Grant
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>>
>>>>
>> --------------------------
>> Grant Ingersoll
>> http://www.lucidimagination.com/
>>
>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
>> Solr/Lucene:
>> http://www.lucidimagination.com/search
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
> Solr/Lucene:
> http://www.lucidimagination.com/search
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Re: Empty SinkTokenizer

Posted by Grant Ingersoll <gs...@apache.org>.
I might add that I don't know that we ever explicitly declare that they
must be in order, but it has always been my understanding that they
should be, and I can confirm this from several conversations in the past:
http://www.lucidimagination.com/search/document/274ec8c1c56fdd54/order_of_field_objects_within_document#5ffce4509ed32511

http://www.lucidimagination.com/search/document/d6b19ab1bd87e30a/order_of_fields_returned_by_document_getfields#d6b19ab1bd87e30a

http://www.lucidimagination.com/search/document/deda4dd3f9041bee/the_order_of_fields_in_document_fields#bb26d84091aebcaa
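
For what it's worth, the Document itself just hands fields back in the order
they were added. A quick, unofficial sketch (this only shows what getFields()
does on an in-memory Document, not what happens on the indexing path):

import java.util.Iterator;
import java.util.List;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Fieldable;

Document doc = new Document();
doc.add(new Field("zebra", "first added", Field.Store.NO, Field.Index.ANALYZED));
doc.add(new Field("apple", "second added", Field.Store.NO, Field.Index.ANALYZED));

// getFields() returns the fields in insertion order ("zebra" then "apple"
// here), not alphabetically
List fields = doc.getFields();
for (Iterator it = fields.iterator(); it.hasNext();) {
  Fieldable f = (Fieldable) it.next();
  System.out.println(f.name());
}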

-Grant

On Mar 31, 2009, at 8:44 AM, Grant Ingersoll wrote:

> I'm going to bring this over to java-dev.
>
> -Grant
>
> On Mar 30, 2009, at 11:34 AM, Raymond Balmès wrote:
>
>> lucene 2.4.0
>>
>> On Mon, Mar 30, 2009 at 2:18 PM, Grant Ingersoll  
>> <gs...@apache.org>wrote:
>>
>>>
>>> On Mar 30, 2009, at 4:42 AM, Raymond Balmès wrote:
>>>
>>>>
>>>>
>>>> I found out that the fields are processed in alpha order... and not in
>>>> creation order. Is there any reason for that?
>>>>
>>>
>>> Hmm, that doesn't sound right (in other words, something must have
>>> changed).  What version of Lucene are you using?
>>>
>>> -Grant
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
> using Solr/Lucene:
> http://www.lucidimagination.com/search
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
http://www.lucidimagination.com/search


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Empty SinkTokenizer

Posted by Grant Ingersoll <gs...@apache.org>.
I'm going to bring this over to java-dev.

-Grant

On Mar 30, 2009, at 11:34 AM, Raymond Balmès wrote:

> lucene 2.4.0
>
> On Mon, Mar 30, 2009 at 2:18 PM, Grant Ingersoll  
> <gs...@apache.org>wrote:
>
>>
>> On Mar 30, 2009, at 4:42 AM, Raymond Balmès wrote:
>>
>>>
>>>
>>> I found out that the fields are processed in alpha order... and not in
>>> creation order. Is there any reason for that?
>>>
>>
>> Hmm, that doesn't sound right (in other words, something must have
>> changed).  What version of Lucene are you using?
>>
>> -Grant
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
http://www.lucidimagination.com/search


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Empty SinkTokenizer

Posted by Raymond Balmès <ra...@gmail.com>.
lucene 2.4.0

On Mon, Mar 30, 2009 at 2:18 PM, Grant Ingersoll <gs...@apache.org> wrote:

>
> On Mar 30, 2009, at 4:42 AM, Raymond Balmès wrote:
>
>>
>>
>> I found out that the fields are processed in alpha order... and not in
>> creation order. Is there any reason for that?
>>
>
> Hmm, that doesn't sound right (in other words, something must have
> changed).  What version of Lucene are you using?
>
> -Grant
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Re: Empty SinkTokenizer

Posted by Grant Ingersoll <gs...@apache.org>.
On Mar 30, 2009, at 4:42 AM, Raymond Balmès wrote:
>
>
> I found out that the fields are processed in alpha order... and not in
> creation order. Is there any reason for that?

Hmm, that doesn't sound right (in other words, something must have  
changed).  What version of Lucene are you using?

-Grant
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Empty SinkTokenizer

Posted by Raymond Balmès <ra...@gmail.com>.
Yes, indeed confusing code... I was also quite confused.
In the meantime I solved my problem by checking, in the tokenStream method of
myAnalyzer, which field was being looked at and applying the right stream to
the right field. No idea if this is how it is intended to be done, but it
works perfectly in my case.
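
Roughly what I ended up with, heavily simplified (placeholder field names,
and the real code only adds selected terms to the sink):

import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.SinkTokenizer;
import org.apache.lucene.analysis.TeeTokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardTokenizer;

public class MyAnalyzer extends Analyzer {

  // shared between the two fields of one document; it has to be emptied or
  // re-created between documents
  private final SinkTokenizer sink = new SinkTokenizer();

  public TokenStream tokenStream(String fieldName, Reader reader) {
    if ("content".equals(fieldName)) {
      // main field: tokens are indexed and a copy ends up in the sink
      return new TeeTokenFilter(new StandardTokenizer(reader), sink);
    }
    if ("myField".equals(fieldName)) {
      // second field: replay whatever was collected while "content" was indexed
      return sink;
    }
    return new StandardTokenizer(reader);
  }
}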

I found out that the fields are processed in alpha order... and not in
creation order. Is there any reason for that?

-Ray-

On Sat, Mar 28, 2009 at 9:30 PM, Erick Erickson <er...@gmail.com> wrote:

> What kind of failures do you get? And I'm confused by the code. Are
> you creating a new IndexWriter every time? Do you ever close it?
>
> It'd help to see the surrounding code...
>
> Best
> Erick
>
> On Sat, Mar 28, 2009 at 1:36 PM, Raymond Balmès <raymond.balmes@gmail.com
> >wrote:
>
> > Hi guys,
> >
> > I'm using a SinkTokenizer to collect some terms from the documents while
> > doing the main document indexing. I attached it to a specific field
> > (tokenized, indexed).
> >
> > writer = new IndexWriter(index, my_analyzer, create,
> >     new IndexWriter.MaxFieldLength(1000000));
> >
> > doc.add(new Field("content", reader));
> > doc.add(new Field("myField", my_analyzer.sinkStream));
> >
> > writer.addDocument(doc);
> >
> > I have a set of documents which don't have those terms, so the sink is
> > empty.
> >
> > writer.addDocument works fine on the first document, but it always fails
> > on the second.
> >
> > Any idea what I should look for? I'm kind of stuck, because understanding
> > what happens under addDocument is tough.
> >
> > -Raymond-
> >
>

Re: Empty SinkTokenizer

Posted by Erick Erickson <er...@gmail.com>.
What kind of failures do you get? And I'm confused by the code. Are
you creating a new IndexWriter every time? Do you ever close it?

It'd help to see the surrounding code...
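
Normally the writer is opened once, all the documents are added, and it is
closed at the end; something along these lines (just a sketch, reusing your
variable names, with docs standing in for whatever produces your documents):

import java.util.Iterator;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;

IndexWriter writer = new IndexWriter(index, my_analyzer, create,
    new IndexWriter.MaxFieldLength(1000000));
try {
  for (Iterator it = docs.iterator(); it.hasNext();) {
    Document doc = (Document) it.next();
    writer.addDocument(doc);
  }
} finally {
  // flushes anything still buffered and releases the write lock
  writer.close();
}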

Best
Erick

On Sat, Mar 28, 2009 at 1:36 PM, Raymond Balmès <ra...@gmail.com> wrote:

> Hi guys,
>
> I'm using a SinkTokenizer to collect some terms from the documents while
> doing the main document indexing. I attached it to a specific field
> (tokenized, indexed).
>
> writer = new IndexWriter(index, my_analyzer, create,
>     new IndexWriter.MaxFieldLength(1000000));
>
> doc.add(new Field("content", reader));
> doc.add(new Field("myField", my_analyzer.sinkStream));
>
> writer.addDocument(doc);
>
> I have a set of documents which don't have those terms, so the sink is empty.
>
> writer.addDocument works fine on the first document, but it always fails on
> the second.
>
> Any idea what I should look for? I'm kind of stuck, because understanding
> what happens under addDocument is tough.
>
> -Raymond-
>