Posted to java-user@lucene.apache.org by "Bill von Ofenheim (LaRC)" <w....@nasa.gov> on 2004/12/07 00:21:30 UTC

Single Digit Indexing

How can I get Lucene to index single digits (e.g. "8" as in "Gemini 8")?
I am able to index numbers with two or more digits (e.g. "11" as in
"Apollo 11").

Thanks,
Bill von Ofenheim




---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: Is this a bug or a feature with addIndexes?

Posted by Chris Hostetter <ho...@fucit.org>.
: [amigo@venus tmp]# time java MemoryVsDisk 1 1 100000 -r
: Docs in the RAM index: 1
: Docs in the FS index: 0
: Total time: 142 ms

I looked at the code from the article you mentioned and added the print
statements I'm guessing you added for ramWriter/fsWriter.docCount() before
and after each is closed.  I also opened the resulting indexDir with a
new IndexReader after all the writers had been closed to get its numDocs,
and I can confirm that the index in indexDir is in fact empty (using
1.4.2).


But like I said before:  you should try closing the ramWriter before
calling fsWriter.addIndexes.  I can say with authority that it works
(because I've tried it).

The date on that article is March of 2003, which pre-dates the Lucene
1.3 RC, so it's likely that the internals have changed a bit, making
it necessary to close ramWriter first.

Hell, it's entirely possible that the code in Otis's article never worked
100% correctly ... that code never printed out the number of docs in the
final index, so it's entirely possible it was missing a few even when he
ran it.

-Hoss



Re: Is this a bug or a feature with addIndexes?

Posted by "amigo@max3d.com" <am...@max3d.com>.
Hi Otis

I did try, here's what I get:

[amigo@venus tmp]# time java MemoryVsDisk 1 1 100000 -r  
Docs in the RAM index: 1
Docs in the FS index: 0
Total time: 142 ms

real    0m0.322s
user    0m0.268s
sys     0m0.033s

I tried other combinations but they don't seem to affect the outcome 
either :(

thanks

-pedja


Otis Gospodnetic said the following on 12/6/2004 8:11 PM:

>Hello,
>
>Try changing IndexWriter's mergeFactor variable.  It's 10 by default. 
>Change it to 1, for instance.
>
>Otis
>
>--- "amigo@max3d.com" <am...@max3d.com> wrote:
>
>  
>
>>Greetings,
>>
>>Ok, so maybe this is common knowledge to most of you, but I'm a layman
>>when it comes to Lucene and I couldn't find any details about this
>>after some searching.
>>
>>When you merge two indexes via addIndexes, does it only work in
>>batches (10 or more documents)?
>>
>>Because I've been banging my head against the wall wondering why my
>>code does not want to index 1 (one) document, and then I went to run
>>Otis's MemoryVsDisk class from
>>http://www.onjava.com/pub/a/onjava/2003/03/05/lucene.html?page=last
>>but I didn't use 10,000 documents as suggested, I used 5 and 15
>>instead.
>>And what do you know, with fewer than 10 it doesn't merge at all, while
>>with more than 10 it will merge only the first 10 documents and
>>"gently" forget about the other 5.
>>
>>My project requires me to index/update one single document as
>>required and make it immediately available for searching.
>>
>>How do I accomplish this if index merging will not merge fewer than 10,
>>and only in increments of 10, and single indexing doesn't
>>seem to do it at all? (Please see my other post:
>>http://marc.theaimsgroup.com/?l=lucene-user&m=110237364203877&w=2)
>>
>>thanks
>>
>>-pedja
>>

Re: Is this a bug or a feature with addIndexes?

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hello,

Try changing IndexWriter's mergeFactor variable.  It's 10 by default. 
Change it to 1, for instance.

Otis

--- "amigo@max3d.com" <am...@max3d.com> wrote:

> Greetings,
> 
> Ok, so maybe this is common knowledge to most of you, but I'm a layman
> when it comes to Lucene and I couldn't find any details about this
> after some searching.
> 
> When you merge two indexes via addIndexes, does it only work in
> batches (10 or more documents)?
> 
> Because I've been banging my head against the wall wondering why my
> code does not want to index 1 (one) document, and then I went to run
> Otis's MemoryVsDisk class from
> http://www.onjava.com/pub/a/onjava/2003/03/05/lucene.html?page=last
> but I didn't use 10,000 documents as suggested, I used 5 and 15
> instead.
> And what do you know, with fewer than 10 it doesn't merge at all, while
> with more than 10 it will merge only the first 10 documents and
> "gently" forget about the other 5.
> 
> My project requires me to index/update one single document as
> required and make it immediately available for searching.
> 
> How do I accomplish this if index merging will not merge fewer than 10,
> and only in increments of 10, and single indexing doesn't
> seem to do it at all? (Please see my other post:
> http://marc.theaimsgroup.com/?l=lucene-user&m=110237364203877&w=2)
> 
> thanks
> 
> -pedja
> 



Is this a bug or a feature with addIndexes?

Posted by "amigo@max3d.com" <am...@max3d.com>.
Greetings,

Ok, so maybe this is common knowledge to most of you, but I'm a layman
when it comes to Lucene and I couldn't find any details about this
after some searching.

When you merge two indexes via addIndexes, does it only work in batches
(10 or more documents)?

Because I've been banging my head against the wall wondering why my code
does not want to index 1 (one) document, and then I went to run Otis's
MemoryVsDisk class from
http://www.onjava.com/pub/a/onjava/2003/03/05/lucene.html?page=last
but I didn't use 10,000 documents as suggested, I used 5 and 15 instead.
And what do you know, with fewer than 10 it doesn't merge at all, while
with more than 10 it will merge only the first 10 documents and
"gently" forget about the other 5.

My project requires me to index/update one single document as required
and make it immediately available for searching.

How do I accomplish this if index merging will not merge fewer than 10,
and only in increments of 10, and single indexing doesn't
seem to do it at all? (Please see my other post:
http://marc.theaimsgroup.com/?l=lucene-user&m=110237364203877&w=2)

thanks

-pedja


Re: Single Digit Indexing

Posted by David Spencer <da...@tropo.com>.
Otis Gospodnetic wrote:

> Hm, if you can index 11, you should be able to index 8 as well.  In any
> case, you most likely want to make sure that your Analyzer is not just

In theory you could have a  "length" filter tossing out tokens that are 
too short or too long, and maybe you're getting rid of all tokens less 
than 2 chars...
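
To make the length-filter idea concrete, here is a minimal plain-Java sketch. It is not Lucene code: the class name LengthFilterSketch, the method filterByLength, and the simple whitespace tokenization with lowercasing are all assumptions made for illustration, and a real Analyzer chain may tokenize and filter differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a token filter inside an analyzer chain.
class LengthFilterSketch {

    // Splits text on whitespace, lowercases each token, and keeps only
    // tokens whose length lies in the inclusive range [minLen, maxLen].
    static List<String> filterByLength(String text, int minLen, int maxLen) {
        List<String> kept = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // happens only when the input is blank
            }
            int len = token.length();
            if (len >= minLen && len <= maxLen) {
                kept.add(token.toLowerCase());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // A minimum length of 2 silently drops the single digit.
        System.out.println(filterByLength("Gemini 8", 2, 255));  // [gemini]
        System.out.println(filterByLength("Apollo 11", 2, 255)); // [apollo, 11]
        // A minimum length of 1 keeps it.
        System.out.println(filterByLength("Gemini 8", 1, 255));  // [gemini, 8]
    }
}
```

If something like this sits in your analysis chain with a minimum length of 2, "8" never reaches the index while "11" does, which would match the symptom described.
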


> throwing your numbers out.  This may still be up to date:
> http://www.jguru.com/faq/view.jsp?EID=538308
> 
> See also: http://wiki.apache.org/jakarta-lucene/HowTo
> 
> Otis
> 
> --- "Bill von Ofenheim (LaRC)" <w....@nasa.gov> wrote:
> 
> 
>>How can I get Lucene to index single digits (e.g. "8" as in "Gemini
>>8")?
>>I am able to index numbers with two or more digits (e.g. "11" as in
>>"Apollo 11").
>>
>>Thanks,
>>Bill von Ofenheim



Re: Single Digit Indexing

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hm, if you can index 11, you should be able to index 8 as well.  In any
case, you most likely want to make sure that your Analyzer is not just
throwing your numbers out.  This may still be up to date:
http://www.jguru.com/faq/view.jsp?EID=538308

See also: http://wiki.apache.org/jakarta-lucene/HowTo

Otis

--- "Bill von Ofenheim (LaRC)" <w....@nasa.gov> wrote:

> How can I get Lucene to index single digits (e.g. "8" as in "Gemini
> 8")?
> I am able to index numbers with two or more digits (e.g. "11" as in
> "Apollo 11").
> 
> Thanks,
> Bill von Ofenheim

