Posted to java-user@lucene.apache.org by "Mader, Volker" <VM...@heiler.com> on 2002/09/10 09:00:00 UTC

Performance with 5 Millions indexed items

Hi,

I've got a question about performance with "bigger" indexes. We used IndexWriter with GermanAnalyzer to index data with the following fields:

Field1: ID (a long value)
Field2: Description (a free text)
Field3: Groups (a list of up to 10 long values encoded in a single string)
Field4: Classes (a list of up to 10 long values encoded in a single string)
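
The message does not say how the long values in "Groups" and "Classes" are packed into one string. As a minimal sketch, assuming a single space as the separator (the class name IdListCodec and both method names are made up for illustration), the encoding and decoding could look like this:

```java
import java.util.StringTokenizer;

// Hypothetical helper, not part of Lucene: packs a list of long IDs into
// one string and unpacks it again. The space separator is an assumption.
class IdListCodec {

    // Join the values with single spaces so that a whitespace-based
    // tokenizer would see each ID as a separate token.
    static String encode(long[] ids) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < ids.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(ids[i]);
        }
        return sb.toString();
    }

    // Split on whitespace and parse each token back into a long.
    static long[] decode(String encoded) {
        StringTokenizer st = new StringTokenizer(encoded);
        long[] ids = new long[st.countTokens()];
        for (int i = 0; st.hasMoreTokens(); i++) {
            ids[i] = Long.parseLong(st.nextToken());
        }
        return ids;
    }

    public static void main(String[] args) {
        String groups = encode(new long[] {17L, 4211L, 900000001L});
        System.out.println(groups);
        System.out.println(decode(groups).length);
    }
}
```

Whether each ID ends up as its own searchable term depends on how the analyzer in use tokenizes digits, which is worth checking separately.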

Documents are created with these 4 fields and then added to the IndexWriter.
After all documents have been added, the index is optimized.

Searching now for a word in the "Description" field using IndexSearcher(GermanAnalyzer) with a FuzzyQuery takes up to 30 seconds on a 1.4 GHz Pentium 4.
Retrieving the documents with hits.doc(..) is also very slow.
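
One plausible reason a fuzzy search is much slower than an exact term lookup is that it may have to compare the search word against every distinct term in the field. The sketch below is not Lucene's code. It is a stand-alone illustration under that assumption, with hypothetical names (FuzzyScanSketch, editDistance, fuzzyMatches) and a fixed edit-distance limit standing in for whatever similarity measure the library really applies:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: a linear scan over a term dictionary, computing an
// edit distance for every term. Cost grows with the number of distinct terms.
class FuzzyScanSketch {

    // Levenshtein distance via dynamic programming, keeping two rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Every term is visited once; there is no index shortcut in this sketch.
    static List<String> fuzzyMatches(String query, List<String> terms, int maxEdits) {
        List<String> out = new ArrayList<>();
        for (String term : terms) {
            if (editDistance(query, term) <= maxEdits) {
                out.add(term);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("haus", "maus", "hausen", "baum");
        System.out.println(fuzzyMatches("haus", terms, 1));
    }
}
```

If this is what happens, search time would depend on the number of distinct terms in "Description" rather than on the number of matching documents, so timing an exact term query on the same index would be a useful comparison.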

Any ideas?

Volker

--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


Re: Performance with 5 Millions indexed items

Posted by ta...@pisoftware.com.
We have successfully loaded 10 million documents with 3 fields and get
acceptable search response times (1-2 seconds).
Are you using a network or mounted drive?

What was your mergeFactor?








RE: Performance with 5 Millions indexed items

Posted by "Nader S. Henein" <ns...@bayt.net>.
I'm working with about a million records, each with a much more complicated
XML tree of about 20 fields, and my searches come back in milliseconds. I'm
running on dual SPARCs with a gig of RAM, but I don't think the hardware is
your bottleneck. You might want to take a look at your JVM and the amount of
memory that you've allocated to it. Also, since Lucene is a file-based search
engine, you might want to look at memory-based file storage, and check
fragmentation and seek behavior on your hard drives. In all honesty, I don't
think it's that complicated; it's probably something stupid. Make sure you
don't have a loop somewhere in the pre/post-processing of the actual search.
Rest assured, Lucene is much faster than this.

Nader Henein
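
The two checks suggested above (JVM memory, and time lost outside the actual search) can be sketched with the standard library alone. The class name SearchTimingSketch and its methods are hypothetical, and the two timed phases are empty placeholders where the real query and the hits.doc(..) loop would go:

```java
// Illustration only: reports the heap limit the JVM is running with and
// times individual phases so that search, retrieval and surrounding
// processing can be measured separately.
class SearchTimingSketch {

    // Maximum heap the JVM will try to use, in megabytes.
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    // Runs one phase and returns its wall-clock duration in milliseconds.
    static long timeMillis(Runnable phase) {
        long start = System.nanoTime();
        phase.run();
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) {
        System.out.println("max heap (MB): " + maxHeapMegabytes());

        long searchMs = timeMillis(() -> {
            // placeholder: execute the query here
        });
        long fetchMs = timeMillis(() -> {
            // placeholder: loop over hits.doc(i) here
        });

        System.out.println("search ms: " + searchMs);
        System.out.println("fetch ms:  " + fetchMs);
    }
}
```

If the reported heap is small, it can be raised with the JVM's -Xmx option (for example java -Xmx512m ...). Timing the phases separately shows whether the 30 seconds are spent in the query itself, in document retrieval, or in code around them.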




