Posted to java-user@lucene.apache.org by Hamed Ghavamnia <gh...@gmail.com> on 2014/01/25 08:27:23 UTC

Lucene performance

Hello,

I have searched a lot about Lucene's limits and performance, but I still
don't know how much I can count on it. I'm storing logs and indexing them
with Lucene. The event rate is 2000 events per second. The format of each
log is generally 'fieldname' : 'fieldvalue'.
What search performance should I expect after a few days? Right now I'm
seeing query response times of around 25 seconds on around 500 million logs.
Each log is converted into a document, and the field values are stored as
well as indexed. I have around 10 fields in each log.
Is my query time normal, or am I making a huge mistake?
How much difference does storing fields make? Would it be better if I
didn't store the fields?

Thanks.

Re: Lucene performance

Posted by Hamed Ghavamnia <gh...@gmail.com>.
Thanks. I've put some time checks on the different parts of my search, and
it seems that opening the directories takes most of the response time.
I'm using MMapDirectory, but it doesn't seem to speed up the directory
opening process.
I split my indexes into different folders during creation, and I combine
them by using a MultiReader with multithreading enabled.
I'm wondering if I can also open my directories using multiple threads
to speed up the process.
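
For illustration only, here is a minimal sketch of opening several index
folders concurrently with a plain java.util.concurrent thread pool and timing
the result. It is not taken from the thread and makes no Lucene calls: the
`opener` function is a hypothetical stand-in for whatever call opens one
folder, and the folder names are made up. Whether this helps at all depends
on where the open time is actually spent, so it is worth measuring before and
after.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class ParallelOpen {

    /**
     * Applies opener to every path on a fixed-size thread pool and returns
     * the results in the same order as the input list.
     */
    static <R> List<R> openAll(List<String> paths, Function<String, R> opener, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one task per folder so the opens can overlap.
            List<Future<R>> pending = new ArrayList<>();
            for (String path : paths) {
                Callable<R> task = () -> opener.apply(path);
                pending.add(pool.submit(task));
            }
            // Collect results in submission order; get() blocks until done.
            List<R> opened = new ArrayList<>();
            for (Future<R> future : pending) {
                opened.add(future.get());
            }
            return opened;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while opening", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("an open task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Hypothetical folder names; replace with the real index folders.
        List<String> folders = Arrays.asList("index-0", "index-1", "index-2");
        long start = System.nanoTime();
        // Stand-in opener: replace the lambda with the real per-folder open call.
        List<String> readers = openAll(folders, p -> "reader(" + p + ")", 3);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(readers + " opened in " + elapsedMs + " ms");
    }
}
```

Because the results come back in input order, the returned list could be
handed on to whatever combines the per-folder readers.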

Best,
Hamed


On Sat, Jan 25, 2014 at 4:14 PM, Erick Erickson <er...@gmail.com> wrote:

> You'll have to do some tuning with that kind of ingestion rate, and
> you're talking about a significant size cluster here. At 172M
> documents/day or so, you're not going to store very many days per
> node.
>
> Storing doesn't make much difference as far as search speed is
> concerned: the raw data is stored in separate files (*.fdt and *.fdx)
> and doesn't affect _search_. Those files are accessed only to assemble
> the response.
>
> Otherwise there's not a lot of info to go on here. Here are some
> resources:
> http://wiki.apache.org/solr/SolrPerformanceFactors
>
> http://searchhub.org/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>
> Best,
> Erick

Re: Lucene performance

Posted by Erick Erickson <er...@gmail.com>.
You'll have to do some tuning with that kind of ingestion rate, and
you're talking about a significant size cluster here. At 172M
documents/day or so, you're not going to store very many days per
node.
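
The 172M figure follows directly from the 2000 events per second quoted in
the original question. As a rough sketch (the class and method names are
only illustrative), the arithmetic looks like this; it also shows that 500
million logs correspond to a little under three days of ingest at that rate:

```java
import java.time.Duration;

/** Back-of-the-envelope ingest arithmetic for the figures quoted in this thread. */
final class IngestEstimate {

    static final long SECONDS_PER_DAY = Duration.ofDays(1).getSeconds(); // 86,400

    /** Documents produced per day at a steady rate of events per second. */
    static long docsPerDay(long eventsPerSecond) {
        return Math.multiplyExact(eventsPerSecond, SECONDS_PER_DAY);
    }

    /** Days of continuous ingest needed to accumulate totalDocs documents. */
    static double daysToAccumulate(long totalDocs, long eventsPerSecond) {
        return (double) totalDocs / docsPerDay(eventsPerSecond);
    }

    public static void main(String[] args) {
        // 2000 events/second * 86,400 seconds/day = 172,800,000 documents/day
        System.out.println("docs/day: " + docsPerDay(2_000));
        System.out.println("days to reach 500M docs: "
                + daysToAccumulate(500_000_000L, 2_000));
    }
}
```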

Storing doesn't make much difference as far as search speed is
concerned: the raw data is stored in separate files (*.fdt and *.fdx)
and doesn't affect _search_. Those files are accessed only to assemble
the response.

Otherwise there's not a lot of info to go on here. Here are some
resources:
http://wiki.apache.org/solr/SolrPerformanceFactors
http://searchhub.org/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Best,
Erick

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org