Posted to java-user@lucene.apache.org by prashant ullegaddi <pr...@gmail.com> on 2009/08/03 06:33:46 UTC

How to improve search time?

Hi,

I have a single 87GB index containing around 50M documents. The best search
time I have observed for any query is 8 seconds. When the query is expanded
with synonyms, a search takes minutes (about 2-3 min). Is there a better way
to search so that the overall search time comes down?

Thanks,
Prashant.

Re: How to improve search time?

Posted by Shashi Kant <sh...@gmail.com>.

To add to all these excellent suggestions: I would suggest creating a
"baby index" out of the master index. Pull out, say, 1000 docs into a
test index and query that. It helps in narrowing down the problem.
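
One way to choose the documents for such a test index is a uniform random sample of document IDs. The sketch below is plain Java with no Lucene calls. The class name `BabyIndexSampler` and the idea of addressing documents by an integer ID in the range [0, maxDoc) are assumptions made for illustration; copying the sampled documents into a new index is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper (not part of Lucene): picks up to k distinct document
// IDs uniformly at random from [0, maxDoc) using reservoir sampling.
class BabyIndexSampler {
    static List<Integer> sampleDocIds(int maxDoc, int k, long seed) {
        Random random = new Random(seed);
        List<Integer> reservoir = new ArrayList<>();
        for (int docId = 0; docId < maxDoc; docId++) {
            if (reservoir.size() < k) {
                // Fill the reservoir with the first k IDs.
                reservoir.add(docId);
            } else {
                // Keep the new ID with probability k / (docId + 1).
                int slot = random.nextInt(docId + 1);
                if (slot < k) {
                    reservoir.set(slot, docId);
                }
            }
        }
        return reservoir;
    }
}
```

A random sample keeps the term distribution of the test index roughly similar to the master index, which taking the first 1000 documents may not.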





-- 

Phone# (617) 714-4775
Cell# (617) 642-6745
Google Voice# (617) 575-9264

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: How to improve search time?

Posted by Matthew Hall <mh...@informatics.jax.org>.

Also, how long does it take Luke to do a search against the same index?

That way you can remove any of the timing that your application is
adding into the mix.

If Luke doesn't take the minimum of 8 seconds... then you know it's an
issue with your app (or at least a large part of it).

Matt



-- 
Matthew Hall
Software Engineer
Mouse Genome Informatics
mhall@informatics.jax.org
(207) 288-6012




Re: How to improve search time?

Posted by Ian Lea <ia...@gmail.com>.

Still surprising that your searches are taking so long.

Have you worked through everything on
http://wiki.apache.org/lucene-java/ImproveSearchingSpeed, suggested by
someone earlier in this thread?  Are you sure that the problem is
really with Lucene? Is it the search itself that takes a long time, or
retrieving data for the hits?  What does query.toString() look like?
How many hits does a search typically match?  Is a search on document
id effectively instant?

You have to supply more detail if you want better answers.
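
To answer the "search itself or retrieving data for the hits" question, the two phases can be timed separately. The following is a minimal sketch in plain Java; `PhaseTimer` and the phase names are hypothetical, and the actual search and document-loading calls have to be supplied by the application.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical timing helper: accumulates wall-clock time per named phase so
// that searching and fetching stored fields can be reported separately.
class PhaseTimer {
    private final Map<String, Long> nanosByPhase = new LinkedHashMap<>();

    // Runs the given work, adds its elapsed time to the phase, and returns its result.
    <T> T time(String phase, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            nanosByPhase.merge(phase, System.nanoTime() - start, Long::sum);
        }
    }

    // Elapsed milliseconds recorded for a phase, or 0 if it never ran.
    long millis(String phase) {
        return nanosByPhase.getOrDefault(phase, 0L) / 1_000_000;
    }

    Map<String, Long> nanosByPhase() {
        return nanosByPhase;
    }
}
```

In use, the search call would be wrapped as `timer.time("search", ...)` and the loop that loads stored fields as `timer.time("fetch", ...)`. Comparing `timer.millis("search")` with `timer.millis("fetch")` shows which side dominates.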


--
Ian.




Re: How to improve search time?

Posted by prashant ullegaddi <pr...@gmail.com>.

Shashi,

Our queries are free-text queries, but they get expanded into Multifield
and Boolean queries. We also expand the original query using SynExpand
from Lucene. A simple query gets expanded into, say, a query a page long.

We are not storing any fields other than the key (document ID), target
URL, and title.
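
Since every extra term in a page-long query adds work at search time, one thing worth trying is a cap on how many synonyms each term may contribute. This is a sketch in plain Java under stated assumptions: `SynonymCap`, the synonym map, and the limit are hypothetical stand-ins, and this is not how SynExpand itself works.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: limits synonym expansion to a fixed number per term.
class SynonymCap {
    // Maps each query term to the term itself plus at most maxSynonymsPerTerm
    // synonyms, taken in the order the synonym list provides them.
    static Map<String, List<String>> expand(List<String> terms,
                                            Map<String, List<String>> synonyms,
                                            int maxSynonymsPerTerm) {
        Map<String, List<String>> expanded = new LinkedHashMap<>();
        for (String term : terms) {
            List<String> alternatives = new ArrayList<>();
            alternatives.add(term);
            List<String> candidates = synonyms.getOrDefault(term, Collections.emptyList());
            int limit = Math.max(0, Math.min(maxSynonymsPerTerm, candidates.size()));
            alternatives.addAll(candidates.subList(0, limit));
            expanded.put(term, alternatives);
        }
        return expanded;
    }

    // Total number of term clauses the expanded query would contain.
    static int clauseCount(Map<String, List<String>> expanded) {
        int count = 0;
        for (List<String> alternatives : expanded.values()) {
            count += alternatives.size();
        }
        return count;
    }
}
```

Measuring search time for a few different limits would show how much of the 2-3 minutes comes from the expansion itself.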

Prashant.


Re: How to improve search time?

Posted by Ganesh <em...@yahoo.co.in>.

Hello Shashi,

Could you please share some details about your DB setup: how big the DB is, how much memory it has, etc.?

I currently have 100 million records split across 10 indexes on the same system. I am using ParallelSearcher and the search speed is good.

Regards
Ganesh


Re: How to improve search time?

Posted by Shashi Kant <sh...@gmail.com>.

Prashant, I have had better luck with even larger indices on
similar platforms. Could you elaborate on what types of queries you are
running: Multifield? Boolean? Combinations? Also, you might want
to remove unnecessary stored fields from the index and move them to a
relational DB to squeeze out better performance.
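
The idea of keeping display data outside the index can be sketched as follows: the search returns only ranked keys, and titles and URLs are looked up for the hits on the current page only. This is plain Java with a `Map` standing in for the relational table; `HitHydrator` and `PageInfo` are hypothetical names, not Lucene classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: resolves display data for one page of ranked hits.
class HitHydrator {
    // Display data kept outside the index (here: URL and title).
    static final class PageInfo {
        final String url;
        final String title;

        PageInfo(String url, String title) {
            this.url = url;
            this.title = title;
        }
    }

    // Looks up display data only for the hits in [offset, offset + pageSize).
    // Keys missing from the store are skipped.
    static List<PageInfo> hydratePage(List<String> rankedKeys, int offset, int pageSize,
                                      Map<String, PageInfo> store) {
        List<PageInfo> page = new ArrayList<>();
        int end = Math.min(rankedKeys.size(), offset + pageSize);
        for (int i = Math.max(0, offset); i < end; i++) {
            PageInfo info = store.get(rankedKeys.get(i));
            if (info != null) {
                page.add(info);
            }
        }
        return page;
    }
}
```

Whether this helps depends on how much time is currently spent loading stored fields, so it is worth measuring that first.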


Shashi




Re: How to improve search time?

Posted by prashant ullegaddi <pr...@gmail.com>.

I did that as well. Actually, we had 32 indexes initially, and searching
across them was even worse.
After that I merged them into 4 indexes and did the same. No gain!

Then I had to merge the 32 indexes into one.


Re: How to improve search time?

Posted by Anshum <an...@gmail.com>.

Hi Prashant,
8 seconds as the minimum time is a little too much, though considering
you're using just 4GB of RAM it's still OK.
I would advise you to break your index into smaller indexes, perhaps
selectively query the indexes (if that's possible for your application), and
use a ParallelMultiSearcher. It's just something that you might try and like.
All said and done, parallelizing would only get you a bell-curve-like
performance graph, so you'd have to figure out the sweet spot there.
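
To make the idea concrete, here is a rough sketch of what querying several
smaller indexes in parallel amounts to: send the same query to every sub-index
on its own thread, then merge the per-index results and keep the best hits.
It deliberately does not use the Lucene API. The Shard interface, the Hit
class and searchAll are hypothetical names made up for this illustration, and
ParallelMultiSearcher handles this fan-out and merge for you inside Lucene.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Illustrative only: fan a query out to several sub-indexes and merge the top hits. */
class ShardedSearchSketch {

    /** One scored hit. docId is local to the shard it came from. */
    static final class Hit {
        final String shard;
        final int docId;
        final float score;

        Hit(String shard, int docId, float score) {
            this.shard = shard;
            this.docId = docId;
            this.score = score;
        }
    }

    /** Hypothetical stand-in for a searcher opened on one sub-index. */
    interface Shard {
        List<Hit> search(String query, int topK);
    }

    /** Queries every shard on a pool thread, then keeps the topK best hits overall. */
    static List<Hit> searchAll(List<Shard> shards, String query, int topK, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<Hit>>> pending = new ArrayList<>();
            for (Shard shard : shards) {
                pending.add(pool.submit(() -> shard.search(query, topK)));
            }
            List<Hit> merged = new ArrayList<>();
            for (Future<List<Hit>> result : pending) {
                try {
                    merged.addAll(result.get()); // blocks until that shard has answered
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for a shard", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("a shard search failed", e.getCause());
                }
            }
            // Highest score first, then cut the merged list down to topK.
            merged.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
            return new ArrayList<>(merged.subList(0, Math.min(topK, merged.size())));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Two in-memory fake shards that return canned hits, just to exercise the merge.
        Shard a = (q, k) -> Arrays.asList(new Hit("a", 1, 0.9f), new Hit("a", 7, 0.4f));
        Shard b = (q, k) -> Arrays.asList(new Hit("b", 3, 0.7f));
        for (Hit h : searchAll(Arrays.asList(a, b), "lucene", 2, 2)) {
            System.out.println(h.shard + ":" + h.docId + " score=" + h.score);
        }
    }
}
```

The threads argument is the knob behind the "sweet spot" remark: with more
threads than cores (or than the disk can serve), the extra parallelism tends
to stop helping, so it is worth timing a few values rather than assuming more
is better. Scores from separately built indexes may not be directly
comparable, which this sketch ignores.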

--
Anshum Gupta
Naukri Labs!
http://ai-cafe.blogspot.com

The facts expressed here belong to everybody, the opinions to me. The
distinction is yours to draw............


On Tue, Aug 4, 2009 at 10:08 AM, prashant ullegaddi <
prashullegaddi@gmail.com> wrote:

> I'm running it on a quad-core machine (2.4GHz per core) with 4GB RAM.
>
> Prashant.
>
> On Tue, Aug 4, 2009 at 8:38 AM, Otis Gospodnetic <otis_gospodnetic@yahoo.com> wrote:
>
> > With such a large index, be prepared to put it on a server with lots of RAM
> > (even if you follow all the tips from the Wiki).
> > When reporting performance numbers, you really ought to tell us about your
> > hardware, types of queries, etc.
> >
> > Otis
> > --
> > Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> > Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
> >
> >
> >
> > ----- Original Message ----
> > > From: prashant ullegaddi <pr...@gmail.com>
> > > To: java-user@lucene.apache.org
> > > Sent: Monday, August 3, 2009 12:33:46 AM
> > > Subject: How to improve search time?
> > >
> > > Hi,
> > >
> > > I've a single index of size 87GB containing around 50M documents. When I
> > > search for any query,
> > > best search time I observed was 8sec. And when query is expanded with
> > > synonyms, search takes
> > > minutes (~ 2-3min). Is there a better way to search so that overall search
> > > time reduces?
> > >
> > > Thanks,
> > > Prashant.
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
>

Re: How to improve search time?

Posted by prashant ullegaddi <pr...@gmail.com>.

I'm running it on a quad-core machine (2.4GHz per core) with 4GB RAM.

Prashant.

On Tue, Aug 4, 2009 at 8:38 AM, Otis Gospodnetic <otis_gospodnetic@yahoo.com> wrote:

> With such a large index, be prepared to put it on a server with lots of RAM
> (even if you follow all the tips from the Wiki).
> When reporting performance numbers, you really ought to tell us about your
> hardware, types of queries, etc.
>
> Otis
> --
> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
>
>
>
> ----- Original Message ----
> > From: prashant ullegaddi <pr...@gmail.com>
> > To: java-user@lucene.apache.org
> > Sent: Monday, August 3, 2009 12:33:46 AM
> > Subject: How to improve search time?
> >
> > Hi,
> >
> > I've a single index of size 87GB containing around 50M documents. When I
> > search for any query,
> > best search time I observed was 8sec. And when query is expanded with
> > synonyms, search takes
> > minutes (~ 2-3min). Is there a better way to search so that overall search
> > time reduces?
> >
> > Thanks,
> > Prashant.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Re: How to improve search time?

Posted by Otis Gospodnetic <ot...@yahoo.com>.

With such a large index, be prepared to put it on a server with lots of RAM (even if you follow all the tips from the Wiki).
When reporting performance numbers, you really ought to tell us about your hardware, types of queries, etc.
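
As a rough illustration of how such numbers could be collected, the sketch
below runs a search a few times untimed (so OS caches and the JVM warm up),
then times repeated runs and reports best, median and worst latency. The
class and method names are made up for this example, and the Runnable is a
placeholder for whatever search call your application makes. Nothing here is
Lucene API.

```java
import java.util.Arrays;

/** Illustrative only: time a repeated operation and summarize the latencies. */
class SearchTimingSketch {

    /** Summary of the timed runs, in milliseconds. */
    static final class Stats {
        final double bestMs;
        final double medianMs;
        final double worstMs;

        Stats(double bestMs, double medianMs, double worstMs) {
            this.bestMs = bestMs;
            this.medianMs = medianMs;
            this.worstMs = worstMs;
        }
    }

    /** Runs the search warmupRuns times untimed, then timedRuns times with a timer. */
    static Stats measure(Runnable search, int warmupRuns, int timedRuns) {
        if (timedRuns < 1) {
            throw new IllegalArgumentException("timedRuns must be at least 1");
        }
        for (int i = 0; i < warmupRuns; i++) {
            search.run(); // cold runs: results discarded, not timed
        }
        long[] nanos = new long[timedRuns];
        for (int i = 0; i < timedRuns; i++) {
            long start = System.nanoTime();
            search.run();
            nanos[i] = System.nanoTime() - start;
        }
        return summarize(nanos);
    }

    /** Converts raw nanosecond samples into best / median / worst milliseconds. */
    static Stats summarize(long[] nanos) {
        long[] sorted = nanos.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        double medianNanos = (n % 2 == 1)
                ? sorted[n / 2]
                : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
        return new Stats(sorted[0] / 1e6, medianNanos / 1e6, sorted[n - 1] / 1e6);
    }

    public static void main(String[] args) {
        // Placeholder workload standing in for a real search call.
        Runnable fakeSearch = () -> {
            int[] data = new int[200_000];
            for (int i = 0; i < data.length; i++) {
                data[i] = data.length - i;
            }
            Arrays.sort(data);
        };
        Stats s = measure(fakeSearch, 3, 10);
        System.out.println("best=" + s.bestMs + "ms median=" + s.medianMs
                + "ms worst=" + s.worstMs + "ms");
    }
}
```

Reporting the median next to the best time, together with the hardware and
the query type, makes it easier for others to tell a cold-cache first search
from the steady-state cost.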

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



----- Original Message ----
> From: prashant ullegaddi <pr...@gmail.com>
> To: java-user@lucene.apache.org
> Sent: Monday, August 3, 2009 12:33:46 AM
> Subject: How to improve search time?
> 
> Hi,
> 
> I've a single index of size 87GB containing around 50M documents. When I
> search for any query,
> best search time I observed was 8sec. And when query is expanded with
> synonyms, search takes
> minutes (~ 2-3min). Is there a better way to search so that overall search
> time reduces?
> 
> Thanks,
> Prashant.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: How to improve search time?

Posted by Phil Whelan <ph...@gmail.com>.

Hi Prashant,

Take a look at this...
http://wiki.apache.org/lucene-java/ImproveSearchingSpeed

Cheers,
Phil

On Sun, Aug 2, 2009 at 9:33 PM, prashant
ullegaddi<pr...@gmail.com> wrote:
> Hi,
>
> I've a single index of size 87GB containing around 50M documents. When I
> search for any query,
> best search time I observed was 8sec. And when query is expanded with
> synonyms, search takes
> minutes (~ 2-3min). Is there a better way to search so that overall search
> time reduces?
>
> Thanks,
> Prashant.
>