Posted to users@tomcat.apache.org by Tim Funk <fu...@joedog.org> on 2007/03/01 14:00:11 UTC

[OT] Re: Lucene and DB speeds

A common way is to add timers around the commands and then use a
logging library to output those times (under info, debug, or trace).
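Here is a minimal sketch of the timer suggestion. It assumes Java 8 or later and java.util.logging, where log4j's debug and trace correspond roughly to FINE and FINEST. The TimedCall class and its time method are hypothetical helpers, not part of Lucene or Tomcat. Wrap whichever call you suspect is slow.

```java
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper: times a unit of work and logs the elapsed time.
class TimedCall {
    private static final Logger LOG = Logger.getLogger(TimedCall.class.getName());

    // Runs the work, logs how long it took at the given level (even if it
    // throws), and returns the work's result unchanged.
    public static <T> T time(String label, Level level, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            // Skip building the message when this level is switched off
            if (LOG.isLoggable(level)) {
                LOG.log(level, label + " took " + elapsedMs + " ms");
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in workload; replace the lambda body with the real search call
        int hits = time("search", Level.INFO, () -> {
            int n = 0;
            for (int i = 0; i < 1_000_000; i++) {
                n += (i % 7 == 0) ? 1 : 0;
            }
            return n;
        });
        System.out.println("hits = " + hits);
    }
}
```

With the JDK's default logging.properties only INFO and above are printed, so timings logged at FINE or FINEST appear only after the logger and handler levels are lowered.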

You'll probably get more help from the Lucene user list.

-Tim

Sriram Narayanan wrote:
> Hi all:
> 
> When I query for large datasets, I see a delay between the search and
> the results. For example, if I query a database containing about
> 2 GB of data, with a 400+ MB Lucene index, I get search results
> after anywhere from 5 to 35 seconds.
>
> Any tips on how I could determine where the search and retrieval is
> taking time?

---------------------------------------------------------------------
To start a new topic, e-mail: users@tomcat.apache.org
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org


Re: [OT] Re: Lucene and DB speeds

Posted by Sriram Narayanan <sr...@gmail.com>.
On 3/1/07, Christopher Schultz <ch...@christopherschultz.net> wrote:
>
> Sriram,
>
> 400MB is not really that big of an index. A friend of mine runs a Lucene
> index at the US Library of Congress that is several GB and they search
> it /very/ quickly. Of course, they have some monstrous hardware, too.
>
Ack. We'll change the hardware, etc., but first we'll ask on the Lucene user list.

> >> Any tips on how I could determine where the search and retrieval is
> >> taking time?
>
> Tim Funk wrote:
> > A common way is to add timers around the commands and then use a
> > logging library to output those times (under info, debug, or trace).
>
> Lucene is a pretty opaque API, unfortunately. I'm guessing that if it's
> taking up to 35 seconds to get a response from the index, there's not
> much more benchmarking that could be done.
>
> Then again, Sriram, you should check to make sure that the entire 35
> seconds is taken up by Lucene calls. Is it possible that your Lucene
> search is only part of the time? For instance, it is common practice to
> search a Lucene index and then use those results to query a database. Is
> it possible that other parts of your transaction are dominating the
> wall-clock time?
>

OK. We've gained a lot of insight into what we were doing wrong, and we
intend to post our findings to the correct mailing list today.

> > You'll probably get more help from the Lucene user list.
>

Ack. I have some new figures on that front.

> Absolutely. You've hit the mailing list for Apache Tomcat, which is a
> web application server. You didn't even mention that you were talking
> about a web application ;)
>

What a dumb mistake to make. I'm a member of jackrabbit-user, and I
misposted my mail to tomcat-user instead!!!

> Good luck,
> -chris
>


Re: [OT] Re: Lucene and DB speeds

Posted by Christopher Schultz <ch...@christopherschultz.net>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Sriram,

>> When I query for large datasets, I see a delay between the search and
>> the results. For example, if I query a database containing about
>> 2 GB of data, with a 400+ MB Lucene index, I get search results
>> after anywhere from 5 to 35 seconds.

400MB is not really that big of an index. A friend of mine runs a Lucene
index at the US Library of Congress that is several GB and they search
it /very/ quickly. Of course, they have some monstrous hardware, too.

>> Any tips on how I could determine where the search and retrieval is
>> taking time?

Tim Funk wrote:
> A common way is to add timers around the commands and then use a
> logging library to output those times (under info, debug, or trace).

Lucene is a pretty opaque API, unfortunately. I'm guessing that if it's
taking up to 35 seconds to get a response from the index, there's not
much more benchmarking that could be done.

Then again, Sriram, you should check to make sure that the entire 35
seconds is taken up by Lucene calls. Is it possible that your Lucene
search is only part of the time? For instance, it is common practice to
search a Lucene index and then use those results to query a database. Is
it possible that other parts of your transaction are dominating the
wall-clock time?
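One way to check whether Lucene accounts for the whole 35 seconds is to time each phase of the request separately and compare them. This is a minimal sketch, assuming Java 8 or later. PhaseTimer and the phase names "lucene" and "database" are hypothetical, and the lambdas are stand-ins for the real index search and database query, not Lucene or JDBC calls.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical helper: accumulates elapsed time per named phase of a request.
class PhaseTimer {
    // Insertion-ordered so the report lists phases in the order they first ran
    private final Map<String, Long> phaseNanos = new LinkedHashMap<>();

    // Times one phase and adds its duration to that phase's running total
    public <T> T phase(String name, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            phaseNanos.merge(name, System.nanoTime() - start, Long::sum);
        }
    }

    // Per-phase totals converted to whole milliseconds
    public Map<String, Long> millisByPhase() {
        Map<String, Long> out = new LinkedHashMap<>();
        phaseNanos.forEach((k, v) -> out.put(k, v / 1_000_000));
        return out;
    }

    // One line per phase: name, milliseconds, and share of the measured total
    public String report() {
        long total = 0;
        for (long v : phaseNanos.values()) {
            total += v;
        }
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> e : phaseNanos.entrySet()) {
            double pct = total == 0 ? 0.0 : 100.0 * e.getValue() / total;
            sb.append(String.format("%-12s %8d ms %5.1f%%%n",
                    e.getKey(), e.getValue() / 1_000_000, pct));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        PhaseTimer t = new PhaseTimer();
        // Stand-ins: replace with the real index search and the DB lookup
        int[] ids = t.phase("lucene", () -> new int[] {1, 2, 3});
        int rows = t.phase("database", () -> {
            sleep(50);
            return ids.length;
        });
        System.out.println("rows = " + rows);
        System.out.print(t.report());
    }

    private static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

If the phases add up to much less than the end-to-end time measured around the whole request, the missing time is being spent somewhere you have not yet instrumented.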

> You'll probably get more help from the Lucene user list.

Absolutely. You've hit the mailing list for Apache Tomcat, which is a
web application server. You didn't even mention that you were talking
about a web application ;)

Good luck,
-chris

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFF5vsJ9CaO5/Lv0PARAmokAKCIpy0haQOVp7+I8Av3TFCb3R8kIACePEo2
WSWEuq6C77JhFdNa3pIrkho=
=GRQ2
-----END PGP SIGNATURE-----
