Posted to java-user@lucene.apache.org by keshav prawasi <kp...@gmail.com> on 2012/02/17 11:19:44 UTC

Higher Latency at the start of application

Hi,

 As part of our application, we search a Lucene index of about 8 GB
containing about 40 million document units. We not only search the
documents but also retrieve them for further ranking according to
our use case, and then present the top results to the user. Retrieval of
hits from the index is therefore the biggest, and the only significant,
contributor to the overall latency of our application. Even so, latency in
the steady state (once the application has run for more than a day or two)
is acceptable and quite good. We use the index only for reading and do not
perform any concurrent writes or deletes that would interfere with the
steady state of the running application. We open and read the index as
FSDirectory(READ_ONLY).

 However, when we start the application, latency for the first few requests
is very high compared to the steady state. This could be because the OS has
not yet cached any of the index files. We are therefore looking at changes
to our code that would bring latency within an acceptable margin even at
application start. I would like to know which options are available and
suitable for my use case. Using RAMDirectory isn't a good idea, as the index
is already a little too large for that and may grow in the future.
Should we split the index into sub-indices and load some of them into
RAMDirectory? If so, what would be the implications for overall latency
(we would then have to collate results from all the sub-indices)? Do you
have any other suggestions?

Thanks,
Keshav
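
On the sub-index question: the collation step itself is usually cheap next to disk reads. Here is a minimal sketch in plain Java (no Lucene classes) of merging per-sub-index top-k lists into one global top-k. The names SubIndexMerge, Hit and mergeTopK are made up for illustration. It assumes scores from different sub-indices are directly comparable, which may not hold if each sub-index computes its own term statistics.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch: merge per-sub-index top-k hit lists into one global top-k. */
class SubIndexMerge {

    /** Illustrative hit type: sub-index id, doc id within that sub-index, score. */
    static final class Hit {
        final int shard;
        final int doc;
        final float score;

        Hit(int shard, int doc, float score) {
            this.shard = shard;
            this.doc = doc;
            this.score = score;
        }
    }

    /**
     * Returns the k highest-scoring hits across all sub-indices, best first.
     * Assumes scores are comparable across sub-indices.
     */
    static List<Hit> mergeTopK(List<List<Hit>> perShard, int k) {
        if (k <= 0) {
            return Collections.emptyList();
        }
        // Min-heap on score: the weakest of the current top k sits at the head.
        PriorityQueue<Hit> heap =
                new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));
        for (List<Hit> hits : perShard) {
            for (Hit h : hits) {
                if (heap.size() < k) {
                    heap.add(h);
                } else if (h.score > heap.peek().score) {
                    heap.poll();   // evict the current weakest hit
                    heap.add(h);
                }
            }
        }
        List<Hit> out = new ArrayList<>(heap);
        out.sort((a, b) -> Float.compare(b.score, a.score)); // best first
        return out;
    }
}
```

With n sub-indices each returning k hits, this does on the order of n*k heap operations, so the extra latency would come mostly from searching the sub-indices (serially or in parallel), not from the merge.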

Re: Higher Latency at the start of application

Posted by Stephen Howe <si...@gmail.com>.
If you have enough RAM, have you looked into using MMapDirectory instead
of the FSDirectory you open now? It memory-maps the index files, so once
the OS has paged them in, lookups are served from RAM instead of hitting
the disk each time.
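
One caveat on memory mapping: mapping a file does not by itself read it; pages are normally brought in on first touch, so the first requests can still be slow. The sketch below uses only java.nio (it is not Lucene's MMapDirectory, and the class name MmapPreload is made up) to show the idea of mapping a file read-only and asking the OS to load it. MappedByteBuffer.load() is best-effort, and the 1 GiB chunk size is an arbitrary choice to stay under the per-mapping size limit.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch: map a file read-only and hint the OS to page it in. */
class MmapPreload {

    // A single mapping cannot exceed Integer.MAX_VALUE bytes, so map in chunks.
    static final long CHUNK = 1L << 30; // 1 GiB, arbitrary

    /** Maps the whole file chunk by chunk and returns the number of bytes mapped. */
    static long mapAndLoad(Path file) throws IOException {
        long mapped = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            for (long pos = 0; pos < size; pos += CHUNK) {
                long len = Math.min(CHUNK, size - pos);
                MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, pos, len);
                buf.load(); // best-effort request to bring the pages into memory
                mapped += len;
            }
        }
        return mapped;
    }
}
```

Whether the pages stay resident afterwards is up to the OS and depends on how much free RAM there is next to the JVM heap.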

Also, have you thought about running some precanned queries, prior to
exposing the app to the user, to trigger the caching and improve
performance?
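
A rough sketch of the pre-canned query idea, again in plain Java. The Warmup class and its `search` callback are hypothetical; the callback stands in for whatever your real search-and-retrieve path is. Since document retrieval is what dominates your latency, the warm-up queries should probably fetch the stored documents too, not just run the search.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

/** Hypothetical sketch: run warm-up queries before accepting user traffic. */
class Warmup {

    private final AtomicBoolean ready = new AtomicBoolean(false);

    /** True once the warm-up pass has finished; request handlers can check this. */
    boolean isReady() {
        return ready.get();
    }

    /**
     * Runs every query through `search` (a stand-in for the application's own
     * search-and-retrieve code) and returns how many completed without error.
     */
    int run(List<String> queries, Consumer<String> search) {
        int ok = 0;
        for (String q : queries) {
            try {
                search.accept(q);
                ok++;
            } catch (RuntimeException e) {
                // A failing warm-up query should not block startup; skip it.
            }
        }
        ready.set(true);
        return ok;
    }
}
```

Queries taken from a recent production log would likely touch the same parts of the index that real users hit, which is what you want cached.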
