Posted to solr-user@lucene.apache.org by Li Li <fa...@gmail.com> on 2012/06/08 10:30:38 UTC

what's better for in memory searching?

Hi all,
   I want to use Lucene 3.6 to provide a search service. My data is
not very large: the raw data is less than 1GB, and I want to load all
indexes into memory. I also need to save all indexes to disk
persistently.
   I originally wanted to use RAMDirectory, but then I read this in
its javadoc:

   Warning: This class is not intended to work with huge indexes.
   Everything beyond several hundred megabytes will waste resources
   (GC cycles), because it uses an internal buffer size of 1024 bytes,
   producing millions of byte[1024] arrays. This class is optimized
   for small memory-resident indexes. It also has bad concurrency on
   multithreaded environments.
   It is recommended to materialize large indexes on disk and use
   MMapDirectory, which is a high-performance directory implementation
   working directly on the file system cache of the operating system,
   so copying data to Java heap space is not useful.

   Should I use MMapDirectory? It seems another contrib instantiated.
Has anyone tested it against RAMDirectory?
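
To put the javadoc warning in numbers, here is a minimal arithmetic sketch. It does not touch Lucene at all; the class name RamBufferCount is hypothetical, and the 1 GiB index size is only an assumption borrowed from the "less than 1GB" raw-data figure above (the real index size may differ).

```java
// Back-of-the-envelope arithmetic only; no Lucene classes are involved.
final class RamBufferCount {
    /** Number of fixed-size buffers needed to hold indexBytes (ceiling division). */
    static long buffersNeeded(long indexBytes, int bufferSize) {
        return (indexBytes + bufferSize - 1) / bufferSize;
    }

    public static void main(String[] args) {
        long oneGiB = 1L << 30;  // assumed index size, not a measured value
        int bufferSize = 1024;   // buffer size quoted in the javadoc warning
        // prints "1048576 byte[1024] arrays"
        System.out.println(buffersNeeded(oneGiB, bufferSize) + " byte[1024] arrays");
    }
}
```

Roughly a million small arrays on the heap is the kind of load on the garbage collector that the warning is talking about.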

Re: what's better for in memory searching?

Posted by Mikhail Khludnev <mk...@griddynamics.com>.

If I understand it correctly, it is effectively a per-process swappiness setting.

On Tue, Jun 12, 2012 at 3:57 AM, Li Li <fa...@gmail.com> wrote:

> Is this method equivalent to setting vm.swappiness, which is global?
> Or can it set the swappiness for the JVM process only?



-- 
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics

<http://www.griddynamics.com>
 <mk...@griddynamics.com>

Re: what's better for in memory searching?

Posted by Li Li <fa...@gmail.com>.

Is this method equivalent to setting vm.swappiness, which is global?
Or can it set the swappiness for the JVM process only?

On Tue, Jun 12, 2012 at 5:11 AM, Mikhail Khludnev
<mk...@griddynamics.com> wrote:
> The point about premature optimization makes sense to me. However, some time
> ago I bookmarked a potentially useful approach:
> http://lucene.472066.n3.nabble.com/High-response-time-after-being-idle-tp3616599p3617604.html

Re: what's better for in memory searching?

Posted by Mikhail Khludnev <mk...@griddynamics.com>.

The point about premature optimization makes sense to me. However, some time
ago I bookmarked a potentially useful approach:
http://lucene.472066.n3.nabble.com/High-response-time-after-being-idle-tp3616599p3617604.html

On Mon, Jun 11, 2012 at 3:02 PM, Toke Eskildsen <te...@statsbiblioteket.dk> wrote:

> I'm with Michael on this one: It seems that you're doing a premature
> optimization. Guessing that your final index will be < 5GB in size with
> 1 million documents (give or take 900.000:-), relatively simple queries
> and so on, an average response time of 10 ms should be attainable even
> on spinning drives. One hundred document updates per day are not many,
> so again I would not expect problems.
>
> As is often the case on this mailing list, the advice is "try it". Using
> a normal on-disk index and doing some warm up is the easy solution to
> implement and nearly all of your work on this will be usable for a
> RAM-based solution, if you are not satisfied with the speed. Or you
> could buy a small & cheap SSD and have no more worries...
>
> Regards,
> Toke Eskildsen
>
>


-- 
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics

<http://www.griddynamics.com>
 <mk...@griddynamics.com>

Re: what's better for in memory searching?

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.

On Mon, 2012-06-11 at 11:38 +0200, Li Li wrote:
> Yes, I need an average query time of less than 10 ms. The faster the better.
> I have enough memory for Lucene because I know there is not too much
> data. There are not many modifications: every day there are
> hundreds of document updates. If the indexes are not in physical memory,
> then IO operations will cost a few ms.

I'm with Michael on this one: It seems that you're doing a premature
optimization. Guessing that your final index will be < 5GB in size with
1 million documents (give or take 900.000:-), relatively simple queries
and so on, an average response time of 10 ms should be attainable even
on spinning drives. One hundred document updates per day are not many,
so again I would not expect problems.

As is often the case on this mailing list, the advice is "try it". Using
a normal on-disk index and doing some warm up is the easy solution to
implement and nearly all of your work on this will be usable for a
RAM-based solution, if you are not satisfied with the speed. Or you
could buy a small & cheap SSD and have no more worries...

Regards,
Toke Eskildsen
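
As a rough illustration of the "try it" advice, the sketch below reads every file in an index directory once so that the operating system has a chance to keep the data in its file system cache. This is a cruder, file-level pre-read and not the query-based warm-up that Solr's auto-warming performs. The class name IndexFileWarmer and the idea of passing the index directory as the first argument are assumptions made for this example only.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class IndexFileWarmer {
    /** Reads every regular file directly under dir once and returns the bytes read. */
    static long warm(Path dir) {
        byte[] buf = new byte[1 << 16];
        long total = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip subdirectories and special files
                }
                try (InputStream in = Files.newInputStream(file)) {
                    int n;
                    while ((n = in.read(buf)) != -1) {
                        total += n; // the data is discarded; reading is the point
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical usage: pass the index directory as the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println("read " + warm(dir) + " bytes from " + dir);
    }
}
```

Whether the pages stay cached afterwards is up to the operating system, so measuring query latency before and after is still the only way to know if this helps.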

Re: what's better for in memory searching?

Posted by Li Li <fa...@gmail.com>.

Yes, I need an average query time of less than 10 ms. The faster the better.
I have enough memory for Lucene because I know there is not too much
data. There are not many modifications: every day there are
hundreds of document updates. If the indexes are not in physical memory,
then IO operations will cost a few ms.
btw, a full GC may also add uncertainty, so I need to optimize it as
much as possible.

On Mon, Jun 11, 2012 at 5:27 PM, Michael Kuhlmann <ku...@solarier.de> wrote:
> You cannot guarantee this when you're running out of RAM. You'd have a
> problem then anyway.
>
> Why do you care that much? Have you had performance issues yet? 1GB
> should load really fast, and both auto warming and OS cache should help a
> lot as well. With such an index, you usually don't need to fine tune
> performance that much.
>
> Have you thought about using an SSD? Since you want to persist your index,
> you'll need to live with disk IO anyway.
>
> Greetings,
> Kuli

Re: what's better for in memory searching?

Posted by Michael Kuhlmann <ku...@solarier.de>.

You cannot guarantee this when you're running out of RAM. You'd have a
problem then anyway.

Why do you care that much? Have you had performance issues yet? 1GB
should load really fast, and both auto warming and OS cache should help
a lot as well. With such an index, you usually don't need to fine tune
performance that much.

Have you thought about using an SSD? Since you want to persist your index,
you'll need to live with disk IO anyway.

Greetings,
Kuli

On 11.06.2012 11:20, Li Li wrote:
> I am sorry, I made a mistake. Even with RAMDirectory, I cannot
> guarantee that the indexes are not swapped out.

Re: what's better for in memory searching?

Posted by Li Li <fa...@gmail.com>.

I am sorry, I made a mistake. Even with RAMDirectory, I cannot
guarantee that the indexes are not swapped out.

On Mon, Jun 11, 2012 at 4:45 PM, Michael Kuhlmann <ku...@solarier.de> wrote:
> Set the swappiness to 0 to avoid memory pages being swapped to disk too
> early.
>
> http://en.wikipedia.org/wiki/Swappiness
>
> -Kuli

Re: what's better for in memory searching?

Posted by Paul Libbrecht <pa...@hoplahup.net>.

On 11 June 2012 at 11:16, Li Li wrote:

> Do you mean a software RAM disk?

Right. OS level.

> Using RAM to simulate a disk?

Yes.
That generally gives you a disk that is very fast for both reading and writing.

> How do you deal with persistence?

Synchronization (slaving?).

paul

Re: what's better for in memory searching?

Posted by Li Li <fa...@gmail.com>.

Do you mean a software RAM disk, i.e. using RAM to simulate a disk? How do you
deal with persistence?

Maybe I can hack it by increasing RAMOutputStream.BUFFER_SIZE from 1024 to 1024*1024.
That may waste some memory, but I can adjust my merge policy to avoid too many segments.
I will have a "big" segment and a "small" segment, and every night I will
merge them. Newly added documents will be flushed into a new segment, and I
will merge that newly generated segment with the small one.
Our update operations are not very frequent.
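
The trade-off in this buffer-size idea can be estimated with simple arithmetic. The sketch below assumes that each in-memory file is stored as a list of fixed-size buffers, as the javadoc warning describes, so the last buffer of every file may be almost empty. The class name BufferSizeTradeoff, the figure of 20 index files and the 1 GiB index size are all hypothetical values chosen for illustration, and nothing here modifies or calls Lucene.

```java
// Arithmetic sketch only; the numbers below are assumptions, not measurements.
final class BufferSizeTradeoff {
    /** Upper bound on unused bytes: the last buffer of each file may be almost empty. */
    static long worstCaseSlack(int fileCount, int bufferSize) {
        return (long) fileCount * (bufferSize - 1);
    }

    /** Buffers needed for totalBytes of data, rounding up to whole buffers. */
    static long buffersFor(long totalBytes, int bufferSize) {
        return (totalBytes + bufferSize - 1) / bufferSize;
    }

    public static void main(String[] args) {
        int fileCount = 20;          // hypothetical: two segments of about ten files each
        long indexBytes = 1L << 30;  // hypothetical 1 GiB index
        for (int size : new int[] {1024, 1024 * 1024}) {
            System.out.printf("bufferSize=%d: about %d arrays, slack <= %d bytes%n",
                    size, buffersFor(indexBytes, size), worstCaseSlack(fileCount, size));
        }
    }
}
```

Under these assumptions, 1 MiB buffers cut the array count from about a million to about a thousand, at the cost of at most roughly 20 MiB of slack, which is why keeping the number of segments (and therefore files) small matters for this approach.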

On Mon, Jun 11, 2012 at 4:59 PM, Paul Libbrecht <pa...@hoplahup.net> wrote:
> Li Li,
>
> Have you considered allocating a RAM disk?
> It's not the most flexible thing... but it's certainly close in performance to a RAMDirectory.
> MMapping on that is likely to be useless, but I doubt you can set it to zero.
> That would need an experiment.
>
> Also, don't caching and auto-warming provide the lowest latency for all "expected queries"?
>
> Paul

Re: what's better for in memory searching?

Posted by Paul Libbrecht <pa...@hoplahup.net>.

Li Li,

Have you considered allocating a RAM disk?
It's not the most flexible thing... but it's certainly close in performance to a RAMDirectory.
MMapping on that is likely to be useless, but I doubt you can set it to zero.
That would need an experiment.

Also, don't caching and auto-warming provide the lowest latency for all "expected queries"?

Paul


On 11 June 2012 at 10:50, Li Li wrote:

>   I want to use Lucene 3.6 to provide a search service. My data is
> not very large: the raw data is less than 1GB, and I want to load all
> indexes into memory. I also need to save all indexes to disk
> persistently.
>   I originally wanted to use RAMDirectory. But when I read its javadoc.

Re: what's better for in memory searching?

Posted by Li Li <fa...@gmail.com>.

1. This setting is global. I just want my Lucene search program
not to swap; other, less important programs can still swap.
2. Do I need to call MappedByteBuffer.load() explicitly? Or do I have to
warm up the indexes to guarantee that all my files are in physical memory?
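
For reference, this is what an explicit MappedByteBuffer.load() call looks like on a plain file, outside of Lucene. It is only a sketch: the class name MmapPreload and the temporary file are made up for the example, a single mapping is limited to 2 GB, and load() is documented as a best-effort request, so it does not guarantee that the pages stay in physical memory afterwards.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MmapPreload {
    /** Maps the file read-only, asks the OS to page it in, and returns the mapped size. */
    static long mapAndLoad(Path file) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            buf.load(); // best effort; the OS may still evict these pages later
            return buf.capacity();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes a small temporary file and preloads it; exists only to make the sketch runnable. */
    static long demo() {
        try {
            Path tmp = Files.createTempFile("mmap-preload", ".bin");
            tmp.toFile().deleteOnExit();
            Files.write(tmp, new byte[4096]);
            return mapAndLoad(tmp);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("mapped and loaded " + demo() + " bytes");
    }
}
```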

On Mon, Jun 11, 2012 at 4:45 PM, Michael Kuhlmann <ku...@solarier.de> wrote:
> Set the swappiness to 0 to avoid memory pages being swapped to disk too
> early.
>
> http://en.wikipedia.org/wiki/Swappiness
>
> -Kuli

Re: what's better for in memory searching?

Posted by Li Li <fa...@gmail.com>.

I found this: http://unix.stackexchange.com/questions/10214/per-process-swapiness-for-linux
It can provide fine-grained control of swapping.

On Mon, Jun 11, 2012 at 4:45 PM, Michael Kuhlmann <ku...@solarier.de> wrote:
> Set the swappiness to 0 to avoid memory pages being swapped to disk too
> early.
>
> http://en.wikipedia.org/wiki/Swappiness
>
> -Kuli

Re: what's better for in memory searching?

Posted by Michael Kuhlmann <ku...@solarier.de>.

Set the swappiness to 0 to avoid memory pages being swapped to disk too
early.

http://en.wikipedia.org/wiki/Swappiness

-Kuli
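
Before changing the setting it can be useful to see what it currently is. On Linux the value is exposed as a small text file at /proc/sys/vm/swappiness, and changing it is normally done as root with sysctl (vm.swappiness=0). The sketch below only reads and parses the value; the class name SwappinessCheck is hypothetical, and on systems without that file it simply reports that it is missing.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class SwappinessCheck {
    /** Parses the text content of /proc/sys/vm/swappiness, for example "60\n". */
    static int parse(String content) {
        return Integer.parseInt(content.trim());
    }

    public static void main(String[] args) throws IOException {
        Path p = Paths.get("/proc/sys/vm/swappiness"); // Linux only
        if (!Files.isReadable(p)) {
            System.out.println("no readable " + p + " on this system");
            return;
        }
        String content = new String(Files.readAllBytes(p), StandardCharsets.UTF_8);
        System.out.println("vm.swappiness = " + parse(content));
    }
}
```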

On 11.06.2012 10:38, Li Li wrote:
> I have roughly read the code of RAMDirectory. It uses a list of 1024-byte
> arrays and has a lot of overhead.
> But as far as I know, with MMapDirectory I can't prevent page
> faults. The OS will swap out less frequently used pages. Even if I allocate
> enough memory for the JVM, I can't guarantee that all the files in the directory
> are in memory. Am I understanding this right? If so, then some less
> frequent queries will be slow. How can I keep them always in memory?

Re: what's better for in memory searching?

Posted by Li Li <fa...@gmail.com>.

I have roughly read the code of RAMDirectory. It uses a list of 1024-byte
arrays and has a lot of overhead.
But as far as I know, with MMapDirectory I can't prevent page
faults. The OS will swap out less frequently used pages. Even if I allocate
enough memory for the JVM, I can't guarantee that all the files in the directory
are in memory. Am I understanding this right? If so, then some less
frequent queries will be slow. How can I keep them always in memory?

On Fri, Jun 8, 2012 at 5:53 PM, Lance Norskog <go...@gmail.com> wrote:
> Yes, use MMapDirectory. It is faster and uses memory more efficiently
> than RAMDirectory. This sounds wrong, but it is true. With
> RAMDirectory, Java has to work harder doing garbage collection.

Re: what's better for in memory searching?

Posted by Lance Norskog <go...@gmail.com>.

Yes, use MMapDirectory. It is faster and uses memory more efficiently
than RAMDirectory. This sounds wrong, but it is true. With
RAMDirectory, Java has to work harder doing garbage collection.

On Fri, Jun 8, 2012 at 1:30 AM, Li Li <fa...@gmail.com> wrote:
> Hi all,
>   I want to use Lucene 3.6 to provide a search service. My data is
> not very large: the raw data is less than 1GB, and I want to load all
> indexes into memory. I also need to save all indexes to disk
> persistently.
>   I originally wanted to use RAMDirectory, but then I read this in
> its javadoc:
>
>   Warning: This class is not intended to work with huge indexes.
>   Everything beyond several hundred megabytes will waste resources
>   (GC cycles), because it uses an internal buffer size of 1024 bytes,
>   producing millions of byte[1024] arrays. This class is optimized
>   for small memory-resident indexes. It also has bad concurrency on
>   multithreaded environments.
>   It is recommended to materialize large indexes on disk and use
>   MMapDirectory, which is a high-performance directory implementation
>   working directly on the file system cache of the operating system,
>   so copying data to Java heap space is not useful.
>
>   Should I use MMapDirectory? It seems another contrib instantiated.
> Has anyone tested it against RAMDirectory?



-- 
Lance Norskog
goksron@gmail.com