Posted to java-user@lucene.apache.org by Emmanuel Espina <es...@gmail.com> on 2013/06/28 23:29:00 UTC

In memory index (current status in Lucene)

I'm building a distributed index (mostly as a research project for
school) and I'm evaluating indexing the entire collection in memory
(as Google, Facebook and others did years ago). The obvious
reason for this is performance, given that replication will
give me reasonably good durability of the data (despite it being in
volatile memory).

What is the current status of Lucene for this kind of index?
The RAMDirectory documentation has a scary warning that says it
"is not intended to work with huge indexes", which makes it sound more
like an implementation for testing than something for
production.

Of course there is no real context for this question, because it is a
research topic. Testing its limits would be the closest to a context
I have :p

Thanks
Emmanuel

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: In memory index (current status in Lucene)

Posted by Sanne Grinovero <sa...@gmail.com>.

There is a decent implementation for a fully in-memory Directory in
the Infinispan project:
https://github.com/infinispan/infinispan/tree/master/lucene

It does not, however, take advantage of off-heap buffers; it stores
the index on the heap itself, because that lets Infinispan handle
index replication over the network.

Although I am one of the maintainers of the above code, I agree that,
for the sake of simplicity, the MMap implementation is preferable to
the Infinispan one if you are only interested in a single node.
Infinispan's implementation is not (yet) significantly faster than
FSDirectory in most of my tests, although it wins by a small
margin in some; GC is indeed the bottleneck, but we hope to
improve on that.

Sanne

On 4 July 2013 22:59, Adrien Grand <jp...@gmail.com> wrote:
> On Tue, Jul 2, 2013 at 10:09 AM, Toke Eskildsen <te...@statsbiblioteket.dk> wrote:
>> I wonder if Java's ByteBuffer could be used to make a more GC-friendly
>> RAMDirectory?
>
> For the record, there is an open issue about it:
> https://issues.apache.org/jira/browse/LUCENE-2292.
>
> --
> Adrien

Re: In memory index (current status in Lucene)

Posted by Adrien Grand <jp...@gmail.com>.

On Tue, Jul 2, 2013 at 10:09 AM, Toke Eskildsen <te...@statsbiblioteket.dk> wrote:
> I wonder if Java's ByteBuffer could be used to make a more GC-friendly
> RAMDirectory?

For the record, there is an open issue about it:
https://issues.apache.org/jira/browse/LUCENE-2292.

--
Adrien

Re: In memory index (current status in Lucene)

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.

On Mon, 2013-07-01 at 16:07 +0200, Emmanuel Espina wrote:
> Just to add to this conversation, I found an interesting link to
> Mike's blog about memory resident indexes (using another virtual
> machine) http://blog.mikemccandless.com/2012/07/lucene-index-in-ram-with-azuls-zing-jvm.html

Testing Zing with MMapDirectory vs. RAMDirectory would be a great
addition to Mike's blog post.

I wonder if Java's ByteBuffer could be used to make a more GC-friendly
RAMDirectory?
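
For illustration, here is a minimal sketch of what "GC-friendly" could mean here, using only the JDK: bytes held in a direct ByteBuffer are allocated outside the Java heap, so the collector only tracks the small buffer object rather than the data itself. The class and method names (OffHeapBytes, store, load) are made up for this example; this is not Lucene code and not what LUCENE-2292 implements.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class OffHeapBytes {
    // Copies the given bytes into a direct (off-heap) buffer.
    static ByteBuffer store(byte[] data) {
        ByteBuffer buf = ByteBuffer.allocateDirect(data.length);
        buf.put(data);
        buf.flip(); // switch the buffer from writing to reading
        return buf;
    }

    // Reads the remaining bytes back without moving the original position.
    static byte[] load(ByteBuffer buf) {
        byte[] out = new byte[buf.remaining()];
        buf.duplicate().get(out);
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer buf = store("postings".getBytes(StandardCharsets.UTF_8));
        System.out.println("direct: " + buf.isDirect());
        System.out.println(new String(load(buf), StandardCharsets.UTF_8));
    }
}
```

A real Directory built this way would still need to manage many such buffers and release them explicitly, which this sketch does not attempt.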

Regards,
Toke Eskildsen, State and University Library, Denmark

Re: In memory index (current status in Lucene)

Posted by Steven Schlansker <st...@likeness.com>.

On Jul 1, 2013, at 2:41 PM, Lance Norskog <go...@gmail.com> wrote:

> My current open source project is a Directory that is just like RAMDirectory, but everything is memory-mapped. The idea is it creates a disk file, opens it, and immediately deletes the file. The file still exists until the IndexReader/Writer/Searcher closes it. But, it cannot be found from the file system. This is just like a RAMDirectory, but without memory limitations.
> 
> It's proving to be harder than it looked.
> 
> The application is to store encrypted indexes in memory, with the decrypted contents in this non-findable format. I'm in medical document analysis now, and we can't store anything on disk in the clear.

I'm worried that this might not actually be secure. It would certainly be hard to find the data if the file is deleted in this way, but there are multiple ways to expose this confidential information (e.g. fsck reattaching the inode if it is lost, or directly executing "ln /proc/<pid>/fd/<fdno> /recovered-file", or other such trickery).

I would not trust this approach to keep the data secure, especially if there are potential lawsuits involved.

Best,
Steven

RE: In memory index (current status in Lucene)

Posted by Uwe Schindler <uw...@thetaphi.de>.

You mean tmpfs, not a RAM disk. Tmpfs is cool, as it plays wonderfully with mmap (mmap just maps the RAM used by the tmpfs into the user's address space).
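
A small sketch, using only the JDK, of how one might check whether a candidate index location is actually backed by tmpfs before pointing a memory-mapped directory at it. The path /dev/shm is an assumption (it is a tmpfs mount on many Linux systems, but not guaranteed), and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class TmpfsCheck {
    // Returns the file system type reported for the path, or "unknown"
    // if the path does not exist or the file store cannot be determined.
    static String fileStoreType(Path path) {
        try {
            return Files.getFileStore(path).type();
        } catch (IOException | SecurityException e) {
            return "unknown";
        }
    }

    public static void main(String[] args) {
        Path shm = Paths.get("/dev/shm"); // assumed tmpfs mount point
        System.out.println(shm + " -> " + fileStoreType(shm));
    }
}
```

The reported type string is platform-specific, so a check like this is only a hint, not a guarantee about where the pages live.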

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Ramkumar R. Aiyengar [mailto:andyetitmoves@gmail.com]
> Sent: Thursday, July 04, 2013 10:14 PM
> To: java-user@lucene.apache.org
> Subject: Re: In memory index (current status in Lucene)
> 
> Have you tried using MMapDirectory over a RAM disk (assuming you are on
> Linux)? You can avoid writing to disk (and thus the other ways to get to it
> persistently as Steven mentions), but still MMap it.

Re: In memory index (current status in Lucene)

Posted by "Ramkumar R. Aiyengar" <an...@gmail.com>.

Have you tried using MMapDirectory over a RAM disk (assuming you are on
Linux)? You can avoid writing to disk (and thus the other ways to get to it
persistently as Steven mentions), but still MMap it.

Re: In memory index (current status in Lucene)

Posted by Lance Norskog <go...@gmail.com>.

My current open source project is a Directory that is just like 
RAMDirectory, but everything is memory-mapped. The idea is it creates a 
disk file, opens it, and immediately deletes the file. The file still 
exists until the IndexReader/Writer/Searcher closes it. But, it cannot 
be found from the file system. This is just like a RAMDirectory, but 
without memory limitations.
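
A rough sketch of the create, map, then delete sequence described above, using only the JDK rather than a Lucene Directory. It assumes a POSIX-style file system where an open or mapped file can be unlinked (on Windows the delete would typically fail), and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class UnlinkedMapping {
    // Writes data into a mapped temp file, then removes its directory entry.
    // The mapping stays readable until the buffer is garbage collected.
    static MappedByteBuffer mapThenUnlink(byte[] data) {
        try {
            Path file = Files.createTempFile("unlinked-index", ".bin");
            MappedByteBuffer map;
            try (FileChannel ch = FileChannel.open(file,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // Mapping past the end of the empty file extends it.
                map = ch.map(FileChannel.MapMode.READ_WRITE, 0, data.length);
                map.put(data);
                map.flip(); // prepare the buffer for reading
            }
            Files.delete(file); // the name is gone, the mapped pages remain
            return map;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        MappedByteBuffer map =
                mapThenUnlink("secret".getBytes(StandardCharsets.UTF_8));
        byte[] out = new byte[map.remaining()];
        map.get(out);
        System.out.println(new String(out, StandardCharsets.UTF_8));
    }
}
```

Note that this only removes the file's name; the contents can still be written back to the underlying storage by the operating system, so by itself it says nothing about confidentiality.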

It's proving to be harder than it looked.

The application is to store encrypted indexes in memory, with the 
decrypted contents in this non-findable format. I'm in medical document 
analysis now, and we can't store anything on disk in the clear.

Lance

Re: In memory index (current status in Lucene)

Posted by Emmanuel Espina <es...@gmail.com>.

Hi Erick! Nice to hear from you again! From time to time my interest
in these "Lucene things" returns and I do some experiments :p

Just to add to this conversation, I found an interesting link to
Mike's blog about memory-resident indexes (using another virtual
machine): http://blog.mikemccandless.com/2012/07/lucene-index-in-ram-with-azuls-zing-jvm.html
and also (not exactly what I asked, but it seems related) there
is a Google Summer of Code project to build a memory resident term
resident: http://www.google-melange.com/gsoc/project/google/gsoc2013/billybob/42001

Thanks
Emmanuel

Re: In memory index (current status in Lucene)

Posted by Erick Erickson <er...@gmail.com>.

Hey Emma! It's been a while....

Building on what Steven said, here's Uwe's blog on
MMapDirectory and Lucene:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

I've always considered RAMDirectory for rather restricted
use-cases. I.e. if I know without doubt that the index
is both relatively static and bounded. The other use I've
seen is to use it to index single documents on-the-fly for
some reason (say complex processing of a single result)
then throw it out afterwards.

How are things going?

Erick

Re: In memory index (current status in Lucene)

Posted by Steven Schlansker <st...@likeness.com>.

You could consider MMapDirectory, which will end up putting the active portions of the index in memory (via the filesystem buffer cache).

The benefit is that you don't completely destroy the Java heap (RAMDirectory causes immense GC pressure if you are not careful) and you don't have to commit all of your RAM to index usage all the time.

The downside is that if your working set exceeds the amount of RAM available for buffer cache, you will get silent performance degradation as you fall back to disk reads for the missing blocks.

Maybe this is OK for your use case, maybe not.
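
To make the trade-off above concrete, here is a minimal JDK-only sketch of memory-mapped reading, the general mechanism the name MMapDirectory refers to. It is not Lucene's implementation; the class and method names are hypothetical. The mapped buffer is direct (off-heap), and the operating system loads pages on access, which is where the buffer cache comes in.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedRead {
    // Maps the whole file read-only; pages are loaded lazily by the OS.
    // A single mapping is limited to Integer.MAX_VALUE bytes.
    static MappedByteBuffer mapReadOnly(Path file) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Sums every byte as an unsigned value, touching each mapped page once.
    static long checksum(MappedByteBuffer map) {
        long sum = 0;
        for (int i = 0; i < map.limit(); i++) {
            sum += map.get(i) & 0xFF;
        }
        return sum;
    }

    // Writes data to a temp file, maps it, sums it, and cleans up.
    static long mapAndSum(byte[] data) {
        try {
            Path file = Files.createTempFile("segment", ".bin");
            try {
                Files.write(file, data);
                return checksum(mapReadOnly(file));
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(mapAndSum(new byte[] {1, 2, 3, 4}));
    }
}
```

If the file is larger than the RAM available for caching, reads in a loop like checksum simply become slower as pages are fetched from disk, with no error raised.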
