Posted to dev@lucene.apache.org by Shai Erera <se...@gmail.com> on 2011/12/20 14:13:22 UTC

Plans to remove RAMDirectory?

Hi

Uwe mentioned on LUCENE-3653 that there are plans to remove RAMDirectory
from trunk and move it to tests only: "RAMDirectory is written for tests, not
for production use. There are already plans to remove it from Lucene trunk
and move to tests only." (see the full comment:
<https://issues.apache.org/jira/browse/LUCENE-3653?focusedCommentId=13172338&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13172338>)

I wasn't aware of such plans. Were there emails about it, or has it been
discussed on IRC?

I disagree that RAMDirectory is useful only for tests. For example, when
someone wants to index on Hadoop, RAMDirectory can be very useful (even
though it's not the only solution). Also, RAMDirectory is still more
efficient than MMapDirectory, if you want to index (and then search) on a
small (sometimes even transient) amount of data. We use it in several cases
for such purposes.

If RAMDirectory needs to improve (for instance, allocate bigger byte[]
chunks), then IMO we should do that, rather than drop it entirely from
core. I think it's a very valuable Directory implementation that Lucene
offers, and I'd hate to see it disappear.

Shai

RE: Plans to remove RAMDirectory?

Posted by Uwe Schindler <uw...@thetaphi.de>.
Done: https://issues.apache.org/jira/browse/LUCENE-3659

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: DM Smith [mailto:dmsmith555@gmail.com]
> Sent: Tuesday, December 20, 2011 4:08 PM
> To: dev@lucene.apache.org
> Subject: Re: Plans to remove RAMDirectory?
> 
> How about an issue to track this? I'd be glad to do it, but I'm not really
> the "reporter" for it.
> 
> -- DM


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: Plans to remove RAMDirectory?

Posted by DM Smith <dm...@gmail.com>.
How about an issue to track this? I'd be glad to do it, but I'm not 
really the "reporter" for it.

-- DM

On 12/20/2011 09:51 AM, Shai Erera wrote:
> Thanks for the clarification, Uwe. If the whole idea is a new, more
> efficient RAMDirectory implementation, then it's ok. I think the ideas you
> describe are interesting.
>
>> Have you tried MMapDir for read access in comparison to RAMDirectory for a
>> larger index
>
> I have, and I support the decision not to use RAMDirectory for such cases.
> BUT, MMapDir is not recommended on some platforms / JDKs. Second, it cannot
> be used on, e.g., HDFS. So sometimes RAMDirectory is the best you can do.
>
> Again, if the whole idea is to improve RAMDirectory's implementation, then I
> totally agree; it makes sense. My point was that we should not lose the
> ability to load indexes into RAM.
>
> Shai


Re: Plans to remove RAMDirectory?

Posted by Shai Erera <se...@gmail.com>.
Thanks for the clarification, Uwe. If the whole idea is a new, more
efficient RAMDirectory implementation, then it's ok. I think the ideas you
describe are interesting.

> Have you tried MMapDir for read access in comparison to RAMDirectory for a
> larger index

I have, and I support the decision not to use RAMDirectory for such cases.
BUT, MMapDir is not recommended on some platforms / JDKs. Second, it cannot
be used on, e.g., HDFS. So sometimes RAMDirectory is the best you can do.

Again, if the whole idea is to improve RAMDirectory's implementation, then I
totally agree; it makes sense. My point was that we should not lose the
ability to load indexes into RAM.
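Shai's point that an index on HDFS cannot be mmapped but can still be loaded into RAM can be sketched with plain JDK streams. This assumes the index file is available as an `InputStream` of known length (for example one opened through a Hadoop filesystem client, whose API is not shown and is an assumption here); `RamLoader` and `load` are hypothetical names. Each buffer holds at most 2 GB, since `ByteBuffer` capacities are `int`s.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

/**
 * Hypothetical helper: copy one index file of known length from any
 * InputStream into memory. Unlike mmap, this needs no local file, so the
 * stream could come from a distributed filesystem client.
 */
final class RamLoader {
    private RamLoader() {}

    /** Reads exactly {@code length} bytes; the result is positioned at 0 and ready to read. */
    static ByteBuffer load(InputStream in, int length, boolean direct) throws IOException {
        // direct = true keeps the bytes outside the Java heap
        ByteBuffer buf = direct ? ByteBuffer.allocateDirect(length) : ByteBuffer.allocate(length);
        byte[] chunk = new byte[8192];
        int remaining = length;
        while (remaining > 0) {
            int n = in.read(chunk, 0, Math.min(chunk.length, remaining));
            if (n < 0) {
                throw new EOFException("stream ended " + remaining + " bytes early");
            }
            buf.put(chunk, 0, n);
            remaining -= n;
        }
        buf.flip(); // position = 0, limit = length
        return buf;
    }
}
```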

Shai

On Tue, Dec 20, 2011 at 3:36 PM, Uwe Schindler <uw...@thetaphi.de> wrote:

> Hi,
>
> You misunderstood the whole thing. The idea was to maybe replace
> RAMDirectory with a “clone” of MMapDirectory that uses large
> DirectByteBuffers outside the JVM heap. The current RAMDirectory is very
> limited (the buffer size is hardcoded to 8 KB; if you have a 50 Gigabyte
> index in this RAMDirectory, your GC simply goes crazy. We investigated this
> several times for customers: RAMDirectory was in fact several times slower
> than a simple disk-based MMapDir). Also, the locking in the RAMFile class is
> horrible: for large indexes you have to switch buffers many times when
> seeking/reading/…, which involves heavy locking. In contrast, MMapDir is
> completely lock-free!
>
> As long as there is no replacement, we will not remove it, but the current
> RAMDirectory is not usable for large indexes. That’s a limitation, and the
> design of this class does not support anything else. It’s currently
> unfixable, and instead of putting work into fixing it, the time should be
> spent on a new ByteBuffer-based RAMDir with larger blocks (or blocks that
> merge), or with IOContext helping to calculate the file size before writing
> it (e.g. when triggering a merge, you know the approximate size of the file
> beforehand, so you can allocate a better-sized buffer than 8 Kilobytes).
> Also, DirectByteBuffers help keep the GC happy, as the RAMDir's data then
> lives outside the JVM heap.
>
> > Also, RAMDirectory is still more efficient than MMapDirectory, if you
> > want to index (and then search) on a small (sometimes even transient)
> > amount of data
>
> That’s not true, as RAMDir spends more time switching buffers than reading
> the data. The problem is that MMapDir does not support *writing*, and
> that's why we plan to improve this. Have you tried MMapDir for read access
> in comparison to RAMDirectory for a larger index? It outperforms
> RAMDirectory several times over (depending on the OS and whether the file
> data is already in the FS cache). The new directory will simply mimic
> MMapIndexInput and add an MMapIndexOutput, but backed by an in-memory
> (Direct)ByteBuffer instead of a mmapped one (outside or inside the JVM
> heap; both will be supported). This simplifies the code a lot.
>
> The limitations of the crappy RAMDirectory were discussed at conferences,
> sorry. We did *not* decide to remove it (without a patch/replacement). The
> whole “message” on the issue was that RAMDirectory is a bad idea. The
> recommended approach at the moment to handle large in-RAM directories would
> be to use a tmpfs on Linux/Solaris and use MMapDir on top (for larger
> indexes). The mmap would then directly map the RAM of the underlying tmpfs.
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de

RE: Plans to remove RAMDirectory?

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi,

 

You misunderstood the whole thing. The idea was to maybe replace
RAMDirectory with a “clone” of MMapDirectory that uses large DirectByteBuffers
outside the JVM heap. The current RAMDirectory is very limited (the buffer size
is hardcoded to 8 KB; if you have a 50 Gigabyte index in this RAMDirectory,
your GC simply goes crazy. We investigated this several times for customers:
RAMDirectory was in fact several times slower than a simple disk-based
MMapDir). Also, the locking in the RAMFile class is horrible: for large
indexes you have to switch buffers many times when seeking/reading/…, which
involves heavy locking. In contrast, MMapDir is completely lock-free!
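The locking argument can be illustrated with a deliberately simplified model. It assumes, as a sketch rather than a copy of Lucene's RAMFile/RAMInputStream source, that a file is a list of fixed 8 KB arrays whose lookup method is synchronized, and that a reader fetches the next array (taking the lock) each time it crosses a buffer boundary. `ChunkedFile` and `ChunkedReader` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of an in-heap file made of fixed 8 KB
 * buffers (not Lucene's actual RAMFile). Every buffer lookup is synchronized.
 */
final class ChunkedFile {
    static final int BUFFER_SIZE = 8 * 1024;

    private final List<byte[]> buffers = new ArrayList<>();
    private long length;
    private int lockedLookups; // how often getBuffer() took the lock

    /** Appends bytes, allocating a new fixed-size buffer whenever the last one is full. */
    synchronized void append(byte[] data) {
        for (byte b : data) {
            int offset = (int) (length % BUFFER_SIZE);
            if (offset == 0) {
                buffers.add(new byte[BUFFER_SIZE]);
            }
            buffers.get(buffers.size() - 1)[offset] = b;
            length++;
        }
    }

    /** Called by every reader on each buffer switch, so all readers contend for one lock. */
    synchronized byte[] getBuffer(int index) {
        lockedLookups++;
        return buffers.get(index);
    }

    synchronized long length() {
        return length;
    }

    synchronized int lockedLookups() {
        return lockedLookups;
    }
}

/**
 * Sequential reader, one per thread. Assumes the file is fully written
 * before the reader is created.
 */
final class ChunkedReader {
    private final ChunkedFile file;
    private final long length;
    private byte[] current;
    private int currentIndex = -1;
    private long pos;

    ChunkedReader(ChunkedFile file) {
        this.file = file;
        this.length = file.length();
    }

    /** Returns the next byte as 0..255, or -1 at end of file. */
    int readByte() {
        if (pos >= length) {
            return -1;
        }
        int index = (int) (pos / ChunkedFile.BUFFER_SIZE);
        if (index != currentIndex) {
            current = file.getBuffer(index); // lock taken on every buffer switch
            currentIndex = index;
        }
        int offset = (int) (pos % ChunkedFile.BUFFER_SIZE);
        pos++;
        return current[offset] & 0xFF;
    }
}
```

In this model, reading 100,000 bytes sequentially takes the lock 13 times (once per 8 KB buffer); a full scan of a 50 GiB file would take it about 6.5 million times, and random access that jumps between buffers adds more.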

 

As long as there is no replacement, we will not remove it, but the current
RAMDirectory is not usable for large indexes. That’s a limitation, and the
design of this class does not support anything else. It’s currently
unfixable, and instead of putting work into fixing it, the time should be
spent on a new ByteBuffer-based RAMDir with larger blocks (or blocks that
merge), or with IOContext helping to calculate the file size before writing it
(e.g. when triggering a merge, you know the approximate size of the file
beforehand, so you can allocate a better-sized buffer than 8 Kilobytes). Also,
DirectByteBuffers help keep the GC happy, as the RAMDir's data then lives
outside the JVM heap.
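The "know the size before writing" idea can be sketched as follows, assuming the caller can pass an expected-size hint (for a merge, roughly the combined size of the segments being merged). `BlockSizing`, `blockSizeFor`, and `allocateBlock` are hypothetical; the power-of-two rounding and the 1 GiB cap are illustrative choices, not anything decided in this thread. Only `ByteBuffer.allocate` and `ByteBuffer.allocateDirect` are real JDK calls.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical sketch (not Lucene API): pick a block size for a new
 * in-memory file from an expected-size hint instead of always using 8 KB.
 */
final class BlockSizing {
    static final int MIN_BLOCK = 8 * 1024; // today's fixed size, used when there is no hint
    static final int MAX_BLOCK = 1 << 30;  // stay well below ByteBuffer's int capacity limit

    private BlockSizing() {}

    /** Next power of two >= expectedBytes, clamped to [MIN_BLOCK, MAX_BLOCK]. */
    static int blockSizeFor(long expectedBytes) {
        if (expectedBytes <= MIN_BLOCK) {
            return MIN_BLOCK;
        }
        if (expectedBytes >= MAX_BLOCK) {
            return MAX_BLOCK;
        }
        return Integer.highestOneBit((int) (expectedBytes - 1)) << 1;
    }

    /** Direct blocks live outside the Java heap; heap blocks are byte[]-backed. */
    static ByteBuffer allocateBlock(long expectedBytes, boolean direct) {
        int size = blockSizeFor(expectedBytes);
        return direct ? ByteBuffer.allocateDirect(size) : ByteBuffer.allocate(size);
    }
}
```

With no hint (a size of 0 or less) this falls back to 8 KB; a 3,000,000-byte hint yields a single 4 MiB block. Passing `direct = true` places the block outside the Java heap, which is where the GC benefit mentioned above comes from.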

 

> Also, RAMDirectory is still more efficient than MMapDirectory, if you
> want to index (and then search) on a small (sometimes even transient)
> amount of data

 

That’s not true, as RAMDir spends more time switching buffers than reading
the data. The problem is that MMapDir does not support *writing*, and that's
why we plan to improve this. Have you tried MMapDir for read access in
comparison to RAMDirectory for a larger index? It outperforms RAMDirectory
several times over (depending on the OS and whether the file data is already
in the FS cache). The new directory will simply mimic MMapIndexInput and add
an MMapIndexOutput, but backed by an in-memory (Direct)ByteBuffer instead of a
mmapped one (outside or inside the JVM heap; both will be supported). This
simplifies the code a lot.
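A minimal sketch of the read side of such a design, under stated assumptions: the whole file is copied into one heap or direct `ByteBuffer` when writing finishes, and every reader gets its own `duplicate()` view. Views share the bytes but each has an independent position, so concurrent readers need no lock, which is the property attributed to MMapDir above. `BufferFile`, `write`, and `openInput` are hypothetical names; a real implementation would also need several buffers for files over 2 GB.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical sketch (not Lucene API) of a write-once, ByteBuffer-backed
 * in-memory file. Readers get independent duplicate() views, so reading
 * needs no lock.
 */
final class BufferFile {
    // Read-only; position 0, limit = file length. Its own position is never
    // changed, so concurrent duplicate() calls only read its state.
    private final ByteBuffer data;

    private BufferFile(ByteBuffer data) {
        this.data = data;
    }

    /** "Output" side: copies the finished file into one heap or direct buffer. */
    static BufferFile write(byte[] contents, boolean direct) {
        ByteBuffer buf = direct
                ? ByteBuffer.allocateDirect(contents.length)
                : ByteBuffer.allocate(contents.length);
        buf.put(contents);
        buf.flip();
        return new BufferFile(buf.asReadOnlyBuffer());
    }

    /** "Input" side: a new view sharing the bytes but with its own position and limit. */
    ByteBuffer openInput() {
        return data.duplicate();
    }

    int length() {
        return data.limit();
    }
}
```

For example, after `BufferFile f = BufferFile.write(bytes, true)`, two threads can each call `f.openInput()` and read at different positions at the same time; neither view can modify the shared bytes because the base buffer is read-only.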

 

The limitations of the crappy RAMDirectory were discussed at conferences,
sorry. We did *not* decide to remove it (without a patch/replacement). The
whole “message” on the issue was that RAMDirectory is a bad idea. The
recommended approach at the moment to handle large in-RAM directories would be
to use a tmpfs on Linux/Solaris and use MMapDir on top (for larger indexes).
The mmap would then directly map the RAM of the underlying tmpfs.
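At the JDK level, the tmpfs approach amounts to putting the index files on a RAM-backed mount and reading them through a memory mapping, roughly what MMapDirectory does per file. The sketch assumes a tmpfs is already mounted (many Linux systems provide one at /dev/shm, but check your system); `TmpfsMapping` is a hypothetical name, while `FileChannel.map` is the standard JDK call. A single mapping is limited to `Integer.MAX_VALUE` bytes, so larger files would need several mappings.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/**
 * Hypothetical helper: map a whole file read-only. If the file lives on a
 * tmpfs mount, the mapping refers to that tmpfs's memory pages, so nothing
 * is copied onto the Java heap.
 */
final class TmpfsMapping {
    private TmpfsMapping() {}

    static MappedByteBuffer mapReadOnly(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // Throws IllegalArgumentException for files larger than Integer.MAX_VALUE bytes.
            // The mapping stays valid after the channel is closed.
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
    }
}
```

The file path passed in (for instance something under /dev/shm) is illustrative; on a regular disk-backed filesystem the same code still works, but reads then depend on the OS page cache rather than on tmpfs.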

 

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

From: Shai Erera [mailto:serera@gmail.com]
Sent: Tuesday, December 20, 2011 2:13 PM
To: dev@lucene.apache.org
Subject: Plans to remove RAMDirectory?

 

Hi

Uwe mentioned on LUCENE-3653 that there are plans to remove RAMDirectory
from trunk and move it to tests only: "RAMDirectory is written for tests, not
for production use. There are already plans to remove it from Lucene trunk
and move to tests only." (see the full comment:
<https://issues.apache.org/jira/browse/LUCENE-3653?focusedCommentId=13172338&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13172338>)

I wasn't aware of such plans. Were there emails about it, or has it been
discussed on IRC?

I disagree that RAMDirectory is useful only for tests. For example, when
someone wants to index on Hadoop, RAMDirectory can be very useful (even
though it's not the only solution). Also, RAMDirectory is still more
efficient than MMapDirectory, if you want to index (and then search) on a
small (sometimes even transient) amount of data. We use it in several cases
for such purposes.

If RAMDirectory needs to improve (for instance, allocate bigger byte[]
chunks), then IMO we should do that, rather than drop it entirely from core.
I think it's a very valuable Directory implementation that Lucene offers,
and I'd hate to see it disappear.

Shai