Posted to java-user@lucene.apache.org by Ravikumar Govindarajan <ra...@gmail.com> on 2016/10/20 07:25:56 UTC

Can ByteBufferIndexInput use buffering?

When we use NIOFSDirectory, Lucene internally buffers reads from the file
via BufferedIndexInput (1 KB buffers, etc.).

However, for MMapDirectory (ByteBufferIndexInput) there is no such
buffering, and data is read directly from the mapped bytes.

Would it be too much of a performance drag if I wrapped ByteBufferIndexInput
with a BufferedIndexInput? In other words, would that be an anti-pattern
that defeats zero-copy reads?

Any help is much appreciated.

--
Ravi

Re: Can ByteBufferIndexInput use buffering?

Posted by Michael McCandless <lu...@mikemccandless.com>.
The fact that MMapIndexInput does no buffering is an important
performance gain vs NIOFSDirectory, which, e.g. on seeking to a term,
loads way too many bytes (a whole buffer's worth).
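
To make the read-amplification point concrete, here is a hypothetical simulation in plain Java (not Lucene's BufferedIndexInput code; all class and method names are made up): a reader that refills a whole buffer on every buffer miss, compared with direct positional access, when a caller seeks to one offset and reads a single 4-byte int. The 64 KB buffer size is an assumption chosen to match the size mentioned later in the thread; the original question cites 1 KB for Lucene's buffering, but the shape of the cost is the same.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical simulation (not Lucene code): how many bytes a buffered reader
 * pulls from its backing store versus direct positional access when a caller
 * seeks to one offset and reads a single int.
 */
public class SeekAmplification {

    /** Minimal buffered reader that refills a whole buffer on any buffer miss. */
    static final class BufferedReaderSim {
        private final byte[] file;     // stands in for the file on disk
        private final byte[] buffer;
        private long bufferStart = -1; // file offset of buffer[0]; -1 means empty
        private int bufferLength = 0;  // number of valid bytes in buffer
        private long pos = 0;          // current logical file position
        long bytesFetched = 0;         // total bytes copied from "disk" into buffer

        BufferedReaderSim(byte[] file, int bufferSize) {
            this.file = file;
            this.buffer = new byte[bufferSize];
        }

        void seek(long newPos) {
            pos = newPos; // nothing is read until the next read call
        }

        byte readByte() {
            if (pos >= file.length) {
                throw new IllegalStateException("read past EOF at " + pos);
            }
            if (bufferStart < 0 || pos < bufferStart || pos >= bufferStart + bufferLength) {
                refill();
            }
            return buffer[(int) (pos++ - bufferStart)];
        }

        int readInt() {
            // Big-endian, assembled one byte at a time through the buffer.
            int v = 0;
            for (int i = 0; i < Integer.BYTES; i++) {
                v = (v << 8) | (readByte() & 0xFF);
            }
            return v;
        }

        private void refill() {
            bufferStart = pos;
            bufferLength = (int) Math.min(buffer.length, file.length - pos);
            System.arraycopy(file, (int) pos, buffer, 0, bufferLength);
            bytesFetched += bufferLength;
        }
    }

    public static void main(String[] args) {
        byte[] file = new byte[1 << 20];           // 1 MiB "file" of zeros
        ByteBuffer.wrap(file).putInt(500_000, 42); // one int at a "term" offset

        BufferedReaderSim buffered = new BufferedReaderSim(file, 64 * 1024);
        buffered.seek(500_000);
        int viaBuffer = buffered.readInt();

        // Direct positional access decodes exactly the 4 bytes it needs.
        int direct = ByteBuffer.wrap(file).getInt(500_000);

        System.out.println("buffered: value=" + viaBuffer + ", bytes fetched=" + buffered.bytesFetched);
        System.out.println("direct:   value=" + direct + ", bytes read=" + Integer.BYTES);
    }
}
```

With a 64 KB buffer, this seek-then-read-an-int pattern pulls 65,536 bytes through the buffer to use 4 of them; direct positional access copies only the bytes requested and leaves page-level caching to the OS.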

Why do you want to add buffering to it?

The OS should already do a good job keeping recently accessed pages
hot, doing the buffering for you.

Mike McCandless

http://blog.mikemccandless.com


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Can ByteBufferIndexInput use buffering?

Posted by Ravikumar Govindarajan <ra...@gmail.com>.
Thanks, Mike and Uwe, for the clarification.

We have SSDs in production and currently use normal file reads
(NIOFSDirectory) with 64 KB buffers. Disk reads take only a small part of
the overall time (init, parsing, CPU, etc.), especially for searches. Here
is a tiny sample of the production log:

QueryTime:49ms. | No of Results:1 | DiskReadCount=245 | DiskReadTimeInMs=17ms
QueryTime:75ms. | No of Results:2 | DiskReadCount=234 | DiskReadTimeInMs=19ms
QueryTime:64ms. | No of Results:57 | DiskReadCount=628 | DiskReadTimeInMs=13ms
QueryTime:70ms. | No of Results:18 | DiskReadCount=190 | DiskReadTimeInMs=35ms
QueryTime:58ms. | No of Results:1 | DiskReadCount=531 | DiskReadTimeInMs=22ms
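
To put a number on how large a part of each query the disk reads take, the following sketch parses log lines in the format of the sample above and computes DiskReadTimeInMs / QueryTime per line. It assumes exactly this field layout; the class and method names are illustrative only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Reports what fraction of each query's time went to disk reads. */
public class DiskReadShare {

    // Matches the exact field layout of the sample log; other formats need a different pattern.
    private static final Pattern LINE = Pattern.compile(
            "QueryTime:(\\d+)ms\\.\\s*\\|\\s*No of Results:(\\d+)\\s*\\|\\s*"
            + "DiskReadCount=(\\d+)\\s*\\|\\s*DiskReadTimeInMs=(\\d+)ms");

    /** Returns DiskReadTimeInMs / QueryTime for a single log line. */
    static double diskShare(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("unrecognized log line: " + line);
        }
        double queryMs = Double.parseDouble(m.group(1));
        double diskMs = Double.parseDouble(m.group(4));
        return diskMs / queryMs;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "QueryTime:49ms. | No of Results:1 | DiskReadCount=245 | DiskReadTimeInMs=17ms",
                "QueryTime:75ms. | No of Results:2 | DiskReadCount=234 | DiskReadTimeInMs=19ms",
                "QueryTime:64ms. | No of Results:57 | DiskReadCount=628 | DiskReadTimeInMs=13ms",
                "QueryTime:70ms. | No of Results:18 | DiskReadCount=190 | DiskReadTimeInMs=35ms",
                "QueryTime:58ms. | No of Results:1 | DiskReadCount=531 | DiskReadTimeInMs=22ms");
        for (String line : sample) {
            System.out.printf("%5.1f%%  %s%n", 100 * diskShare(line), line);
        }
    }
}
```

For these five lines the share ranges from roughly 20% (13 of 64 ms) to 50% (35 of 70 ms).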


We were using a fairly large buffer size, but so far the "DiskReadTime"
seems to be under control (maybe due to the page cache). However, there is
considerable read amplification on the SSDs.


It was during a troubleshooting exercise that we thought we should switch
over to MMapDirectory. I have a legacy API built on BufferedIndexInput and
had to wrap the mmap IndexInput with it, and that's when I ran into this
buffering issue. Since the wrapping seemed to have a (supposedly) benign
effect, I thought maybe I could buffer it.

Like you said, it's better to push for changes in the legacy code than to
adopt clumsy buffering for ByteBufferIndexInput.

--
Ravi

On Thu, Oct 20, 2016 at 9:27 PM, Uwe Schindler <uw...@thetaphi.de> wrote:

> Hi,
>
> adding buffering to ByteBufferIndexInput would not only be an
> anti-pattern, it would also slow things down. What is the point of copying
> data from memory location A to memory location B before reading it?
>
> I'd suggest reading this and understanding what virtual memory and
> ByteBufferIndexInput do before trying anything like this:
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>
> Kind regards,
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de

RE: Can ByteBufferIndexInput use buffering?

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi,

adding buffering to ByteBufferIndexInput would not only be an anti-pattern, it would also slow things down. What is the point of copying data from memory location A to memory location B before reading it?
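
The copy "from memory location A to memory location B" can be sketched with plain java.nio. This is not Lucene's ByteBufferIndexInput code, and the class and method names are made up for illustration: it reads a long directly from a MappedByteBuffer versus first copying a 64 KB chunk of the mapping into a heap array (the 64 KB size is an assumption matching the buffer size mentioned in the thread). Both paths return the same value; the buffered one adds a copy of up to 64 KB that the direct path never performs.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/**
 * Plain java.nio illustration (not Lucene internals) of the extra copy that
 * a buffer layered over a memory-mapped file would introduce.
 */
public class MmapCopyDemo {

    /** Direct: decode the long straight from the mapped memory. */
    static long readDirect(MappedByteBuffer map, int pos) {
        return map.getLong(pos);
    }

    /** Buffered: first copy a chunk of mapped memory into a heap array, then decode. */
    static long readViaBuffer(MappedByteBuffer map, int pos, byte[] scratch) {
        int len = Math.min(scratch.length, map.capacity() - pos);
        ByteBuffer view = map.duplicate(); // own position, so the shared map is untouched
        view.position(pos);
        view.get(scratch, 0, len);         // the copy from memory location A to B
        return ByteBuffer.wrap(scratch, 0, len).getLong();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mmap-demo", ".bin");
        file.toFile().deleteOnExit();

        ByteBuffer data = ByteBuffer.allocate(1 << 20);
        data.putLong(123_456, 0x1234_5678_9ABC_DEF0L);
        Files.write(file, data.array());

        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            byte[] scratch = new byte[64 * 1024];
            // Same value either way; the buffered path just copied up to 64 KB first.
            System.out.println(Long.toHexString(readDirect(map, 123_456)));
            System.out.println(Long.toHexString(readViaBuffer(map, 123_456, scratch)));
        }
    }
}
```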

I'd suggest reading this and understanding what virtual memory and ByteBufferIndexInput do before trying anything like this: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

Kind regards,
Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de
