You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Gopikrishnan Subramani <go...@gmail.com> on 2005/08/02 22:05:40 UTC

IO bandwidth throttling

Hello,

Is there a way I can control the IO bandwidth utilized by Lucene? 

Here is my scenario. RAMDirectory is used to build a in-memory index
and finally the index size approaches a limit, the contents are
flushed to a FSDirectory. The index size could be approximately 512
MB. I'm a bit concerned that this might affect other NFS clients of
the same NFS server, even if it's for a short period. To avoid this I
would like to implement some sort of limit on the amount of data
transferred to the filesystem akin to the --bwlimit with the rsync
command.

Thanks,
Gopi

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: IO bandwidth throttling

Posted by Chris Lamprecht <cl...@gmail.com>.
Hi Ben,

Yes -- I would prefer to let the OS handle his, especially if it can
save me some coding.  Is there is a way (under linux) to limit a
certain process's disk utilization?  It seems like nice(1) just
modifies the scheduling priority, I'm assuming this means CPU
scheduling.  In my case, I'm trying to prevent a cron task that
updates the lucene search index from consuming the disk, causing the
search to slow down.

-chris

On 9/1/05, Ben Gollmer <be...@jatosoft.com> wrote:
> Chris Lamprecht wrote:
> > I've wanted something similar, for the same purpose -- to keep lucene
> > from consuming disk I/O resources when another process is running on
> > the same machine.
> 
> Sorry for jumping in (I'm a Lucene newb) but isn't this better handled
> by the OS? On a Unix box I would just renice the process or set some
> ulimits. Adding code to each application that might possibly need
> bandwidth or memory restrictions seems redundant, not to mention a chore :)
> 
> 
> Cheers,
> --
> Ben
> 
> 
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: IO bandwidth throttling

Posted by Dan Armbrust <da...@gmail.com>.
Ben Gollmer wrote:

 >Chris Lamprecht wrote:
 > 
 >
 >>I've wanted something similar, for the same purpose -- to keep lucene
 >>from consuming disk I/O resources when another process is running on
 >>the same machine.
 >>   
 >>
 >
 >Sorry for jumping in (I'm a Lucene newb) but isn't this better handled
 >by the OS? On a Unix box I would just renice the process or set some
 >ulimits. Adding code to each application that might possibly need
 >bandwidth or memory restrictions seems redundant, not to mention a 
chore :)
 >
 >
 >Cheers,
 > 
 >
 As far as I know, 'nice' only controls CPU usage, not disk IO usage.  
Since lucene doesn't do a lot of CPU intensive work, renicing a lucene 
process does little to allow other processes to get access to your disks.

 On Windows, things are similar.  A single process, even with the 
priority forced down, can still kill the performance of the entire 
machine if it is doing large amounts of disk io (on the system disk).

 Dan

-- 
****************************
Daniel Armbrust
Biomedical Informatics
Mayo Clinic Rochester
daniel.armbrust(at)mayo.edu
http://informatics.mayo.edu/
 




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: IO bandwidth throttling

Posted by Ben Gollmer <be...@jatosoft.com>.
Chris Lamprecht wrote:
> I've wanted something similar, for the same purpose -- to keep lucene
> from consuming disk I/O resources when another process is running on
> the same machine.

Sorry for jumping in (I'm a Lucene newb) but isn't this better handled
by the OS? On a Unix box I would just renice the process or set some
ulimits. Adding code to each application that might possibly need
bandwidth or memory restrictions seems redundant, not to mention a chore :)


Cheers,
-- 
Ben

RE: IO bandwidth throttling

Posted by Indu Abeyaratna <ia...@aconex.com>.
I don’t know whether this is helpful to you.
Our application is base on entity "project". Say we have 10 projects, then
we create 10 indexers.  Then we use Jboss cache to load those indexes as
when needed.

Jboss cache has functionality to limit the size of memory usage.

Base points is breaking the index into multiple indexes, and load them as
when needed, that way you don’t need to have full index in the memory at a
given time.

Indu

-----Original Message-----
From: Chris Lamprecht [mailto:clamprecht@gmail.com]
Sent: Wednesday, 3 August 2005 8:20 AM
To: java-user@lucene.apache.org
Subject: Re: IO bandwidth throttling


I've wanted something similar, for the same purpose -- to keep lucene
from consuming disk I/O resources when another process is running on
the same machine.

A general solution might be to define a simple interface such as

interface IndexInputOutputListener {
    void willReadBytes(int numberOfBytes);
    void willWriteBytes(int numberOfBytes);
}

If non-null, then Lucene can call these methods just before it calls
IndexInput.readBytes() or IndexOutput.writeBytes()   (referring to the
Lucene 1.9 API).  People could implement these methods to do whatever
they want, including throttling I/O, or just keeping I/O profiling
statistics.

Also the above interface could be split into two, an
IndexInputListener and IndexOutputListener, if people are likely to
use one and not the other.

In the meantime, you may be able to do what you want by extending or
decorating FSDirectory, and do the throttling just before or after
calling the real FSDirectory's read/write methods (in FSIndexOutput
and FSIndexInput inner classes).

-chris

On 8/2/05, Otis Gospodnetic <ot...@yahoo.com> wrote:
> Hi,
>
> > Ok, let me rephrase the question. Assuming the RAMDirectory holds
> > approximately 500 MB of data which needs to be written to the
> > filesystem, I'm afraid that sending this much data in one shot might
> > choke the NFS. Is there a parameter with FSDirectory with which I can
> > instruct Lucene to restrict the amount of data my application writes
> > to the filesystem per second.
>
> There is no support for that, but if you add it, this may be a nice
> enhancement.
>
> > BTW, I could see maxMergeDocs, but not maxBufferedDocs with the
> > IndexWriter. Otis, is there really some parameter with that name? I
> > think maxMergeDocs may not work well here because the size of the
> > documents are pretty random, ranging from few hundred bytes to tens
> > of megabytes.
>
> That's the one.   You are looking at 1.4.3, and I was referring to the
> new name of that parameter from 1.9-dev1.
>
> Otis
>
> > On 8/3/05, Otis Gospodnetic <ot...@yahoo.com> wrote:
> > > While not exactly what you are describing, you can use one of the
> > > IndexWriter parameters (maxBufferedDocs) to control the size of the
> > > RAMDirectory that's used as a buffer during indexing.
> > >
> > > Otis
> > >
> > > --- Gopikrishnan Subramani <go...@gmail.com> wrote:
> > >
> > > > Hello,
> > > >
> > > > Is there a way I can control the IO bandwidth utilized by Lucene?
> > > >
> > > > Here is my scenario. RAMDirectory is used to build a in-memory
> > index
> > > > and finally the index size approaches a limit, the contents are
> > > > flushed to a FSDirectory. The index size could be approximately
> > 512
> > > > MB. I'm a bit concerned that this might affect other NFS clients
> > of
> > > > the same NFS server, even if it's for a short period. To avoid
> > this I
> > > > would like to implement some sort of limit on the amount of data
> > > > transferred to the filesystem akin to the --bwlimit with the
> > rsync
> > > > command.
> > > >
> > > > Thanks,
> > > > Gopi
> > > >
> > > >
> > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > > >
> > > >
> > >
> > >
> > >
> > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >
> > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: IO bandwidth throttling

Posted by Chris Lamprecht <cl...@gmail.com>.
I've wanted something similar, for the same purpose -- to keep lucene
from consuming disk I/O resources when another process is running on
the same machine.

A general solution might be to define a simple interface such as

interface IndexInputOutputListener {
    void willReadBytes(int numberOfBytes);
    void willWriteBytes(int numberOfBytes);
}

If non-null, then Lucene can call these methods just before it calls
IndexInput.readBytes() or IndexOutput.writeBytes()   (referring to the
Lucene 1.9 API).  People could implement these methods to do whatever
they want, including throttling I/O, or just keeping I/O profiling
statistics.

Also the above interface could be split into two, an
IndexInputListener and IndexOutputListener, if people are likely to
use one and not the other.

In the meantime, you may be able to do what you want by extending or
decorating FSDirectory, and do the throttling just before or after
calling the real FSDirectory's read/write methods (in FSIndexOutput
and FSIndexInput inner classes).

-chris

On 8/2/05, Otis Gospodnetic <ot...@yahoo.com> wrote:
> Hi,
> 
> > Ok, let me rephrase the question. Assuming the RAMDirectory holds
> > approximately 500 MB of data which needs to be written to the
> > filesystem, I'm afraid that sending this much data in one shot might
> > choke the NFS. Is there a parameter with FSDirectory with which I can
> > instruct Lucene to restrict the amount of data my application writes
> > to the filesystem per second.
> 
> There is no support for that, but if you add it, this may be a nice
> enhancement.
> 
> > BTW, I could see maxMergeDocs, but not maxBufferedDocs with the
> > IndexWriter. Otis, is there really some parameter with that name? I
> > think maxMergeDocs may not work well here because the size of the
> > documents are pretty random, ranging from few hundred bytes to tens
> > of megabytes.
> 
> That's the one.   You are looking at 1.4.3, and I was referring to the
> new name of that parameter from 1.9-dev1.
> 
> Otis
> 
> > On 8/3/05, Otis Gospodnetic <ot...@yahoo.com> wrote:
> > > While not exactly what you are describing, you can use one of the
> > > IndexWriter parameters (maxBufferedDocs) to control the size of the
> > > RAMDirectory that's used as a buffer during indexing.
> > >
> > > Otis
> > >
> > > --- Gopikrishnan Subramani <go...@gmail.com> wrote:
> > >
> > > > Hello,
> > > >
> > > > Is there a way I can control the IO bandwidth utilized by Lucene?
> > > >
> > > > Here is my scenario. RAMDirectory is used to build a in-memory
> > index
> > > > and finally the index size approaches a limit, the contents are
> > > > flushed to a FSDirectory. The index size could be approximately
> > 512
> > > > MB. I'm a bit concerned that this might affect other NFS clients
> > of
> > > > the same NFS server, even if it's for a short period. To avoid
> > this I
> > > > would like to implement some sort of limit on the amount of data
> > > > transferred to the filesystem akin to the --bwlimit with the
> > rsync
> > > > command.
> > > >
> > > > Thanks,
> > > > Gopi
> > > >
> > > >
> > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > > >
> > > >
> > >
> > >
> > >
> > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >
> > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: IO bandwidth throttling

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hi,

> Ok, let me rephrase the question. Assuming the RAMDirectory holds
> approximately 500 MB of data which needs to be written to the
> filesystem, I'm afraid that sending this much data in one shot might
> choke the NFS. Is there a parameter with FSDirectory with which I can
> instruct Lucene to restrict the amount of data my application writes
> to the filesystem per second.

There is no support for that, but if you add it, this may be a nice
enhancement.

> BTW, I could see maxMergeDocs, but not maxBufferedDocs with the
> IndexWriter. Otis, is there really some parameter with that name? I
> think maxMergeDocs may not work well here because the size of the
> documents are pretty random, ranging from few hundred bytes to tens
> of megabytes.

That's the one.   You are looking at 1.4.3, and I was referring to the
new name of that parameter from 1.9-dev1.

Otis

> On 8/3/05, Otis Gospodnetic <ot...@yahoo.com> wrote:
> > While not exactly what you are describing, you can use one of the
> > IndexWriter parameters (maxBufferedDocs) to control the size of the
> > RAMDirectory that's used as a buffer during indexing.
> > 
> > Otis
> > 
> > --- Gopikrishnan Subramani <go...@gmail.com> wrote:
> > 
> > > Hello,
> > >
> > > Is there a way I can control the IO bandwidth utilized by Lucene?
> > >
> > > Here is my scenario. RAMDirectory is used to build a in-memory
> index
> > > and finally the index size approaches a limit, the contents are
> > > flushed to a FSDirectory. The index size could be approximately
> 512
> > > MB. I'm a bit concerned that this might affect other NFS clients
> of
> > > the same NFS server, even if it's for a short period. To avoid
> this I
> > > would like to implement some sort of limit on the amount of data
> > > transferred to the filesystem akin to the --bwlimit with the
> rsync
> > > command.
> > >
> > > Thanks,
> > > Gopi
> > >
> > >
> ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >
> > >
> > 
> > 
> >
> ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> > 
> >
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: IO bandwidth throttling

Posted by Gopikrishnan Subramani <go...@gmail.com>.
Ok, let me rephrase the question. Assuming the RAMDirectory holds
approximately 500 MB of data which needs to be written to the
filesystem, I'm afraid that sending this much data in one shot might
choke the NFS. Is there a parameter with FSDirectory with which I can
instruct Lucene to restrict the amount of data my application writes
to the filesystem per second.

BTW, I could see maxMergeDocs, but not maxBufferedDocs with the
IndexWriter. Otis, is there really some parameter with that name? I
think maxMergeDocs may not work well here because the size of the
documents are pretty random, ranging from few hundred bytes to tens of
megabytes.

Thanks,
Gopi

On 8/3/05, Otis Gospodnetic <ot...@yahoo.com> wrote:
> While not exactly what you are describing, you can use one of the
> IndexWriter parameters (maxBufferedDocs) to control the size of the
> RAMDirectory that's used as a buffer during indexing.
> 
> Otis
> 
> --- Gopikrishnan Subramani <go...@gmail.com> wrote:
> 
> > Hello,
> >
> > Is there a way I can control the IO bandwidth utilized by Lucene?
> >
> > Here is my scenario. RAMDirectory is used to build a in-memory index
> > and finally the index size approaches a limit, the contents are
> > flushed to a FSDirectory. The index size could be approximately 512
> > MB. I'm a bit concerned that this might affect other NFS clients of
> > the same NFS server, even if it's for a short period. To avoid this I
> > would like to implement some sort of limit on the amount of data
> > transferred to the filesystem akin to the --bwlimit with the rsync
> > command.
> >
> > Thanks,
> > Gopi
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: IO bandwidth throttling

Posted by Otis Gospodnetic <ot...@yahoo.com>.
While not exactly what you are describing, you can use one of the
IndexWriter parameters (maxBufferedDocs) to control the size of the
RAMDirectory that's used as a buffer during indexing.

Otis

--- Gopikrishnan Subramani <go...@gmail.com> wrote:

> Hello,
> 
> Is there a way I can control the IO bandwidth utilized by Lucene? 
> 
> Here is my scenario. RAMDirectory is used to build a in-memory index
> and finally the index size approaches a limit, the contents are
> flushed to a FSDirectory. The index size could be approximately 512
> MB. I'm a bit concerned that this might affect other NFS clients of
> the same NFS server, even if it's for a short period. To avoid this I
> would like to implement some sort of limit on the amount of data
> transferred to the filesystem akin to the --bwlimit with the rsync
> command.
> 
> Thanks,
> Gopi
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org