Posted to general@lucene.apache.org by sunnyfr <jo...@gmail.com> on 2009/04/01 17:20:49 UTC

replication caching high query and lot of update

Hi everybody,

This is my issue:
I have a master that updates 20,000 docs every 30 minutes (and a lot more
nightly), so my index merges on almost every update and segments keep
growing. As a result, the master replicates the whole index to the slaves
almost every 30 minutes.

My problem is that my slaves become very slow while they pull the index
down with the replication script. I've attached a graph of my CPU activity
during an update:
http://www.nabble.com/file/p22828976/CPU.jpg
So you can imagine what 15-20 requests per second do to my CPU.

> What is the best Tomcat configuration for this kind of activity?

I'm running Linux / Solr 1.4 / 8 GB RAM / 8 CPUs.
Index data size: 11 GB, 14M docs.

> And with this much activity, is it worth keeping a cache or not?

My main problem is that during replication, the response time of my queries
is very slow.

Thanks a lot,



-- 
View this message in context: http://www.nabble.com/replication-caching-high-query-and-lot-of-update-tp22828976p22828976.html
Sent from the Lucene - General mailing list archive at Nabble.com.

Re: replication caching high query and lot of update

Posted by Yonik Seeley <yo...@lucidimagination.com>.

On Thu, Apr 2, 2009 at 7:29 AM, sunnyfr <jo...@gmail.com> wrote:
> Just another question: what percentage of memory would you give to the JVM?
> I have 8 GB of RAM (8 CPUs) and my index data is 11 GB. What would you
> recommend for -Xmx?

This is really Solr-specific and should be on solr-user.

You want to give the JVM the least amount of memory with which everything
still works, while allowing for a little index growth.  How much that is
depends heavily on which fields you sort on, which fields you facet on,
all your caches, etc.  The reason to minimize JVM memory is so that the OS
can cache important parts of the index in the remaining free RAM.

Large heap sizes also lead to long GC pauses.
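For a Solr instance running inside Tomcat, the heap limit is usually passed through CATALINA_OPTS. The sketch below assumes a Tomcat version whose catalina.sh sources bin/setenv.sh when it exists; the 2g value is a hypothetical placeholder, not a figure from this thread. Following the advice above, start small and raise it only until sorting, faceting and caches stop running out of memory, so the rest of the 8 GB stays available to the OS page cache.

```shell
#!/bin/sh
# Sketch of $CATALINA_BASE/bin/setenv.sh, which Tomcat's catalina.sh
# sources at startup when the file exists.
# HEAP_SIZE is a hypothetical placeholder, not a recommendation: find the
# smallest value that survives your real sorting, faceting and cache load.
HEAP_SIZE="${HEAP_SIZE:-2g}"

# Fix the heap at one size (-Xms = -Xmx) so the JVM never grows into RAM the
# OS could use for caching index files, and log each collection so long GC
# pauses show up in Tomcat's stdout (catalina.out by default).
CATALINA_OPTS="$CATALINA_OPTS -Xms$HEAP_SIZE -Xmx$HEAP_SIZE -verbose:gc"
export CATALINA_OPTS
```

Pinning -Xms to -Xmx is a common convention rather than a requirement; the part that matters here is keeping -Xmx well below physical RAM.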

-Yonik
http://www.lucidimagination.com

Re: replication caching high query and lot of update

Posted by sunnyfr <jo...@gmail.com>.

Hi Ted,

Do you have any advice on how to do that? I'm on Linux.

Just another question: what percentage of memory would you give to the JVM?
I have 8 GB of RAM (8 CPUs) and my index data is 11 GB. What would you
recommend for -Xmx?


Thanks a lot,


Ted Dunning wrote:
> 
> This may be largely due to poor I/O scheduling at the OS layer.
> 
> Try switching to an I/O scheduler that puts reads ahead of writes.

Re: replication caching high query and lot of update

Posted by Ted Dunning <te...@gmail.com>.

I hope that these are not yet production machines that you are working on.
Otherwise, this could turn into the painful sort of learning experience.

That said, changing the I/O scheduler is not likely to cause total disaster.
Normally this would be considered highly advanced system administration,
but it is relatively safe compared with many other things you could be doing.

On most modern Linux machines there is a directory called /sys that allows
tuning of various aspects of the system.  To change the I/O scheduler, you
need to find the directory /sys/block/<disk-device>, where disk-device
corresponds to the disk you are using.  You can find out which disk that is
by running mount.

On one of my machines, mount output looks like this:

... lots of stuff deleted ...
/dev/sdb1 on /data1 type ext3 (rw,relatime)

That means that /data1 is mounted from disk /dev/sdb, partition 1.
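The same lookup can be scripted. This is a minimal sketch using the POSIX `df -P` output format instead of reading `mount`; the function names are hypothetical, and stripping trailing digits only works for classic sdX-style names (NVMe or device-mapper names such as nvme0n1p1 need different handling).

```shell
#!/bin/sh
# Print the partition device that holds a given directory, e.g. /dev/sdb1.
# With "df -P" the device is field 1 of the second output line.
partition_for_dir() {
    df -P "$1" | awk 'NR == 2 { print $1 }'
}

# Turn a partition device into the disk name used under /sys/block,
# e.g. /dev/sdb1 -> sdb. Assumes sdX-style names (see note above).
disk_for_partition() {
    basename "$1" | sed 's/[0-9]*$//'
}
```

On a machine laid out like the one above, `disk_for_partition "$(partition_for_dir /data1)"` would print sdb, giving /sys/block/sdb as the directory to look in.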

Looking at /sys/block/sdb/queue, I see this:

$ ls /sys/block/sdb/queue/
iosched/           max_sectors_kb     read_ahead_kb
max_hw_sectors_kb  nr_requests        scheduler

The directory /sys/block/sdb/queue/iosched is used to tune whichever scheduler
you have chosen and /sys/block/sdb/queue/scheduler is used to select which
scheduler.
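Before changing anything it helps to record the current settings so they can be restored. A minimal sketch that prints the scheduler line and every tunable under iosched/ for one disk; the function name is hypothetical, and SYSFS_ROOT is an assumption added only so the function can be tried against a scratch directory (it defaults to /sys).

```shell
#!/bin/sh
# Print the scheduler selection line and each tunable of the active
# scheduler for one disk (e.g. "sdb"). Reading these files is harmless.
show_iosched() {
    queue="${SYSFS_ROOT:-/sys}/block/$1/queue"
    echo "scheduler: $(cat "$queue/scheduler")"
    for f in "$queue"/iosched/*; do
        [ -f "$f" ] || continue   # skip the literal pattern if nothing matched
        echo "$(basename "$f"): $(cat "$f")"
    done
}
```

Running `show_iosched sdb > iosched-before.txt` keeps a copy of the defaults to fall back on.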

I can look at my machine to determine which scheduler I am using:

$ cat /sys/block/sdb/queue/scheduler
noop anticipatory [deadline] cfq

The square brackets indicate which one is in play.
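Because the active scheduler is the bracketed entry, it can be extracted mechanically, for example to check the result after switching. A small sketch with a hypothetical function name:

```shell
#!/bin/sh
# Given the contents of a .../queue/scheduler file, print the entry inside
# square brackets, i.e. the scheduler currently in use. Prints nothing if
# there are no brackets.
active_scheduler() {
    echo "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}
```

Typical use: `active_scheduler "$(cat /sys/block/sdb/queue/scheduler)"`, which prints deadline for the output shown above.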

You can find a bit more information at
http://en.wikipedia.org/wiki/Deadline_scheduler or
http://www.mjmwired.net/kernel/Documentation/block/switching-sched.txt

From here you are pretty much on your own.  You will need to be very careful
to observe how your system behaves, and you will need to understand that the
"files" under /sys are not really files, but clever little illusions that
let you configure the system (by writing to these "files") and interrogate
its status (by reading them).
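Selecting a scheduler is just a write of its name into the scheduler pseudo-file. A minimal sketch with a hypothetical function name; SYSFS_ROOT is an assumed test hook defaulting to /sys. On a real system this needs root, and the change lasts only until the next reboot.

```shell
#!/bin/sh
# Select an I/O scheduler for one disk, e.g. set_scheduler sdb deadline.
set_scheduler() {
    file="${SYSFS_ROOT:-/sys}/block/$1/queue/scheduler"
    # Refuse to create anything if the disk name is wrong.
    [ -e "$file" ] || { echo "no such file: $file" >&2; return 1; }
    echo "$2" > "$file" || return 1
    # Read back: on a real system the kernel brackets the active scheduler.
    cat "$file"
}
```

Note that `sudo echo deadline > /sys/block/sdb/queue/scheduler` does not work, because the redirection is done by your unprivileged shell; use `sudo sh -c 'echo deadline > /sys/block/sdb/queue/scheduler'` or run the function from a root shell.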

My recommendation on the tunables is:

read_expire      *Make this relatively small (say 100 or 200 ms).*
write_expire     *Make this relatively long (5000 to 10000 ms).*
fifo_batch       *Default is 16, which might be fine.  Try larger and
                 smaller values if you have time.*
writes_starved   *Default is 2, which could reasonably be increased.
                 Try 5 or 10.*
front_merges     *Leave alone.*
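The recommendations above translate into a few writes under iosched/. This sketch picks one value from each suggested range (200, 10000 and 5 are choices made here, not tested optimums); the function name is hypothetical and SYSFS_ROOT is an assumed test hook defaulting to /sys. The deadline tunables only appear once deadline is the active scheduler, so the function checks for read_expire first.

```shell
#!/bin/sh
# Apply read-favouring deadline settings to one disk, e.g. tune_deadline sdb.
tune_deadline() {
    dir="${SYSFS_ROOT:-/sys}/block/$1/queue/iosched"
    [ -f "$dir/read_expire" ] || {
        echo "no $dir/read_expire; is deadline the active scheduler?" >&2
        return 1
    }
    echo 200   > "$dir/read_expire"     # ms before a queued read must be served
    echo 10000 > "$dir/write_expire"    # ms before a queued write must be served
    echo 5     > "$dir/writes_starved"  # times reads may be preferred over waiting writes
    # fifo_batch and front_merges are left at their defaults.
}
```

Change one value at a time and watch query latency during the next replication before moving on; like the scheduler choice itself, these settings reset on reboot.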

The goal with these recommendations is to aggressively starve write
performance in favor of read performance.  This will slow down your updates
somewhat but will hopefully avoid your current problem of long delays.

On Mon, Apr 6, 2009 at 3:07 AM, sunnyfr <jo...@gmail.com> wrote:

>
> Hi Ted,
>
> I'm a Linux newbie; can you give me advice on how to set this?
>
> Thanks,

-- 
Ted Dunning, CTO
DeepDyve

Re: replication caching high query and lot of update

Posted by sunnyfr <jo...@gmail.com>.

Hi Ted,

I'm a Linux newbie; can you give me advice on how to set this?

Thanks,


Ted Dunning wrote:
> 
> This may be largely due to poor I/O scheduling at the OS layer.
> 
> Try switching to an I/O scheduler that puts reads ahead of writes.

Re: replication caching high query and lot of update

Posted by Ted Dunning <te...@gmail.com>.

This may be largely due to poor I/O scheduling at the OS layer.

Try switching to an I/O scheduler that puts reads ahead of writes.

On Wed, Apr 1, 2009 at 8:20 AM, sunnyfr <jo...@gmail.com> wrote:

> > And with this much activity, is it worth keeping a cache or not?
>
> My main problem is that during replication, the response time of my
> queries is very slow.
>

-- 
Ted Dunning, CTO
DeepDyve