Posted to solr-user@lucene.apache.org by Po-Yu Chuang <ra...@gmail.com> on 2014/11/29 06:59:04 UTC

Constantly high disk read access (40-60M/s)

Hi all,

I am using Solr 4.9 with Tomcat. Thanks to the suggestions from Yonik and
Dmitry about the slow start-up, everything works fine now, but I noticed
that the load average of the server is high because there is constantly
heavy disk read access. Please point me in the right direction.

Some numbers about my system:
RAM: 18G
swap space: 2G
number of documents: 27 million
Solr home: 185G
disk read access constantly 40-60M/s
document cache size: 16K entries
document cache hit ratio: 0.65
query cache size: 16K
query cache hit ratio: 0.03

At first, I wondered if the disk reads come from swapping, so I decreased
the swappiness from 60 to 10, but the disk reads are still there, which
means that they do not result from swapping in.

Then I tried different document cache and query cache sizes. The effect
of changing the query cache size is not obvious: I tried 512, 16K, and
256K entries and the hit ratio stays between 0.01 and 0.03.

For the document cache, a larger cache did improve the hit ratio (I tried
512, 16K, 256K, 512K, and 1024K entries and the hit ratio ranges from
0.58 to 0.87), but the disk reads are still high.

Is adjusting the document cache size a reasonable direction, or should I
just increase the physical memory? Is there any method to estimate the
right size of the document cache (or other caches) and the amount of
physical memory needed?

Thanks,
Po-Yu

Re: Constantly high disk read access (40-60M/s)

Posted by Michael Sokolov <ms...@safaribooksonline.com>.
My read on docvalues is that "typical" use is to keep them in memory - 
at least when they are used, and if you are creating them, it makes 
sense to assume you are going to be using them?

-Mike

On 11/29/14 1:25 PM, Alexandre Rafalovitch wrote:
> There are also docValues files as well, right? And they have different
> memory requirements depending on how they are setup. (not 100% sure
> what I am trying to say here, though)
>
> Regards,
>     Alex.
> Personal: http://www.outerthoughts.com/ and @arafalov
> Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
> Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
>
>
> On 29 November 2014 at 13:16, Michael Sokolov
> <ms...@safaribooksonline.com> wrote:
>> Of course testing is best, but you can also get an idea of the size of the
>> non-storage part of your index by looking in the solr index folder and
>> subtracting the size of the files containing the stored fields from the
>> total size of the index.  This depends of course on the internal storage
>> strategy of Lucene and may change from release to release, but it is
>> documented. The .fdt and .fdx files are the stored field files (currently,
>> at least, and if you don't have everything in a compound file).  If you are
>> indexing term vectors (.tvd and .tvf files) as well, I think these may also
>> be able to be excluded from the index size also when calculating the
>> required memory, at least based on typical usage patterns for term vectors
>> (ie highlighting).
>>
>> I wonder if there's any value in providing this metric (total index size -
>> stored field size - term vector size) as part of the admin panel?  Is it
>> meaningful?  It seems like there would be a lot of cases where it could give
>> a good rule of thumb for memory sizing, and it would save having to root
>> around in the index folder.
>>
>> -Mike
>>
>>
>> On 11/29/14 12:16 PM, Erick Erickson wrote:
>>> bq: You should have memory to fit your whole database in disk cache and
>>> then
>>> some more.
>>>
>>> I have to disagree here if for no other reason than stored data, which
>>> is irrelevant
>>> for searching, may make up virtually none or virtually all of your
>>> on-disk space.
>>> Saying it all needs to fit in disk cache is too broad-brush a
>>> statement, gotta test.
>>>
>>> In this case, though, I _do_ think that there's not enough memory here,
>>> Toke's
>>> comments are spot on.
>>>
>>> On Sat, Nov 29, 2014 at 2:02 AM, Toke Eskildsen <te...@statsbiblioteket.dk>
>>> wrote:
>>>> Po-Yu Chuang [ratbert.chuang@gmail.com] wrote:
>>>>> [...] Everything works fine now, but I noticed that the load
>>>>> average of the server is high because there is constantly
>>>>> heavy disk read access. Please point me some directions.
>>>>> RAM: 18G
>>>>> Solr home: 185G
>>>>> disk read access constantly 40-60M/s
>>>> Solr search performance is tightly coupled to the speed of small random
>>>> reads. There are two obvious ways of ensuring that in these days:
>>>>
>>>> 1) Add more RAM to the server, so that the disk cache can hold a larger
>>>> part of the index. If you add enough RAM (depends on your index, but 50-100%
>>>> of the index size is a rule of thumb), you get "ideal" storage speed, by
>>>> which I mean that the bottleneck moves away from storage. If you are using
>>>> spinning drives, the 18GB of RAM is not a lot for a 185GB index.
>>>>
>>>> 2) Use SSDs instead of spinning drives (if you do not already do so). The
>>>> speed-up depends a lot on what you are doing, but is is a cheap upgrade and
>>>> it can later be coupled with extra RAM if it is not enough in itself.
>>>>
>>>> The Solr Wiki has this:
>>>> https://wiki.apache.org/solr/SolrPerformanceProblems
>>>> And I have this:
>>>> http://sbdevel.wordpress.com/2013/06/06/memory-is-overrated/
>>>>
>>>> - Toke Eskildsen
>>


Re: Constantly high disk read access (40-60M/s)

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
There are also docValues files, right? And they have different
memory requirements depending on how they are set up. (Not 100% sure
what I am trying to say here, though.)

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On 29 November 2014 at 13:16, Michael Sokolov
<ms...@safaribooksonline.com> wrote:
> Of course testing is best, but you can also get an idea of the size of the
> non-storage part of your index by looking in the solr index folder and
> subtracting the size of the files containing the stored fields from the
> total size of the index.  This depends of course on the internal storage
> strategy of Lucene and may change from release to release, but it is
> documented. The .fdt and .fdx files are the stored field files (currently,
> at least, and if you don't have everything in a compound file).  If you are
> indexing term vectors (.tvd and .tvf files) as well, I think these may also
> be able to be excluded from the index size also when calculating the
> required memory, at least based on typical usage patterns for term vectors
> (ie highlighting).
>
> I wonder if there's any value in providing this metric (total index size -
> stored field size - term vector size) as part of the admin panel?  Is it
> meaningful?  It seems like there would be a lot of cases where it could give
> a good rule of thumb for memory sizing, and it would save having to root
> around in the index folder.
>
> -Mike
>
>
> On 11/29/14 12:16 PM, Erick Erickson wrote:
>>
>> bq: You should have memory to fit your whole database in disk cache and
>> then
>> some more.
>>
>> I have to disagree here if for no other reason than stored data, which
>> is irrelevant
>> for searching, may make up virtually none or virtually all of your
>> on-disk space.
>> Saying it all needs to fit in disk cache is too broad-brush a
>> statement, gotta test.
>>
>> In this case, though, I _do_ think that there's not enough memory here,
>> Toke's
>> comments are spot on.
>>
>> On Sat, Nov 29, 2014 at 2:02 AM, Toke Eskildsen <te...@statsbiblioteket.dk>
>> wrote:
>>>
>>> Po-Yu Chuang [ratbert.chuang@gmail.com] wrote:
>>>>
>>>> [...] Everything works fine now, but I noticed that the load
>>>> average of the server is high because there is constantly
>>>> heavy disk read access. Please point me some directions.
>>>> RAM: 18G
>>>> Solr home: 185G
>>>> disk read access constantly 40-60M/s
>>>
>>> Solr search performance is tightly coupled to the speed of small random
>>> reads. There are two obvious ways of ensuring that in these days:
>>>
>>> 1) Add more RAM to the server, so that the disk cache can hold a larger
>>> part of the index. If you add enough RAM (depends on your index, but 50-100%
>>> of the index size is a rule of thumb), you get "ideal" storage speed, by
>>> which I mean that the bottleneck moves away from storage. If you are using
>>> spinning drives, the 18GB of RAM is not a lot for a 185GB index.
>>>
>>> 2) Use SSDs instead of spinning drives (if you do not already do so). The
>>> speed-up depends a lot on what you are doing, but is is a cheap upgrade and
>>> it can later be coupled with extra RAM if it is not enough in itself.
>>>
>>> The Solr Wiki has this:
>>> https://wiki.apache.org/solr/SolrPerformanceProblems
>>> And I have this:
>>> http://sbdevel.wordpress.com/2013/06/06/memory-is-overrated/
>>>
>>> - Toke Eskildsen
>
>

Re: Standardized index metrics (Was: Constantly high disk read access (40-60M/s))

Posted by Otis Gospodnetic <ot...@gmail.com>.
Hi,

On Sat, Nov 29, 2014 at 2:27 PM, Michael Sokolov <
msokolov@safaribooksonline.com> wrote:

> On 11/29/14 1:30 PM, Toke Eskildsen wrote:
>
>> Michael Sokolov [msokolov@safaribooksonline.com] wrote:
>>
>>> I wonder if there's any value in providing this metric (total index size
>>> - stored field size - term vector size) as part of the admin panel?  Is
>>> it meaningful?  It seems like there would be a lot of cases where it
>>> could give a good rule of thumb for memory sizing, and it would save
>>> having to root around in the index folder.
>>>
>> At Lucene/Solr Revolution, I talked with Alexandre Rafalovitch about
>> this. We know (https://lucidworks.com/blog/sizing-hardware-in-the-
>> abstract-why-we-dont-have-a-definitive-answer/) that we cannot get the
>> full picture of an index, but it is a weekly occurrence on this mailing
>> list that people asks questions where it helps to have a gist of the index
>> metrics and how the index is used.
>>
>> Some sort of "Copy the content of this concentrated metrics box, when you
>> need to talk with other people about your index"-functionality in the admin
>> panel might help with this. To get an idea of usage, it could also contain
>> a few non-filled fields, such as "peak queries per second" or "typical
>> queries".
>>
>> - Toke Eskildsen
>>
> Yes - the cautions about the need for prototyping are all very well, but
> even if you take that advice, and build a prototype, it's not clear how to
> tell whether your setup has enough memory or not. You can add more and
> measure response times, but even then you only have a gross measurement,
> and no way of knowing where, in detail, the memory is being used.  Also,
> you might be able to improve your system to make better use of memory with
> more precise information. It seems like we ought to be able to monitor a
> running system, observe its memory requirements over time, and report on
> those.
>

+1 to that!
I haven't been following this aspect of development super closely, but I
believe there are memory/size estimators for various things at the Lucene
level that Elasticsearch nicely exposes via its stats API.  I don't know the
specifics of those estimators without digging in, otherwise I'd open a
JIRA, because I think this is valuable information -- at Sematext we
regularly deal with hardware sizing, memory / CPU usage estimates, etc.,
so the more of this info is surfaced, the easier it will be for people
to work with Solr.

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/

Re: Standardized index metrics (Was: Constantly high disk read access (40-60M/s))

Posted by Michael Sokolov <ms...@safaribooksonline.com>.
On 11/29/14 1:30 PM, Toke Eskildsen wrote:
> Michael Sokolov [msokolov@safaribooksonline.com] wrote:
>> I wonder if there's any value in providing this metric (total index size
>> - stored field size - term vector size) as part of the admin panel?  Is
>> it meaningful?  It seems like there would be a lot of cases where it
>> could give a good rule of thumb for memory sizing, and it would save
>> having to root around in the index folder.
> At Lucene/Solr Revolution, I talked with Alexandre Rafalovitch about this. We know (https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/) that we cannot get the full picture of an index, but it is a weekly occurrence on this mailing list that people asks questions where it helps to have a gist of the index metrics and how the index is used.
>
> Some sort of "Copy the content of this concentrated metrics box, when you need to talk with other people about your index"-functionality in the admin panel might help with this. To get an idea of usage, it could also contain a few non-filled fields, such as "peak queries per second" or "typical queries".
>
> - Toke Eskildsen
Yes - the cautions about the need for prototyping are all very well, but 
even if you take that advice, and build a prototype, it's not clear how 
to tell whether your setup has enough memory or not. You can add more 
and measure response times, but even then you only have a gross 
measurement, and no way of knowing where, in detail, the memory is being 
used.  Also, you might be able to improve your system to make better use 
of memory with more precise information. It seems like we ought to be 
able to monitor a running system, observe its memory requirements over 
time, and report on those.
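
A very crude version of that kind of observation can already be done from
outside Solr, e.g. by sampling the OS page cache and disk reads over time
(a sketch only, assuming Linux and the psutil package; it says nothing
about which files are actually hot):

import time
import psutil

# Sample the OS page cache size and disk reads every 10 seconds. A page
# cache pinned at "all free RAM" while read bytes keep climbing is a hint
# that the index does not fit in memory.
prev = psutil.disk_io_counters().read_bytes
while True:
    time.sleep(10)
    cached_gb = psutil.virtual_memory().cached / 1024 ** 3
    reads = psutil.disk_io_counters().read_bytes
    print("page cache: %5.1f GB, read in last 10s: %6.1f MB"
          % (cached_gb, (reads - prev) / 1024 ** 2))
    prev = reads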

-Mike

Standardized index metrics (Was: Constantly high disk read access (40-60M/s))

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
Michael Sokolov [msokolov@safaribooksonline.com] wrote:
> I wonder if there's any value in providing this metric (total index size
> - stored field size - term vector size) as part of the admin panel?  Is
> it meaningful?  It seems like there would be a lot of cases where it
> could give a good rule of thumb for memory sizing, and it would save
> having to root around in the index folder.

At Lucene/Solr Revolution, I talked with Alexandre Rafalovitch about this. We know (https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/) that we cannot get the full picture of an index, but it is a weekly occurrence on this mailing list that people ask questions where it helps to have a gist of the index metrics and how the index is used.

Some sort of "Copy the content of this concentrated metrics box, when you need to talk with other people about your index"-functionality in the admin panel might help with this. To get an idea of usage, it could also contain a few non-filled fields, such as "peak queries per second" or "typical queries".

- Toke Eskildsen

Re: Constantly high disk read access (40-60M/s)

Posted by Michael Sokolov <ms...@safaribooksonline.com>.
Of course testing is best, but you can also get an idea of the size of
the non-storage part of your index by looking in the solr index folder
and subtracting the size of the files containing the stored fields from
the total size of the index.  This depends of course on the internal
storage strategy of Lucene and may change from release to release, but
it is documented. The .fdt and .fdx files are the stored field files
(currently, at least, and if you don't have everything in a compound
file).  If you are indexing term vectors (.tvd and .tvf files) as well,
I think these may also be excluded from the index size when calculating
the required memory, at least based on typical usage patterns for term
vectors (i.e. highlighting).
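
For example, that arithmetic could be scripted roughly like this (a quick
sketch, not tested against a real install; the index path and the
extension lists are assumptions you would adjust to what you actually see
in your index folder, and it ignores compound .cfs files entirely):

import os
from collections import defaultdict

index_dir = "/var/solr/data/collection1/index"   # hypothetical path
stored_exts = {".fdt", ".fdx"}   # stored fields, as described above
termvec_exts = {".tvd", ".tvf"}  # term vectors, if you index them

sizes = defaultdict(int)
for name in os.listdir(index_dir):
    path = os.path.join(index_dir, name)
    if os.path.isfile(path):
        sizes[os.path.splitext(name)[1]] += os.path.getsize(path)

gb = 1024 ** 3
total = sum(sizes.values())
stored = sum(sizes[e] for e in stored_exts)
termvec = sum(sizes[e] for e in termvec_exts)
print("total index size: %.1f GB" % (total / gb))
print("stored fields:    %.1f GB" % (stored / gb))
print("term vectors:     %.1f GB" % (termvec / gb))
print("rough 'hot' size: %.1f GB" % ((total - stored - termvec) / gb))

Under those assumptions, the last number is the part you would aim to
keep in the OS disk cache.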

I wonder if there's any value in providing this metric (total index size 
- stored field size - term vector size) as part of the admin panel?  Is 
it meaningful?  It seems like there would be a lot of cases where it 
could give a good rule of thumb for memory sizing, and it would save 
having to root around in the index folder.

-Mike

On 11/29/14 12:16 PM, Erick Erickson wrote:
> bq: You should have memory to fit your whole database in disk cache and then
> some more.
>
> I have to disagree here if for no other reason than stored data, which
> is irrelevant
> for searching, may make up virtually none or virtually all of your
> on-disk space.
> Saying it all needs to fit in disk cache is too broad-brush a
> statement, gotta test.
>
> In this case, though, I _do_ think that there's not enough memory here, Toke's
> comments are spot on.
>
> On Sat, Nov 29, 2014 at 2:02 AM, Toke Eskildsen <te...@statsbiblioteket.dk> wrote:
>> Po-Yu Chuang [ratbert.chuang@gmail.com] wrote:
>>> [...] Everything works fine now, but I noticed that the load
>>> average of the server is high because there is constantly
>>> heavy disk read access. Please point me some directions.
>>> RAM: 18G
>>> Solr home: 185G
>>> disk read access constantly 40-60M/s
>> Solr search performance is tightly coupled to the speed of small random reads. There are two obvious ways of ensuring that in these days:
>>
>> 1) Add more RAM to the server, so that the disk cache can hold a larger part of the index. If you add enough RAM (depends on your index, but 50-100% of the index size is a rule of thumb), you get "ideal" storage speed, by which I mean that the bottleneck moves away from storage. If you are using spinning drives, the 18GB of RAM is not a lot for a 185GB index.
>>
>> 2) Use SSDs instead of spinning drives (if you do not already do so). The speed-up depends a lot on what you are doing, but is is a cheap upgrade and it can later be coupled with extra RAM if it is not enough in itself.
>>
>> The Solr Wiki has this: https://wiki.apache.org/solr/SolrPerformanceProblems
>> And I have this: http://sbdevel.wordpress.com/2013/06/06/memory-is-overrated/
>>
>> - Toke Eskildsen


Re: Constantly high disk read access (40-60M/s)

Posted by Erick Erickson <er...@gmail.com>.
bq: You should have memory to fit your whole database in disk cache and then
some more.

I have to disagree here, if for no other reason than that stored data,
which is irrelevant for searching, may make up virtually none or
virtually all of your on-disk space. Saying it all needs to fit in the
disk cache is too broad-brush a statement, gotta test.

In this case, though, I _do_ think that there's not enough memory here;
Toke's comments are spot on.

On Sat, Nov 29, 2014 at 2:02 AM, Toke Eskildsen <te...@statsbiblioteket.dk> wrote:
> Po-Yu Chuang [ratbert.chuang@gmail.com] wrote:
>> [...] Everything works fine now, but I noticed that the load
>> average of the server is high because there is constantly
>> heavy disk read access. Please point me some directions.
>
>> RAM: 18G
>> Solr home: 185G
>> disk read access constantly 40-60M/s
>
> Solr search performance is tightly coupled to the speed of small random reads. There are two obvious ways of ensuring that in these days:
>
> 1) Add more RAM to the server, so that the disk cache can hold a larger part of the index. If you add enough RAM (depends on your index, but 50-100% of the index size is a rule of thumb), you get "ideal" storage speed, by which I mean that the bottleneck moves away from storage. If you are using spinning drives, the 18GB of RAM is not a lot for a 185GB index.
>
> 2) Use SSDs instead of spinning drives (if you do not already do so). The speed-up depends a lot on what you are doing, but is is a cheap upgrade and it can later be coupled with extra RAM if it is not enough in itself.
>
> The Solr Wiki has this: https://wiki.apache.org/solr/SolrPerformanceProblems
> And I have this: http://sbdevel.wordpress.com/2013/06/06/memory-is-overrated/
>
> - Toke Eskildsen

RE: Constantly high disk read access (40-60M/s)

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
Po-Yu Chuang [ratbert.chuang@gmail.com] wrote:
> [...] Everything works fine now, but I noticed that the load
> average of the server is high because there is constantly
> heavy disk read access. Please point me some directions.

> RAM: 18G
> Solr home: 185G
> disk read access constantly 40-60M/s

Solr search performance is tightly coupled to the speed of small random reads. There are two obvious ways of ensuring that these days:

1) Add more RAM to the server, so that the disk cache can hold a larger part of the index. If you add enough RAM (depends on your index, but 50-100% of the index size is a rule of thumb), you get "ideal" storage speed, by which I mean that the bottleneck moves away from storage. If you are using spinning drives, the 18GB of RAM is not a lot for a 185GB index.

2) Use SSDs instead of spinning drives (if you do not already do so). The speed-up depends a lot on what you are doing, but it is a cheap upgrade and it can later be coupled with extra RAM if it is not enough in itself.

The Solr Wiki has this: https://wiki.apache.org/solr/SolrPerformanceProblems
And I have this: http://sbdevel.wordpress.com/2013/06/06/memory-is-overrated/
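
Plugging the numbers from this thread into the rule of thumb from (1)
(a back-of-the-envelope sketch only; it ignores how much of the 185GB is
stored fields, which would lower the target):

index_size_gb = 185    # Solr home size from the original post
ram_gb = 18            # installed RAM
low, high = 0.5 * index_size_gb, 1.0 * index_size_gb
print("disk cache target: %.0f-%.0f GB, RAM available: %d GB"
      % (low, high, ram_gb))   # roughly 92-185 GB wanted vs. 18 GB present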

- Toke Eskildsen

Re: Constantly high disk read access (40-60M/s)

Posted by svante karlsson <sa...@csi.se>.
You should have enough memory to fit your whole database in the disk cache
and then some more. I prefer to have at least twice that to accommodate the
startup of new searchers while still serving from the "old" one.

With less than that, performance drops a lot.

> Solr home: 185G
If that is your database size, then you need new machines....



2014-11-29 6:59 GMT+01:00 Po-Yu Chuang <ra...@gmail.com>:

> Hi all,
>
> I am using Solr 4.9 with Tomcat. Thanks to the suggestions from Yonik and
> Dmitry about the slow start up. Everything works fine now, but I noticed
> that the load average of the server is high because there is constantly
> heavy disk read access. Please point me some directions.
>
> Some numbers about my system:
> RAM: 18G
> swap space: 2G
> number of documents: 27 million
> Solr home: 185G
> disk read access constantly 40-60M/s
> document cache size: 16K entries
> document cache hit ratio: 0.65
> query cache size: 16K
> query cache hit ratio: 0.03
>
> At first, I wondered if the disk read comes from swap, so I decreased the
> swappiness from 60 to 10, but the disk read is still there, which means
> that the disk read access does not result from swapping in.
>
> Then, I tried different document cache size and query different size. The
> effect on changing query cache size is not obvious. I tried 512, 16K, 256K
> entries and the hit ratio is between 0.01 to 0.03.
>
> For document cache, the larger cache size did improve the hit ratio of
> document cache size (I tried 512, 16K, 256K, 512K, 1024K and the hit ratio
> is between 0.58 - 0.87), but the disk read is still high.
>
> Is adjusting document cache size a reasonable direction? Or I should just
> increase the physical memory? Is there any method to estimate the right
> size of document cache (or other caches) and to estimate the size of
> physical memory needed?
>
> Thanks,
> Po-Yu
>

Re: Constantly high disk read access (40-60M/s)

Posted by Po-Yu Chuang <ra...@gmail.com>.
Hi all,

Thanks for all your suggestions. It looks like I will have to add a lot of
RAM or use SSDs to hold my index data eventually. For now, I am trying to
reduce the size of the index data by removing unnecessary fields and
setting stored="false" for some fields.

Thanks,
Po-Yu


On Mon, Dec 1, 2014 at 10:20 PM, Otis Gospodnetic <
otis.gospodnetic@gmail.com> wrote:

> Po-Yu,
>
> To add what others have said:
> * Your query cache is clearly not serving its purpose, so you are just
> wasting your heap on it.  Consider disabling it.
> * That's a pretty big index.  Do your queries really always have to go
> against the whole index?  Are there multiple "tenants" in this index that
> would let you break up the index into multiple smaller indices?  Can you
> segment your index by time?  Maybe by doing that some indices will be
> hotter and some colder, and the OS could do a better job caching.
> * You didn't say anything about your queries.  Maybe they can be tighten to
> pull less data off disk?
> * Add RAM :)
>
> Otis
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
>
> On Sat, Nov 29, 2014 at 12:59 AM, Po-Yu Chuang <ra...@gmail.com>
> wrote:
>
> > Hi all,
> >
> > I am using Solr 4.9 with Tomcat. Thanks to the suggestions from Yonik and
> > Dmitry about the slow start up. Everything works fine now, but I noticed
> > that the load average of the server is high because there is constantly
> > heavy disk read access. Please point me some directions.
> >
> > Some numbers about my system:
> > RAM: 18G
> > swap space: 2G
> > number of documents: 27 million
> > Solr home: 185G
> > disk read access constantly 40-60M/s
> > document cache size: 16K entries
> > document cache hit ratio: 0.65
> > query cache size: 16K
> > query cache hit ratio: 0.03
> >
> > At first, I wondered if the disk read comes from swap, so I decreased the
> > swappiness from 60 to 10, but the disk read is still there, which means
> > that the disk read access does not result from swapping in.
> >
> > Then, I tried different document cache size and query different size. The
> > effect on changing query cache size is not obvious. I tried 512, 16K,
> 256K
> > entries and the hit ratio is between 0.01 to 0.03.
> >
> > For document cache, the larger cache size did improve the hit ratio of
> > document cache size (I tried 512, 16K, 256K, 512K, 1024K and the hit
> ratio
> > is between 0.58 - 0.87), but the disk read is still high.
> >
> > Is adjusting document cache size a reasonable direction? Or I should just
> > increase the physical memory? Is there any method to estimate the right
> > size of document cache (or other caches) and to estimate the size of
> > physical memory needed?
> >
> > Thanks,
> > Po-Yu
> >
>

Re: Constantly high disk read access (40-60M/s)

Posted by Otis Gospodnetic <ot...@gmail.com>.
Po-Yu,

To add to what others have said:
* Your query cache is clearly not serving its purpose, so you are just
wasting your heap on it.  Consider disabling it (a rough way to gauge the
trade-off is sketched below).
* That's a pretty big index.  Do your queries really always have to go
against the whole index?  Are there multiple "tenants" in this index that
would let you break up the index into multiple smaller indices?  Can you
segment your index by time?  Maybe by doing that some indices will be
hotter and some colder, and the OS could do a better job caching.
* You didn't say anything about your queries.  Maybe they can be tightened
to pull less data off disk?
* Add RAM :)
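
A rough way to gauge whether a cache is paying for its heap (a sketch
only; the per-entry sizes below are guesses and would need to be replaced
with measurements from your own documents and result windows):

# documentCache entries hold whole stored documents, while queryResultCache
# entries hold small windows of document ids, so their costs differ a lot.
doc_cache_entries = 512 * 1024       # one of the sizes tried in the thread
avg_stored_doc_bytes = 4 * 1024      # ASSUMED average cached document size
query_cache_entries = 16 * 1024      # queryResultCache size from the thread
avg_result_entry_bytes = 1024        # ASSUMED average cached result size

gb = 1024 ** 3
print("documentCache heap:    ~%.2f GB for a reported 0.58-0.87 hit ratio"
      % (doc_cache_entries * avg_stored_doc_bytes / gb))
print("queryResultCache heap: ~%.3f GB for a reported 0.01-0.03 hit ratio"
      % (query_cache_entries * avg_result_entry_bytes / gb))

With guesses like these, the documentCache costs a couple of GB of heap
for a decent hit ratio, while the queryResultCache buys almost nothing.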

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On Sat, Nov 29, 2014 at 12:59 AM, Po-Yu Chuang <ra...@gmail.com>
wrote:

> Hi all,
>
> I am using Solr 4.9 with Tomcat. Thanks to the suggestions from Yonik and
> Dmitry about the slow start up. Everything works fine now, but I noticed
> that the load average of the server is high because there is constantly
> heavy disk read access. Please point me some directions.
>
> Some numbers about my system:
> RAM: 18G
> swap space: 2G
> number of documents: 27 million
> Solr home: 185G
> disk read access constantly 40-60M/s
> document cache size: 16K entries
> document cache hit ratio: 0.65
> query cache size: 16K
> query cache hit ratio: 0.03
>
> At first, I wondered if the disk read comes from swap, so I decreased the
> swappiness from 60 to 10, but the disk read is still there, which means
> that the disk read access does not result from swapping in.
>
> Then, I tried different document cache size and query different size. The
> effect on changing query cache size is not obvious. I tried 512, 16K, 256K
> entries and the hit ratio is between 0.01 to 0.03.
>
> For document cache, the larger cache size did improve the hit ratio of
> document cache size (I tried 512, 16K, 256K, 512K, 1024K and the hit ratio
> is between 0.58 - 0.87), but the disk read is still high.
>
> Is adjusting document cache size a reasonable direction? Or I should just
> increase the physical memory? Is there any method to estimate the right
> size of document cache (or other caches) and to estimate the size of
> physical memory needed?
>
> Thanks,
> Po-Yu
>