Posted to solr-user@lucene.apache.org by Software Dev <st...@gmail.com> on 2014/03/26 01:12:55 UTC

What contributes to disk IO?

What are the main contributing factors for Solr Cloud generating a lot
of disk IO?

A lot of reads? Writes? Insufficient RAM?

I would think if there was enough disk cache available for the whole
index there would be little to no disk IO.

Re: What contributes to disk IO?

Posted by Shawn Heisey <so...@elyograg.org>.
On 3/25/2014 6:12 PM, Software Dev wrote:
> What are the main contributing factors for Solr Cloud generating a lot
> of disk IO?
>
> A lot of reads? Writes? Insufficient RAM?
>
> I would think if there was enough disk cache available for the whole
> index there would be little to no disk IO.

Toke's answer is spot on.

Something additional: In a previous thread, you mentioned optimizing.  
That will generate a lot of disk I/O, most of which is unavoidable.  
Actual disk write I/O is completely unavoidable with an optimize, no 
matter how big your disk cache is.  Unless you have TWICE your index 
size in RAM available for the disk cache, a large percentage of the read 
I/O from an optimize will also hit the disk, because the optimized index 
will push the old index out of the disk cache.  This slows down searches.
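Shawn's "twice your index size" rule of thumb can be turned into a rough back-of-the-envelope check. The sketch below is hypothetical: the function name and the assumption that the optimized index ends up about as large as the old one are illustrative choices, not anything from the thread. It only compares sizes and cannot predict how the OS page cache will actually behave. Write I/O reaches the disk regardless of the result.

```python
def optimize_cache_check(index_bytes: int, cache_bytes: int) -> dict:
    """Rough check: can the disk cache hold the old index AND the optimized copy?

    Hypothetical helper. Assumes the optimized index is about the same size
    as the old one (it can be smaller if many documents were deleted).
    Writes always hit the disk; this only estimates read-side pressure.
    """
    # While an optimize runs, the old segments are read and the new merged
    # segment is written, so roughly 2x the index size competes for cache.
    peak_working_set = 2 * index_bytes
    fits = cache_bytes >= peak_working_set
    # Portion of the peak working set that cannot stay cached in this model.
    overflow = max(0, peak_working_set - cache_bytes)
    return {"peak_working_set": peak_working_set, "fits": fits, "overflow": overflow}


GiB = 1024 ** 3
result = optimize_cache_check(index_bytes=40 * GiB, cache_bytes=64 * GiB)
print(result["fits"], result["overflow"] // GiB)
```

With a 40 GiB index and 64 GiB of cache, the peak working set of about 80 GiB does not fit, so in this crude model roughly 16 GiB of the optimize traffic may have to be served from disk, evicting index data that searches would otherwise find in cache.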

Thanks,
Shawn


Re: What contributes to disk IO?

Posted by Walter Underwood <wu...@wunderwood.org>.
A merge requires reading 100% of two or more segments, and writing all the non-deleted docs into a brand-new segment. So generally, a whole bunch of sequential reads and writes.
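To make the merge description concrete, here is a minimal sketch that estimates how many bytes a single merge reads and writes. It assumes, purely for illustration, that a segment's bytes are spread evenly across its documents; the `Segment` class and `merge_io` function are hypothetical and are not Lucene APIs.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    size_bytes: int    # on-disk size of the segment
    max_docs: int      # documents in the segment, including deleted ones
    deleted_docs: int  # documents marked as deleted


def merge_io(segments: list[Segment]) -> tuple[int, int]:
    """Estimate (bytes_read, bytes_written) for merging the given segments.

    Simplifying assumption: bytes are proportional to document count, so the
    merged output size is proportional to the number of live documents.
    """
    # Every input segment is read in full, deleted documents included.
    bytes_read = sum(s.size_bytes for s in segments)
    # Only live (non-deleted) documents are copied into the new segment.
    bytes_written = sum(
        s.size_bytes * (s.max_docs - s.deleted_docs) // s.max_docs
        for s in segments
        if s.max_docs > 0
    )
    return bytes_read, bytes_written


segs = [Segment(1_000_000, 1000, 100), Segment(2_000_000, 2000, 0)]
print(merge_io(segs))  # (3000000, 2900000)
```

Both inputs are read completely (3,000,000 bytes), but the 100 deleted documents in the first segment are dropped, so only 2,900,000 bytes are written in this model.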

This Mike McCandless post from 2011 has lovely visualizations of merge behavior, plus some info about the number of writes needed.

http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

wunder

On Mar 26, 2014, at 9:29 AM, Otis Gospodnetic <ot...@gmail.com> wrote:

> Lucene segment merges cause both reads and writes.  If you look at SPM,
> you'll see the number of index files and the number of segments, which will
> give you an idea what's going on at that level.
> 
> Otis
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
> 
> 
> On Tue, Mar 25, 2014 at 8:12 PM, Software Dev <st...@gmail.com> wrote:
> 
>> What are the main contributing factors for Solr Cloud generating a lot
>> of disk IO?
>> 
>> A lot of reads? Writes? Insufficient RAM?
>> 
>> I would think if there was enough disk cache available for the whole
>> index there would be little to no disk IO.
>> 

--
Walter Underwood
wunder@wunderwood.org




Re: What contributes to disk IO?

Posted by Otis Gospodnetic <ot...@gmail.com>.
Lucene segment merges cause both reads and writes.  If you look at SPM,
you'll see the number of index files and the number of segments, which will
give you an idea what's going on at that level.
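Without a monitoring tool, a rough file and segment count can also be taken straight from the index directory. This sketch assumes Lucene's usual file naming (per-segment files start with an underscore and a base-36 segment name, such as `_0.si`, `_1a.cfs` or `_0_Lucene41_0.tim`, while commit points are `segments_N`). That naming is an implementation detail rather than a stable API, and the helper names are hypothetical.

```python
import os
import re

# Assumption: per-segment files look like "_<base36 name>." or "_<base36 name>_".
SEGMENT_PREFIX = re.compile(r"^(_[0-9a-z]+)[._]")


def segment_prefixes(filenames):
    """Return the set of distinct segment name prefixes found in filenames."""
    prefixes = set()
    for name in filenames:
        match = SEGMENT_PREFIX.match(name)
        if match:
            prefixes.add(match.group(1))
    return prefixes


def index_file_stats(index_dir):
    """Return (file_count, segment_count) for a Lucene index directory.

    Files of segments that were merged away but not yet deleted are counted
    too, so segment_count is an upper bound on the live segment count.
    """
    files = [
        name for name in os.listdir(index_dir)
        if os.path.isfile(os.path.join(index_dir, name))
    ]
    return len(files), len(segment_prefixes(files))
```

Calling `index_file_stats()` on a core's index directory periodically (the path depends on your Solr layout) shows whether the segment count is climbing between merges.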

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Tue, Mar 25, 2014 at 8:12 PM, Software Dev <st...@gmail.com> wrote:

> What are the main contributing factors for Solr Cloud generating a lot
> of disk IO?
>
> A lot of reads? Writes? Insufficient RAM?
>
> I would think if there was enough disk cache available for the whole
> index there would be little to no disk IO.
>

Re: What contributes to disk IO?

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Wed, 2014-03-26 at 01:12 +0100, Software Dev wrote:
> What are the main contributing factors for Solr Cloud generating a lot
> of disk IO?
> 
> A lot of reads? Writes? Insufficient RAM?

Searching causes heavy random read I/O; indexing causes bulk reads and writes.

> I would think if there was enough disk cache available for the whole
> index there would be little to no disk IO.

True for searching, but updates (of course) require real storage
activity. Are you perhaps doing very frequent commits?
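Why commit frequency matters can be illustrated with a toy model. Assuming every commit flushes one new segment and that segments are merged in groups of `merge_factor` at each size level (a simplified log-style policy, not Lucene's actual TieredMergePolicy), the sketch below counts how many documents get written in total. The function `simulate_written_docs` and its parameters are hypothetical.

```python
def simulate_written_docs(total_docs: int, docs_per_commit: int, merge_factor: int = 10) -> int:
    """Count documents written (flushes plus merges) under a toy log-merge model.

    Assumptions (hypothetical, for illustration only): every commit flushes
    one new segment, and whenever `merge_factor` segments pile up at one
    level they are merged into a single segment at the next level.
    Deletions, RAM-buffer flushes and Lucene's real merge policy are ignored.
    """
    if docs_per_commit <= 0 or merge_factor < 2:
        raise ValueError("docs_per_commit must be > 0 and merge_factor >= 2")
    levels: list[list[int]] = []  # levels[i] holds segment sizes (in docs) at level i
    written = 0
    remaining = total_docs
    while remaining > 0:
        flushed = min(docs_per_commit, remaining)
        remaining -= flushed
        written += flushed  # the commit writes a brand-new small segment
        carry, level = flushed, 0
        # Cascade merges upward while a level is full.
        while True:
            if level == len(levels):
                levels.append([])
            levels[level].append(carry)
            if len(levels[level]) < merge_factor:
                break
            carry = sum(levels[level])
            levels[level].clear()
            written += carry  # a merge rewrites every doc of its input segments
            level += 1
    return written


print(simulate_written_docs(1000, docs_per_commit=100))  # 2000
print(simulate_written_docs(1000, docs_per_commit=1))    # 4000
```

In this model, indexing 1,000 documents with a commit every 100 documents writes 2,000 documents in total, while committing after every single document writes 4,000: each extra level of small segments adds another full rewrite of the data.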

- Toke Eskildsen, State and University Library, Denmark