Posted to solr-user@lucene.apache.org by Dominique Bejean <do...@eolya.fr> on 2018/03/08 21:55:28 UTC

What is decent disk I/O for Solr and ZooKeeper?

Hi,

Disk I/O is critical for a high-performance SolrCloud.
I am looking for relevant disk I/O tests for both Solr nodes and ZooKeeper
nodes, and for guidance on what counts as bad, acceptable, or good results.

For instance, how can I tell whether these results from the basic dd utility
indicate acceptable disk performance? And are these tests relevant?

Write small files
# dd if=/dev/zero of=test bs=4k count=1024k conv=fdatasync
4294967296 bytes (4.3 GB) copied, 4.14932 s, 1.0 GB/s

Write medium files
# dd if=/dev/zero of=test bs=64k count=64k conv=fdatasync
4294967296 bytes (4.3 GB) copied, 3.07326 s, 1.4 GB/s

Write large files
# dd if=/dev/zero of=test bs=1024k count=4k conv=fdatasync
4294967296 bytes (4.3 GB) copied, 2.97767 s, 1.4 GB/s

Read small files
# dd if=test of=/dev/zero bs=4k
4294967296 bytes (4.3 GB) copied, 0.707424 s, 6.1 GB/s

Read medium files
# dd if=test of=/dev/zero bs=64k
4294967296 bytes (4.3 GB) copied, 0.545915 s, 7.9 GB/s

Read large files
# dd if=test of=/dev/zero bs=1024k
4294967296 bytes (4.3 GB) copied, 0.578093 s, 7.4 GB/s


Regards

Dominique



-- 
Dominique Béjean
06 08 46 12 43

Re: What is decent disk I/O for Solr and ZooKeeper?

Posted by Dominique Bejean <do...@eolya.fr>.
Hi Shawn,

I agree with your point about disk I/O versus available memory for Solr
performance. However, in a heavy indexing and heavy searching context, disk
I/O should still be critical, even with a lot of RAM.

My concern is also about write I/O for the ZooKeeper transaction log. My
understanding is that it is critical less for SolrCloud performance than
for SolrCloud stability.
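
As a rough sketch of a test closer to that write pattern than a large
sequential dd: ZooKeeper forces transaction log writes to disk before
acknowledging them, so the latency of small synchronous writes matters more
than raw throughput. The function below assumes GNU dd (for oflag=dsync);
the function name and the scratch file name are illustrative only.

```shell
#!/bin/sh
# Write COUNT blocks of 4 KiB to a scratch file in DIR, forcing each write to
# reach stable storage before the next one starts (GNU dd, oflag=dsync).
# dd prints its summary (bytes, elapsed time, rate) on stderr.
sync_write_test() {
    dir=$1
    count=${2:-1000}
    scratch="$dir/dd-sync-test.$$"
    dd if=/dev/zero of="$scratch" bs=4k count="$count" oflag=dsync
    status=$?
    rm -f "$scratch"          # always clean up the scratch file
    return "$status"
}
```

Run it against the directory that holds the transaction log, for example
`sync_write_test /path/to/zookeeper/dataLogDir 1000` (a placeholder path),
and divide the elapsed time by the count to estimate the average latency of
one synced write.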

Sometimes, even when best practices are followed and every possible
configuration setting is tuned, SolrCloud is unstable or slow because of a
lack of hardware resources. Monitoring CPU usage, CPU load, iowait, JVM GC,
etc. should highlight this lack of resources. If the hardware is undersized,
we need metrics in order to explain and demonstrate this to the customer
(all the more so if the infrastructure provider does not want to admit that
there are issues with the hardware or the virtualization). That was the
meaning of my question about "decent disk I/O".
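
One way to collect such a metric without extra tooling, as a minimal sketch:
on Linux, the aggregate "cpu" line of /proc/stat carries cumulative counters
(user, nice, system, idle, iowait, irq, softirq, steal, ...), so the iowait
share over an interval can be computed from two samples. The function name
is illustrative, and the column layout is assumed to match the proc(5)
description.

```shell
#!/bin/sh
# Compute the iowait share (percent) between two "cpu" lines from /proc/stat.
# Columns after the "cpu" label: user nice system idle iowait irq softirq steal
iowait_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            for (i = 2; i <= 9; i++) total += $i   # user .. steal
            t[NR] = total
            w[NR] = $6                             # iowait column
        }
        END {
            dt = t[2] - t[1]
            if (dt <= 0) { print "0.0"; exit }     # no elapsed ticks
            printf "%.1f\n", 100 * (w[2] - w[1]) / dt
        }'
}
```

To use it, capture `grep '^cpu ' /proc/stat` into a variable, wait a few
seconds, capture it again, and pass both values to `iowait_pct`. Tools such
as iostat or vmstat report the same figure; this version only shows where
the number comes from.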

Regards

Dominique


On Fri, Mar 9, 2018 at 00:40, Shawn Heisey <ap...@elyograg.org> wrote:

> On 3/8/2018 2:55 PM, Dominique Bejean wrote:
> > Disk I/O are critical for high performance Solrcloud.
>
> This statement has truth to it, but if your system is correctly sized,
> disk performance will not have much of an impact on Solr performance.
> If upgrading to faster disks does improve long-term query performance,
> the system probably doesn't have enough memory installed.  There can be
> other causes, but that is the most common.
>
> When there is enough memory available to allow the operating system to
> effectively cache the index data, Solr will not need to access the disk
> much at all for queries -- all that data will be already in memory.
> Indexing will still be dependent on disk performance even when there is
> plenty of memory, because that will require writing new data to the disk.
>
> https://wiki.apache.org/solr/SolrPerformanceProblems
>
> This is my hammer.  To me, your question looks like a nail.  :)
>
> Thanks,
> Shawn
>

Re: What is decent disk I/O for Solr and ZooKeeper?

Posted by Shawn Heisey <ap...@elyograg.org>.
On 3/8/2018 2:55 PM, Dominique Bejean wrote:
> Disk I/O are critical for high performance Solrcloud.

This statement has truth to it, but if your system is correctly sized,
disk performance will not have much of an impact on Solr performance. 
If upgrading to faster disks does improve long-term query performance,
the system probably doesn't have enough memory installed.  There can be
other causes, but that is the most common.

When there is enough memory available to allow the operating system to
effectively cache the index data, Solr will not need to access the disk
much at all for queries -- all that data will be already in memory. 
Indexing will still be dependent on disk performance even when there is
plenty of memory, because that will require writing new data to the disk.
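
A rough way to check the sizing point above, as a sketch only: compare the
on-disk index size with the memory the kernel reports as available, since
that is roughly what the OS could use to cache the index. SOLR_DATA is a
hypothetical path, and MemAvailable from /proc/meminfo is only an estimate.

```shell
#!/bin/sh
# Report whether an index of INDEX_KB could fit entirely in CACHE_KB of memory.
cache_fit() {
    index_kb=$1
    cache_kb=$2
    if [ "$cache_kb" -ge "$index_kb" ]; then
        echo "fits: index ${index_kb} kB <= cache ${cache_kb} kB"
    else
        pct=$(( cache_kb * 100 / index_kb ))
        echo "partial: only ${pct}% of the index can be cached"
    fi
}

# Usage sketch (Linux). SOLR_DATA is a hypothetical path; adjust as needed.
SOLR_DATA=${SOLR_DATA:-/var/solr/data}
if [ -d "$SOLR_DATA" ] && [ -r /proc/meminfo ]; then
    index_kb=$(du -sk "$SOLR_DATA" | awk '{print $1}')
    cache_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
    # Older kernels do not expose MemAvailable; skip the check in that case.
    [ -n "$cache_kb" ] && cache_fit "$index_kb" "$cache_kb"
fi
```

A "partial" result does not prove a problem, because only the frequently
read parts of the index need to stay cached, but a very low percentage is a
hint that queries will hit the disk.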

https://wiki.apache.org/solr/SolrPerformanceProblems

This is my hammer.  To me, your question looks like a nail.  :)

Thanks,
Shawn