Posted to java-user@lucene.apache.org by Anahita Shayesteh-SSI <an...@ssi.samsung.com> on 2015/04/30 20:25:15 UTC

Lucene indexing speed on NVMe drive

Hi. I am studying Lucene performance, in particular how it benefits from faster I/O such as SSD and NVMe.
I am using nightlybench to index wiki (1K docs) with parameters similar to those used in nightlyBench (hardware: Intel Xeon, 2.5 GHz, 20 processors, 40 with hyperthreading, 64 GB memory), and studying indexing speed on HDD, SSD and NVMe. While I do see a benefit when switching from HDD to SSD, there is not much noticeable benefit from moving to NVMe.
I get the best performance (200 GB/hour) with 20 indexing threads; increasing the number of threads to 40 hurts performance. Similarly, increasing maxConcurrentMerges above 3-5 doesn't seem to give me any benefit. I am wondering what the bottleneck is, or whether anyone has insight into a set of options (number of threads, merge options, flush options, read buffer?) that takes advantage of a very fast I/O system. I see NVMe bandwidth going as high as 800 MB/s, but only in brief spikes, and CPU utilization is about 50% on average, though some cores have consistently higher utilization while others are spiky.
Your thoughts and insights are greatly appreciated. Thanks.
Anahita Shayesteh


Re: Lucene indexing speed on NVMe drive

Posted by Michael McCandless <lu...@mikemccandless.com>.
Hyper-threading should help Lucene indexing go faster, when it's not
IO bound ... I found 20 threads (on 12 real cores, 24 with HT) to be
fastest in the nightly benchmark
(http://people.apache.org/~mikemccand/lucenebench/indexing.html).

But it's curious you're unable to saturate one of CPU or IO, with 20
real cores and NVMe storage.  200 GB/hour isn't that much better than
what we see on the nightly benchmark on 1 KB docs (~160 GB/hour),
though those cores are 3.33 GHz (2 socket Intel Xeon X5680).
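
For scale, those rates can be converted to an average data rate. This is a rough sketch, assuming decimal units (1 GB = 1000 MB) and assuming the GB/hour figures count source text consumed rather than bytes written to the index; the class and method names are made up for illustration:

```java
import java.util.Locale;

/** Back-of-the-envelope conversion of an indexing rate into an average data rate. */
final class IndexingRate {

    /** Converts GB/hour to MB/s, assuming decimal units (1 GB = 1000 MB). */
    static double gbPerHourToMbPerSec(double gbPerHour) {
        return gbPerHour * 1000.0 / 3600.0;
    }

    public static void main(String[] args) {
        double reported = gbPerHourToMbPerSec(200.0); // rate reported in this thread
        double nightly = gbPerHourToMbPerSec(160.0);  // nightly benchmark, 1 KB docs
        System.out.printf(Locale.ROOT, "200 GB/hour = %.1f MB/s of source text%n", reported);
        System.out.printf(Locale.ROOT, "160 GB/hour = %.1f MB/s of source text%n", nightly);
        System.out.printf(Locale.ROOT, "ratio: %.2fx%n", reported / nightly);
    }
}
```

200 GB/hour works out to roughly 55.6 MB/s of source text, far below the 800 MB/s spikes mentioned in the original post. The bytes actually written to the drive are a different quantity, since merges rewrite segments, so this is only an order-of-magnitude comparison.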

Where is the source line docs file stored?  Maybe pulling the lines
from it is a bottleneck?
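
One way to test that hypothesis is to time a plain read of the file with no indexing at all. This is a minimal sketch, assuming the line docs file is UTF-8 text with one document per line; `LineReadRate` and `countLines` are illustrative names, not part of the benchmark code:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

/** Measures how fast lines can be pulled from a text file by a single thread. */
final class LineReadRate {

    /** Reads the source to the end and returns the number of lines seen. */
    static long countLines(Reader source) {
        try (BufferedReader in = new BufferedReader(source, 1 << 16)) {
            long lines = 0;
            while (in.readLine() != null) {
                lines++;
            }
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java LineReadRate <line-docs-file>");
            return;
        }
        Path file = Paths.get(args[0]);
        long bytes = Files.size(file);
        long start = System.nanoTime();
        long lines = countLines(
            new InputStreamReader(Files.newInputStream(file), StandardCharsets.UTF_8));
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf(Locale.ROOT, "%d lines, %.1f MB/s, %.0f lines/s%n",
            lines, bytes / 1e6 / seconds, lines / seconds);
    }
}
```

If the reported rate is not comfortably above the indexing rate (200 GB/hour is about 55.6 MB/s), reading the source is a plausible bottleneck. A second run may be served from the OS page cache and look much faster than the first.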

Can you try running with 20 threads under a profiler and post the
results?  Or maybe capture thread stack for all threads multiple times
throughout the indexing run, so we can see where the threads are?
Might give a clue ...
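
If attaching a profiler is inconvenient, a crude in-process alternative is to sample the top stack frame of every thread at intervals and count the hits. This is only a sketch built on the JDK's Thread.getAllStackTraces(); the class name, the sampling interval and the stand-in worker thread are assumptions for illustration. Running the JDK's jstack tool against the indexer's process id a few times gives similar information without writing any code:

```java
import java.util.HashMap;
import java.util.Map;

/** Samples the top stack frame of all other live threads and counts occurrences. */
final class StackSampler {

    /** Takes one sample and adds "STATE class.method" keys to the counts. */
    static void sampleInto(Map<String, Integer> counts) {
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            StackTraceElement[] stack = e.getValue();
            if (t == Thread.currentThread() || stack.length == 0) {
                continue; // skip the sampler itself and threads with no Java frames
            }
            String key = t.getState() + " "
                + stack[0].getClassName() + "." + stack[0].getMethodName();
            counts.merge(key, 1, Integer::sum);
        }
    }

    /** Takes the given number of samples, pausing between them. */
    static Map<String, Integer> sample(int rounds, long pauseMillis) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < rounds; i++) {
            sampleInto(counts);
            try {
                Thread.sleep(pauseMillis);
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Stand-in for an indexing thread; in real use, call sample() from a
        // background thread inside the JVM that runs the indexer.
        Thread worker = new Thread(() -> {
            long x = 0;
            while (!Thread.currentThread().isInterrupted()) {
                x += System.nanoTime() % 7;
            }
        }, "busy-worker");
        worker.setDaemon(true);
        worker.start();

        Map<String, Integer> counts = sample(20, 50);
        worker.interrupt();
        counts.entrySet().stream()
            .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
            .limit(10)
            .forEach(e -> System.out.println(e.getValue() + "  " + e.getKey()));
    }
}
```

Frames that dominate the counts in a WAITING or BLOCKED state point at stalls, while RUNNABLE frames show where CPU time is likely going.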

Separately, you could turn on verbose output in the Indexer, and
IndexWriter will produce lots of output about what happened ... maybe
there is something surprising, e.g. merges falling behind and stalling
indexing.

The nightly benchmark doesn't wait for merges to finish at the end by
default; if you change that, you might then see speedups from NVMe.

Mike McCandless

http://blog.mikemccandless.com



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


RE: Lucene indexing speed on NVMe drive

Posted by Anahita Shayesteh-SSI <an...@ssi.samsung.com>.
Hi Chris,
Thanks for your comment. You are correct, it looks to be CPU bound. However, I am monitoring CPU utilization on all 40 cores and don't see anything close to 100% utilization on 20 of them; instead, all cores show activity ranging from 20% to 80%, with lots of ups and downs.
Can you help me understand all the threads running during indexing? What is the best way to tune the number of threads (indexing and merging) based on the available hardware?

Thank you,
Anahita
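
One way to see which threads exist in the indexing JVM, and how much CPU each has used, is the JDK's ThreadMXBean. This is a minimal sketch, meant to be called from inside the JVM that runs the indexer (for example from a periodic background thread); the class name and output format are illustrative, and the thread names that show up depend on the indexer and on Lucene's merge scheduler:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Map;
import java.util.TreeMap;

/** Reports CPU time consumed so far by each live thread in this JVM. */
final class ThreadCpuReport {

    /** Returns CPU milliseconds per thread, keyed by "name (id N)". */
    static Map<String, Long> cpuMillisByThread() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<String, Long> result = new TreeMap<>();
        if (!mx.isThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) {
            return result; // this JVM cannot report per-thread CPU time
        }
        for (long id : mx.getAllThreadIds()) {
            ThreadInfo info = mx.getThreadInfo(id);
            long nanos = mx.getThreadCpuTime(id);
            if (info == null || nanos < 0) {
                continue; // thread exited between the two calls
            }
            result.put(info.getThreadName() + " (id " + id + ")", nanos / 1_000_000);
        }
        return result;
    }

    public static void main(String[] args) {
        cpuMillisByThread().forEach(
            (name, millis) -> System.out.println(millis + " ms  " + name));
    }
}
```

Taking two reports some seconds apart and subtracting them shows which threads were actually busy in between.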
 

-----Original Message-----
From: Chris Hostetter [mailto:hossman_lucene@fucit.org] 
Sent: Thursday, April 30, 2015 11:32 AM
To: java-user@lucene.apache.org
Cc: Anahita Shayesteh-SSI
Subject: Re: Lucene indexing speed on NVMe drive


: Hi. I am studying Lucene performance and in particular how it benefits from faster I/O such as SSD and NVMe.

: parameters as used in nightlyBench. (Hardware: Intel Xeon, 2.5GHz, 20
: processor ,40 with hyperthreading, 64G Memory) and study indexing speed 
	...
: I get best performance (200GB/hour) with 20 indexing threads, increasing
: number of threads to 40 hurts performance. Similarly increasing
: maxConcurrentMerges above 3-5 doesn't seem to give me any benefit. I am
: wondering what the bottleneck is, or anyone has insight on set of 

Maybe I'm missing something, but it sounds like you are CPU bound.

Hyperthreading isn't going to help you if you are maxing out 20 (real) CPUs -- IIUC it only helps with some additional parallelization when processes are blocked by something else -- i.e. IO bound.




-Hoss
http://www.lucidworks.com/



Re: Lucene indexing speed on NVMe drive

Posted by Chris Hostetter <ho...@fucit.org>.
: Hi. I am studying Lucene performance and in particular how it benefits from faster I/O such as SSD and NVMe.

: parameters as used in nightlyBench. (Hardware: Intel Xeon, 2.5GHz, 20 
: processor ,40 with hyperthreading, 64G Memory) and study indexing speed 
	...
: I get best performance (200GB/hour) with 20 indexing threads, increasing 
: number of threads to 40 hurts performance. Similarly increasing 
: maxConcurrentMerges above 3-5 doesn't seem to give me any benefit. I am 
: wondering what the bottleneck is, or anyone has insight on set of 

Maybe I'm missing something, but it sounds like you are CPU bound.

Hyperthreading isn't going to help you if you are maxing out 20 (real)
CPUs -- IIUC it only helps with some additional parallelization when
processes are blocked by something else -- i.e. IO bound.




-Hoss
http://www.lucidworks.com/
