Posted to java-user@lucene.apache.org by Benson Margulies <bi...@gmail.com> on 2012/02/19 03:21:08 UTC

Concurrency and multiple merge threads

Using Lucene 3.5.0, on a 32-core machine, I have coded something shaped like:

make a writer on a RAMDirectory.

start:

  Create a near-real-time searcher from it.

  farm work out to multiple threads, each of which performs a search
and retrieves some docs.

  When all are done, write some new docs.

back to start.
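
For readers who want the shape of that loop in code, here is a minimal sketch using only java.util.concurrent. It is not the poster's code and it does not touch Lucene: the class name SearchThenWriteLoop, the "hit-" strings, and the list standing in for the writer are all hypothetical placeholders. What it shows is the barrier structure: searches run in parallel, and writing happens only after every search in the round has finished.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for the search-then-write loop; no Lucene classes involved.
class SearchThenWriteLoop {

    // Runs `rounds` iterations. Each iteration fans `searchesPerRound` tasks out to
    // the pool, waits for all of them, then "writes" the results on the calling thread.
    static List<String> run(int threads, int rounds, int searchesPerRound) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<String> written = new ArrayList<>();
        try {
            for (int round = 0; round < rounds; round++) {
                // In the real loop, a fresh near-real-time searcher would be opened here.
                List<Callable<String>> searches = new ArrayList<>();
                for (int i = 0; i < searchesPerRound; i++) {
                    final String id = round + ":" + i;
                    // Placeholder for "perform a search and retrieve some docs".
                    searches.add(() -> "hit-" + id);
                }
                // invokeAll blocks until every task has completed and returns the
                // futures in the same order as the submitted tasks.
                for (Future<String> result : pool.invokeAll(searches)) {
                    // Placeholder for "write some new docs".
                    written.add(result.get());
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while searching", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a search task failed", e);
        } finally {
            pool.shutdown();
        }
        return written;
    }
}
```

Because the write phase runs only while the search threads are idle, any contention seen during the search phase would have to come from the searchers themselves or from background work such as merges left over from the previous write phase.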

The returns of adding threads diminish faster than I would like.
According to YourKit, a major contributor when I try 16 threads is
contention on the RAMFile monitor.

The conflict shows five Lucene Merge Threads holding the monitor, plus
my own threads. I'm not sure that I'm interpreting this correctly;
perhaps there were five different occasions when a merge thread
blocked my threads.

In any case, I'm fairly stumped as to how my threads manage to
materially block each other, since the synchronized methods used on
the search side in RAMFile are pretty tiny.

YourKit claims that the problem is in RAMFile.numBuffers, but I have
not been able to catch this being called in a search.

I did spot the following backtrace.

In any case, I'd be grateful if anyone could tell me if this is a
familiar story or one for which there's a solution.


	RAMFile.getBuffer(int) line: 75	
	RAMInputStream.switchCurrentBuffer(boolean) line: 107	
	RAMInputStream.seek(long) line: 144	
	SegmentNorms.bytes() line: 163	
	SegmentNorms.bytes() line: 143	
	ReadOnlySegmentReader(SegmentReader).norms(String) line: 599	
	TermQuery$TermWeight.scorer(IndexReader, boolean, boolean) line: 107	
	BooleanQuery$BooleanWeight.scorer(IndexReader, boolean, boolean) line: 298	
	IndexSearcher.search(Weight, Filter, Collector) line: 577	
	IndexSearcher.search(Weight, Filter, int, Sort, boolean) line: 517	
	IndexSearcher.search(Weight, Filter, int, Sort) line: 487	
	IndexSearcher.search(Query, Filter, int, Sort) line: 400

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Concurrency and multiple merge threads

Posted by Mike McCandless <lu...@mikemccandless.com>.
Sounds like a nice machine!

It's frustrating that RAMFile even has any sync'd methods... Lucene is write once, so once a RAMFile is written we don't need any sync to read it.  Maybe on creating a RAMInputStream we could make a new ReadOnlyRAMFile, holding the same buffers without sync.
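
A rough sketch of that idea, written against plain Java rather than Lucene's actual classes: SyncBufferFile and ReadOnlyBufferFile are hypothetical names that only imitate the shape of RAMFile, and nothing here is an existing Lucene API. The point is that the monitor is taken once, when the read-only view is created, and never again on the read path.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a RAMFile-like class. Access is synchronized because
// a writer may still be appending buffers while other threads look at the file.
class SyncBufferFile {
    private final List<byte[]> buffers = new ArrayList<>();

    synchronized void addBuffer(byte[] buffer) {
        buffers.add(buffer);
    }

    synchronized byte[] getBuffer(int index) {
        return buffers.get(index);
    }

    synchronized int numBuffers() {
        return buffers.size();
    }

    // Takes the monitor once, copies the buffer references (not the bytes),
    // and returns a view whose reads never lock.
    synchronized ReadOnlyBufferFile snapshot() {
        return new ReadOnlyBufferFile(buffers.toArray(new byte[0][]));
    }
}

// Hypothetical unsynchronized view over the same underlying buffers.
final class ReadOnlyBufferFile {
    // A final field set in the constructor is safely visible to reader threads.
    private final byte[][] buffers;

    ReadOnlyBufferFile(byte[][] buffers) {
        this.buffers = buffers;
    }

    byte[] getBuffer(int index) {
        return buffers[index];
    }

    int numBuffers() {
        return buffers.length;
    }
}
```

This is only safe under the write-once assumption stated above: the snapshot must be taken after the file is completely written, since the view shares the byte arrays and would not see buffers added later.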

That said, the ops inside the sync are tiny, so it's strange if this really is the cause of the contention... It could just be a profiling ghost, with something else being the real bottleneck...
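
One way to cross-check a profiler on this point is to ask the JVM itself how often each thread blocked on a monitor. The sketch below uses the standard java.lang.management API; the class name MonitorContentionReport and its two methods are made up for illustration. It assumes you call enable() before the workload and report() while the worker threads are still alive, since threads that have exited are no longer listed.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper that reads the JVM's own per-thread monitor statistics.
class MonitorContentionReport {
    private static final ThreadMXBean MX = ManagementFactory.getThreadMXBean();

    // Call once before the workload so that blocked time is recorded,
    // on JVMs that support contention monitoring.
    static void enable() {
        if (MX.isThreadContentionMonitoringSupported()) {
            MX.setThreadContentionMonitoringEnabled(true);
        }
    }

    // Prints counters for every live thread and returns the summed blocked count.
    static long report() {
        boolean timed = MX.isThreadContentionMonitoringSupported();
        long totalBlocked = 0;
        for (ThreadInfo info : MX.getThreadInfo(MX.getAllThreadIds())) {
            if (info == null) {
                continue; // the thread exited between the two calls
            }
            totalBlocked += info.getBlockedCount();
            // getBlockedTime() reports -1 when contention monitoring is disabled.
            long blockedMs = timed ? info.getBlockedTime() : -1;
            System.out.printf("%-32s blocked %d times, %d ms%n",
                    info.getThreadName(), info.getBlockedCount(), blockedMs);
        }
        return totalBlocked;
    }
}
```

If the search threads show high blocked counts but only a few milliseconds of blocked time, the monitor is being hit often but is probably not where the wall-clock time goes, which would support the "profiling ghost" reading.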

Mike


Re: Concurrency and multiple merge threads

Posted by Greg Bowyer <gb...@fastmail.co.uk>.
You're not very clear about where you see the specific slow operations: at 
search time or at re-index time.

I am going to go out on a limb here and suggest that maybe it's at index 
time, and maybe the YourKit trace showing the 5 merge threads awaiting 
the monitor is the cause of your issues.

You say you have 32 cores available, but that by 16 threads you are 
already seeing issues with throughput.

One source of your issues might be the number of threads that are 
allowed to "merge" indexes, and the maximum number of merge tasks 
that can be in progress before threads are stopped.

The code in Lucene that manipulates this is here: 
https://github.com/apache/lucene-solr/blob/lucene_solr_3_5/lucene/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java#L60

With that said, I don't lay claim to being right about Lucene, so if 
someone more knowledgeable says that I am wrong, then there is a good 
chance I am a fool and that they are speaking truths and wisdom.

-- Greg