Posted to java-user@lucene.apache.org by Benson Margulies <bi...@gmail.com> on 2012/02/19 03:21:08 UTC
Concurrency and multiple merge threads
Using Lucene 3.5.0, on a 32-core machine, I have coded something shaped like:
make a writer on a RAMDirectory.
start:
Create a near-real-time searcher from it.
farm work out to multiple threads, each of which performs a search
and retrieves some docs.
When all are done, write some new docs.
back to start.
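
For readers who want the shape of that loop in concrete form, here is a minimal sketch that uses only the JDK. It is not the poster's code and contains no Lucene calls: the list named index stands in for the writer and RAMDirectory, the unmodifiable copy stands in for the near-real-time searcher, and search() is a hypothetical placeholder for "perform a search and retrieve some docs". All class and method names are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SearchLoopSketch {

    // Hypothetical stand-in for "perform a search and retrieve some docs".
    static List<String> search(List<String> snapshot, String term) {
        List<String> hits = new ArrayList<>();
        for (String doc : snapshot) {
            if (doc.contains(term)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    // Runs the search/write cycle and returns the total number of hits seen.
    static int runRounds(int rounds, int threads) {
        List<String> index = new ArrayList<>(); // stands in for writer + RAMDirectory
        index.add("doc 0");
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        int totalHits = 0;
        try {
            for (int round = 0; round < rounds; round++) {
                // start: take a point-in-time view (the "near-real-time searcher").
                final List<String> snapshot =
                        Collections.unmodifiableList(new ArrayList<>(index));

                // Farm the searches out to the worker threads.
                List<Callable<List<String>>> tasks = new ArrayList<>();
                for (int t = 0; t < threads; t++) {
                    tasks.add(() -> search(snapshot, "doc"));
                }

                // invokeAll returns only when every task has completed.
                for (Future<List<String>> result : pool.invokeAll(tasks)) {
                    totalHits += result.get().size();
                }

                // When all are done, write some new docs, then go back to start.
                index.add("doc " + (round + 1));
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return totalHits;
    }

    public static void main(String[] args) {
        System.out.println(runRounds(3, 4));
    }
}
```

The point of the sketch is only that searches within a round share one read-only view and that writes happen strictly between rounds, so within a round the searching threads never have to wait for a write from this loop.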
The returns of adding threads diminish faster than I would like.
According to YourKit, a major contributor when I try 16 threads is
contention on the RAMFile monitor.
The conflict shows five Lucene Merge Threads holding the monitor, plus
my own threads. I'm not sure that I'm interpreting this correctly;
perhaps there were five different occasions when a merge thread
blocked my threads.
In any case, I'm fairly stumped as to how my threads manage to
materially block each other, since the synchronized methods used on
the search side in RAMFile are pretty tiny.
YourKit claims that the problem is in RAMFile.numBuffers, but I have
not been able to catch this being called in a search.
I did spot the following backtrace.
In any case, I'd be grateful if anyone could tell me if this is a
familiar story or one for which there's a solution.
RAMFile.getBuffer(int) line: 75
RAMInputStream.switchCurrentBuffer(boolean) line: 107
RAMInputStream.seek(long) line: 144
SegmentNorms.bytes() line: 163
SegmentNorms.bytes() line: 143
ReadOnlySegmentReader(SegmentReader).norms(String) line: 599
TermQuery$TermWeight.scorer(IndexReader, boolean, boolean) line: 107
BooleanQuery$BooleanWeight.scorer(IndexReader, boolean, boolean) line: 298
IndexSearcher.search(Weight, Filter, Collector) line: 577
IndexSearcher.search(Weight, Filter, int, Sort, boolean) line: 517
IndexSearcher.search(Weight, Filter, int, Sort) line: 487
IndexSearcher.search(Query, Filter, int, Sort) line: 400
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Concurrency and multiple merge threads
Posted by Mike McCandless <lu...@mikemccandless.com>.
Sounds like a nice machine!
It's frustrating that RAMFile even has any sync'd methods... Lucene is write once, so once a RAMFile is written we don't need any sync to read it. Maybe on creating a RAMInputStream we could make a new ReadOnlyRAMFile, holding the same buffers without sync.
That said the ops inside the sync are tiny so it's strange if this really is the cause of the contention... It could just be a profiling ghost and something else is the real bottleneck...
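
The read-only idea above could look roughly like the following JDK-only sketch. It is not Lucene's RAMFile and not a proposed patch: SketchRamFile, ReadOnlySketchRamFile, and freeze() are hypothetical names, and the sketch assumes that no buffer is modified once the file has been fully written.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simplification of a growable file whose accessors are all synchronized.
class SketchRamFile {
    private final List<byte[]> buffers = new ArrayList<>();

    synchronized void addBuffer(byte[] buffer) {
        buffers.add(buffer);
    }

    synchronized byte[] getBuffer(int index) {
        return buffers.get(index);
    }

    synchronized int numBuffers() {
        return buffers.size();
    }

    // Takes the lock once and copies the buffer references (not the bytes).
    synchronized ReadOnlySketchRamFile freeze() {
        return new ReadOnlySketchRamFile(buffers.toArray(new byte[0][]));
    }
}

// Read-only view over the same buffers; readers never touch a monitor.
final class ReadOnlySketchRamFile {
    private final byte[][] buffers; // final, assigned once in the constructor

    ReadOnlySketchRamFile(byte[][] buffers) {
        this.buffers = buffers;
    }

    byte[] getBuffer(int index) {
        return buffers[index];
    }

    int numBuffers() {
        return buffers.length;
    }
}

class ReadOnlyRamFileDemo {
    public static void main(String[] args) {
        SketchRamFile file = new SketchRamFile();
        file.addBuffer(new byte[] {1, 2, 3});
        file.addBuffer(new byte[] {4, 5});

        // Done once when a reader is opened; later reads are lock-free.
        ReadOnlySketchRamFile view = file.freeze();
        System.out.println(view.numBuffers() + " buffers, first byte of second: "
                + view.getBuffer(1)[0]);
    }
}
```

The view shares the underlying byte arrays with the writable file, so it costs one small array of references per reader and is only safe under the write-once assumption stated above.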
Mike
Re: Concurrency and multiple merge threads
Posted by Greg Bowyer <gb...@fastmail.co.uk>.
You're not very clear about where you see the specific slow operations:
at search time or at re-index time.
I am going to go out on a limb here and suggest that maybe it's at index
time, and maybe the YourKit trace showing the 5 merge threads awaiting
the monitor is the cause of your issues.
You say the machine has 32 cores, but that by the time you reach 16
threads you are seeing issues with throughput.
One source of your issues might be the number of threads that are
allowed to "merge" indexes, and the maximum number of merge tasks that
can be in progress before threads are stopped.
The code in Lucene that controls this is here:
https://github.com/apache/lucene-solr/blob/lucene_solr_3_5/lucene/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java#L60
With that said, I don't lay claim to being right about Lucene, so if
someone more knowledgeable says that I am wrong, then there is a good
chance I am a fool and that they are speaking truths and wisdom.
-- Greg