Posted to java-user@lucene.apache.org by Marc Sturlese <ma...@gmail.com> on 2011/10/11 14:14:23 UTC

lucene 4.0 and DocumentsWriterPerThreadPool compared to lucene 3.4

I'm doing some performance tests of bulk indexing with Lucene 4.0 and I'm
seeing weird results. I've read
http://www.gossamer-threads.com/lists/lucene/java-dev/127190?do=post_view_threaded#127190
but I still have doubts.
I'm building a 1 GB index containing 1 million docs. While building the
index, I never search on it. I'm running with a 1000 MB Java heap on a
dual-core laptop with an SSD.
Using this configuration:
tieredMergePolicy
lucene_34
not optimizing, and committing only at the end
maxMergeAtOnce = 10
segmentsPerTier = 10
It's taking 6min.


Using:
tieredMergePolicy
lucene_40
not optimizing, and committing only at the end
maxMergeAtOnce = 10
segmentsPerTier = 10
DEFAULT_MAX_THREAD_STATES = 8 in DocumentsWriterPerThreadPool
It's taking 20min.
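To compare the two runs like for like, it helps to time the same indexing loop under each configuration. A minimal sketch, assuming the indexing work can be wrapped in a Runnable (IndexTiming and its methods are hypothetical names, not part of Lucene): it measures wall-clock time with System.nanoTime() and expresses the 4.0 run as a multiple of the 3.4 run. With the figures above, 20 min against 6 min is roughly a 3.3x slowdown.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper for timing bulk-indexing runs; not part of Lucene. */
public class IndexTiming {

    /** Runs the task once and returns the elapsed wall-clock time in nanoseconds. */
    static long timeNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    /** How many times slower the candidate run is than the baseline run. */
    static double slowdown(long baselineNanos, long candidateNanos) {
        return (double) candidateNanos / baselineNanos;
    }

    public static void main(String[] args) {
        // Durations reported in this post: 6 min (lucene_34) and 20 min (lucene_40).
        long lucene34 = TimeUnit.MINUTES.toNanos(6);
        long lucene40 = TimeUnit.MINUTES.toNanos(20);
        System.out.printf("lucene_40 vs lucene_34: %.2fx%n", slowdown(lucene34, lucene40));

        // Placeholder workload: replace the body with the real indexing loop
        // (add all documents, then commit once at the end).
        long elapsed = timeNanos(() -> {
            // hypothetical indexing loop goes here
        });
        System.out.println("placeholder run took "
                + TimeUnit.NANOSECONDS.toMillis(elapsed) + " ms");
    }
}
```

For a fair comparison, both runs should use the same documents, heap size and commit pattern, and ideally be repeated a few times so JIT warm-up and OS caching do not dominate.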

If I change DEFAULT_MAX_THREAD_STATES to 4, or even 1, I get almost the
same result.
I thought setting DEFAULT_MAX_THREAD_STATES = 1 would emulate the "old"
Lucene indexing behaviour.
I might be doing something wrong: the three indexes built with 4.0 should
have a different number of segments (because of the different
DEFAULT_MAX_THREAD_STATES values), but they don't.
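One possible reason the segment counts match, offered only as an assumption: the thread-state cap bounds how many per-thread buffers can exist, but how many are actually used depends on how many threads call into the writer concurrently. If all documents arrive from a single thread (for example, one loop feeding EmbeddedSolrServer), a cap of 8, 4 or 1 would make no difference. The sketch below is a deliberately simplified model of thread-to-state binding, not Lucene's ThreadAffinityDocumentsWriterThreadPool; AffinityPoolModel and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Simplified model (NOT Lucene's code): each indexing thread is bound to one
 * per-thread state on first use, and at most maxThreadStates states exist.
 */
public class AffinityPoolModel {
    private final int maxThreadStates;
    // indexing thread id -> index of the state it is bound to
    private final Map<Long, Integer> binding = new HashMap<>();

    AffinityPoolModel(int maxThreadStates) {
        this.maxThreadStates = maxThreadStates;
    }

    /** Returns the state used by this thread, binding it on first use. */
    synchronized int stateFor(long threadId) {
        Integer state = binding.get(threadId);
        if (state == null) {
            // New threads get a fresh state until the cap is hit, then share.
            state = binding.size() % maxThreadStates;
            binding.put(threadId, state);
        }
        return state;
    }

    /** Number of distinct states that actually received documents. */
    synchronized int activeStates() {
        return (int) binding.values().stream().distinct().count();
    }

    /** Simulates indexingThreads threads, each adding docsPerThread documents. */
    static int activeStatesFor(int maxThreadStates, int indexingThreads, int docsPerThread) {
        AffinityPoolModel pool = new AffinityPoolModel(maxThreadStates);
        for (int d = 0; d < docsPerThread; d++) {
            for (long t = 0; t < indexingThreads; t++) {
                pool.stateFor(t);
            }
        }
        return pool.activeStates();
    }

    public static void main(String[] args) {
        for (int max : new int[] {8, 4, 1}) {
            System.out.println("cap=" + max
                    + ": 1 indexing thread -> " + activeStatesFor(max, 1, 1000) + " state(s)"
                    + ", 2 indexing threads -> " + activeStatesFor(max, 2, 1000) + " state(s)");
        }
    }
}
```

In this model the cap only changes anything once the number of concurrent indexing threads exceeds it; with one indexing thread every cap yields a single active state.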

Is that normal? Any clue what could be wrong? (my trunk is from yesterday)
Thanks in advance.


--
View this message in context: http://lucene.472066.n3.nabble.com/lucene-4-0-and-DocumentsWriterPerThreadPool-compared-to-lucene-3-4-tp3412388p3412388.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: lucene 4.0 and DocumentsWriterPerThreadPool compared to lucene 3.4

Posted by Simon Willnauer <si...@googlemail.com>.
FYI, for those interested: there is a merging-related issue, likely the
cause of this slowdown, which is addressed in
https://issues.apache.org/jira/browse/LUCENE-3515.

simon

On Tue, Oct 11, 2011 at 4:01 PM, Marc Sturlese <ma...@gmail.com> wrote:
> [...]


Re: lucene 4.0 and DocumentsWriterPerThreadPool compared to lucene 3.4

Posted by Marc Sturlese <ma...@gmail.com>.
Simon,
In this example I've set DEFAULT_MAX_THREAD_STATES of
DocumentsWriterPerThreadPool to 1. I've debugged the code and made sure
that ThreadAffinityDocumentsWriterThreadPool has the value set to 1 (I
was trying to make it behave like Lucene 3.4, using a single thread).
I'm indexing using EmbeddedSolrServer. I'm probably missing something or
have not configured some value properly, but I can't figure out what.

Here is the IndexWriterConfig#toString() output:
analyzer=org.apache.solr.schema.IndexSchema$SolrIndexAnalyzer
delPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy
commit=null
openMode=CREATE_OR_APPEND
similarityProvider=org.apache.solr.search.SolrSimilarityProvider
termIndexInterval=32
mergeScheduler=org.apache.lucene.index.ConcurrentMergeScheduler
default WRITE_LOCK_TIMEOUT=1000
writeLockTimeout=1000
maxBufferedDeleteTerms=-1
ramBufferSizeMB=32.0
maxBufferedDocs=-1
mergedSegmentWarmer=null
codecProvider=org.apache.lucene.index.codecs.CoreCodecProvider@a50a649
mergePolicy=[TieredMergePolicy: maxMergeAtOnce=10,
maxMergeAtOnceExplicit=30, maxMergedSegmentMB=5120.0, floorSegmentMB=2.0,
expungeDeletesPctAllowed=10.0, segmentsPerTier=10.0, useCompoundFile=false,
noCFSRatio=0.1
indexerThreadPool=org.apache.lucene.index.ThreadAffinityDocumentsWriterThreadPool@643cb075
readerPooling=false
readerTermsIndexDivisor=1
flushPolicy=org.apache.lucene.index.FlushByRamOrCountsPolicy@4c6504bc
perThreadHardLimitMB=1945
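The config above has ramBufferSizeMB=32.0 with maxBufferedDocs=-1, so flushing is driven by RAM only. As far as I understand the flush-by-RAM rule, once the RAM used by all per-thread buffers together reaches the limit, the largest buffer is flushed as a new segment. A simplified model of that rule, assuming a fixed, hypothetical amount of buffered RAM per document and threads that take turns adding documents; it ignores deletes, merging and perThreadHardLimitMB, so its numbers are illustrative only. FlushModel is a hypothetical name, not Lucene's FlushByRamOrCountsPolicy.

```java
/**
 * Simplified model (NOT Lucene's code) of flush-by-RAM: per-thread buffers grow
 * as documents are added; when their combined size reaches the RAM limit, the
 * largest buffer is flushed as one new segment.
 */
public class FlushModel {

    /** Segments flushed, including the final flush of leftovers on commit. */
    static int segmentsFlushed(long numDocs, long bytesPerDoc, int activeThreads, long ramBufferBytes) {
        long[] buffers = new long[activeThreads];
        long total = 0;
        int segments = 0;
        for (long d = 0; d < numDocs; d++) {
            int t = (int) (d % activeThreads); // threads take turns adding documents
            buffers[t] += bytesPerDoc;
            total += bytesPerDoc;
            if (total >= ramBufferBytes) {
                // Flush the largest per-thread buffer as a new segment.
                int largest = 0;
                for (int i = 1; i < activeThreads; i++) {
                    if (buffers[i] > buffers[largest]) {
                        largest = i;
                    }
                }
                total -= buffers[largest];
                buffers[largest] = 0;
                segments++;
            }
        }
        // The commit at the end flushes whatever is still buffered.
        for (long b : buffers) {
            if (b > 0) {
                segments++;
            }
        }
        return segments;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Hypothetical workload: 1,000,000 docs, 1 KB of buffered RAM each, 32 MB buffer.
        for (int threads : new int[] {1, 2, 8}) {
            System.out.println(threads + " active thread(s): "
                    + segmentsFlushed(1_000_000, 1024, threads, 32 * mb) + " flushed segments");
        }
    }
}
```

With one active thread this model flushes a full 32 MB segment each time (31 segments for the hypothetical workload); with several active threads each flush is smaller, so more segments are written before merging.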



Re: lucene 4.0 and DocumentsWriterPerThreadPool compared to lucene 3.4

Posted by Simon Willnauer <si...@googlemail.com>.
Marc, can you provide more info about the IndexWriterConfig you are using?

Maybe just call IndexWriterConfig#toString() and paste the output in?

simon

On Tue, Oct 11, 2011 at 2:14 PM, Marc Sturlese <ma...@gmail.com> wrote:
> [...]


Re: lucene 4.0 and DocumentsWriterPerThreadPool compared to lucene 3.4

Posted by Mihai Caraman <ca...@gmail.com>.
Hey, you should compare with the ThreadedIndexWriter too :). I'll attach the
source from Lucene in Action, Second Edition, and you can just replace
new IndexWriter(... with new ThreadedIndexWriter(...

See whether that makes a difference to the results. Also, I presume you
don't have a single-core CPU.
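The idea behind a threaded writer, as far as I understand it, is to hand each added document to a pool of worker threads so that several threads call the underlying writer at once. The stdlib-only sketch below shows that pattern; it is not the book's source. ThreadedFeeder and feedAndCount are hypothetical names, and the Consumer stands in for the real writer's add call, which must be safe to call from several threads. The bounded queue with CallerRunsPolicy is one common way to apply back-pressure: when the queue is full, the producing thread indexes the document itself.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/** Hypothetical sketch: feeds documents to a sink from a fixed pool of threads. */
public class ThreadedFeeder<D> implements AutoCloseable {
    private final ThreadPoolExecutor pool;
    private final Consumer<D> sink; // stands in for the writer's add call

    ThreadedFeeder(Consumer<D> sink, int numThreads, int queueSize) {
        this.sink = sink;
        // Bounded queue + CallerRunsPolicy: a full queue makes the caller do the work.
        this.pool = new ThreadPoolExecutor(numThreads, numThreads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueSize), new ThreadPoolExecutor.CallerRunsPolicy());
    }

    /** Queues one document; returns immediately unless the queue is full. */
    void add(D doc) {
        pool.execute(() -> sink.accept(doc));
    }

    /** Waits until every queued document has been passed to the sink. */
    @Override
    public void close() throws InterruptedException {
        pool.shutdown();
        pool.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
    }

    /** Feeds numDocs placeholder documents and returns how many reached the sink. */
    static int feedAndCount(int numDocs, int numThreads, int queueSize) throws InterruptedException {
        AtomicInteger added = new AtomicInteger();
        try (ThreadedFeeder<Integer> feeder =
                new ThreadedFeeder<>(doc -> added.incrementAndGet(), numThreads, queueSize)) {
            for (int i = 0; i < numDocs; i++) {
                feeder.add(i);
            }
        }
        return added.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("documents added: " + feedAndCount(10_000, 4, 100));
    }
}
```

Closing the feeder before committing matters: otherwise a commit could run while documents are still sitting in the queue.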

2011/10/11 Marc Sturlese <ma...@gmail.com>

> I'm doing some performance test doing bulk indexing with lucene 4.0
>