Posted to solr-user@lucene.apache.org by Marc Sturlese <ma...@gmail.com> on 2009/04/29 16:14:09 UTC

stress tests to DIH and deduplication patch

Hey there, I am running some stress tests on indexing with DIH.
I am indexing a MySQL DB with approximately 1,400,000 rows. I am also using the
DeDuplication patch.
I am running Tomcat with the JVM heap set to -Xms2000M -Xmx2000M.
I have indexed 3 times using the full-import command, without restarting Tomcat
or reloading the core between runs.
I used jmap and jhat to capture heap snapshots at several points during the
indexing runs.
Below I show the beginning of each histogram (I omit the lower part of the
list because the instance counts there are completely stable).
I have noticed that the number of Term, TermInfo and TermQuery instances grows
from one indexing run to the next... is that normal?
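
For anyone who wants to compare snapshots like these mechanically, here is a minimal sketch in Python. It assumes the input is the "N instances of class X" text as pasted in this message, and it tolerates a class name that was wrapped onto the following line. The function name parse_histogram is made up for illustration; it is not part of jhat or Solr.

```python
import re
from typing import Dict, Iterable

# Matches "268290 instances of class org.apache.lucene.index.Term".
# The class name may be empty when the line was wrapped after "class".
_LINE = re.compile(r"^(\d+) instances of class\s*(\S*)\s*$")


def parse_histogram(lines: Iterable[str]) -> Dict[str, int]:
    """Map class name -> instance count for one pasted snapshot."""
    counts: Dict[str, int] = {}
    pending = None  # count whose class name is expected on the next line
    for raw in lines:
        line = raw.strip()
        if pending is not None:
            # The previous line ended after "class"; this line is the name.
            if line:
                counts[line] = pending
            pending = None
            continue
        match = _LINE.match(line)
        if not match:
            continue  # headings, blank lines, anything else
        count, name = int(match.group(1)), match.group(2)
        if name:
            counts[name] = count
        else:
            pending = count
    return counts
```

Feeding it the lines of one snapshot gives a dictionary that can be diffed against the dictionary of a later snapshot.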



FIRST TIME I INDEX... WITH APPROX. A MILLION INDEXED DOCS... HERE THE INDEXING PROCESS IS STILL RUNNING
268290 instances of class org.apache.lucene.index.Term
215943 instances of class org.apache.lucene.index.TermInfo
129649 instances of class org.apache.lucene.index.FreqProxTermsWriter$PostingList
51537 instances of class org.apache.lucene.search.TermQuery
25457 instances of class org.apache.lucene.index.BufferedDeletes$Num
23233 instances of class com.sun.tools.javac.zip.ZipFileIndexEntry
1569 instances of class com.sun.tools.javac.zip.ZipFileIndex$DirectoryEntry
1120 instances of class org.apache.lucene.index.FieldInfo
919 instances of class org.apache.catalina.loader.ResourceEntry


FIRST TIME I INDEX, COMPLETED (1.4 MILLION DOCS INDEXED)
552522 instances of class org.apache.lucene.index.Term
505835 instances of class org.apache.lucene.index.TermInfo
128937 instances of class org.apache.lucene.index.FreqProxTermsWriter$PostingList
48645 instances of class org.apache.lucene.search.TermQuery
24065 instances of class org.apache.lucene.index.BufferedDeletes$Num
23233 instances of class com.sun.tools.javac.zip.ZipFileIndexEntry
1569 instances of class com.sun.tools.javac.zip.ZipFileIndex$DirectoryEntry
1470 instances of class org.apache.lucene.index.FieldInfo
923 instances of class org.apache.catalina.loader.ResourceEntry
858 instances of class com.sun.tools.javac.util.List


SECOND TIME I INDEX, WITH 500000 INDEXED DOCS... HERE THE INDEXING PROCESS IS STILL RUNNING
264617 instances of class org.apache.lucene.index.FreqProxTermsWriter$PostingList
262496 instances of class org.apache.lucene.index.Term
116078 instances of class org.apache.lucene.index.TermInfo
53383 instances of class org.apache.lucene.search.TermQuery
42274 instances of class org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput
30230 instances of class org.apache.lucene.search.TermQuery$TermWeight
26044 instances of class org.apache.lucene.index.BufferedDeletes$Num
23233 instances of class com.sun.tools.javac.zip.ZipFileIndexEntry
15115 instances of class org.apache.lucene.search.BooleanScorer2$Coordinator
15115 instances of class org.apache.lucene.search.ReqExclScorer
7325 instances of class org.apache.lucene.search.ConjunctionScorer$1
1569 instances of class com.sun.tools.javac.zip.ZipFileIndex$DirectoryEntry
1279 instances of class org.apache.lucene.index.FieldInfo
923 instances of class org.apache.catalina.loader.ResourceEntry


SECOND TIME I INDEX, WITH 1200000 INDEXED DOCS... HERE THE INDEXING PROCESS IS STILL RUNNING
574603 instances of class org.apache.lucene.index.Term
423558 instances of class org.apache.lucene.index.TermInfo
141394 instances of class org.apache.lucene.index.FreqProxTermsWriter$PostingList
106729 instances of class org.apache.lucene.search.TermQuery
54858 instances of class org.apache.lucene.index.BufferedDeletes$Num
25347 instances of class org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput
23233 instances of class com.sun.tools.javac.zip.ZipFileIndexEntry
11587 instances of class org.apache.lucene.search.TermQuery$TermWeight
5793 instances of class org.apache.lucene.search.BooleanScorer2$Coordinator
5793 instances of class org.apache.lucene.search.ReqExclScorer
2922 instances of class org.apache.lucene.search.ConjunctionScorer$1
2170 instances of class org.apache.lucene.index.FieldInfo
1569 instances of class com.sun.tools.javac.zip.ZipFileIndex$DirectoryEntry
923 instances of class org.apache.catalina.loader.ResourceEntry
858 instances of class com.sun.tools.javac.util.List

SECOND TIME I INDEX, COMPLETED (1.4 MILLION DOCS INDEXED)
999753 instances of class org.apache.lucene.index.Term
808190 instances of class org.apache.lucene.index.TermInfo
156511 instances of class org.apache.lucene.search.TermQuery
128975 instances of class org.apache.lucene.index.FreqProxTermsWriter$PostingList
104396 instances of class org.apache.lucene.index.BufferedDeletes$Num
23233 instances of class com.sun.tools.javac.zip.ZipFileIndexEntry
15401 instances of class org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput
14896 instances of class org.apache.lucene.search.TermQuery$TermWeight
7447 instances of class org.apache.lucene.search.BooleanScorer2$Coordinator
7447 instances of class org.apache.lucene.search.ReqExclScorer
3025 instances of class org.apache.lucene.search.ConjunctionScorer$1
2660 instances of class org.apache.lucene.index.FieldInfo
1569 instances of class com.sun.tools.javac.zip.ZipFileIndex$DirectoryEntry
923 instances of class org.apache.catalina.loader.ResourceEntry
858 instances of class com.sun.tools.javac.util.List


THIRD TIME I INDEX, WITH 200000 INDEXED DOCS... HERE THE INDEXING PROCESS IS STILL RUNNING
591510 instances of class org.apache.lucene.index.Term
384132 instances of class org.apache.lucene.index.TermInfo
264655 instances of class org.apache.lucene.index.FreqProxTermsWriter$PostingList
261909 instances of class org.apache.lucene.search.TermQuery
149021 instances of class org.apache.lucene.index.BufferedDeletes$Num
23233 instances of class com.sun.tools.javac.zip.ZipFileIndexEntry
9456 instances of class org.apache.solr.update.processor.TextProfileSignature$Token
5802 instances of class org.apache.lucene.document.Field
5313 instances of class org.apache.solr.common.SolrInputField
5034 instances of class org.apache.solr.common.SolrInputField$1
2642 instances of class org.apache.lucene.index.FieldInfo
1569 instances of class com.sun.tools.javac.zip.ZipFileIndex$DirectoryEntry
1040 instances of class org.apache.lucene.analysis.CharArraySet
1040 instances of class org.apache.lucene.analysis.tokenattributes.TermAttribute
1038 instances of class org.apache.lucene.analysis.standard.StandardTokenizer
1038 instances of class org.apache.lucene.analysis.standard.StandardTokenizerImpl
1038 instances of class org.apache.lucene.analysis.tokenattributes.TypeAttribute
1035 instances of class org.apache.lucene.analysis.StopFilter
1035 instances of class org.apache.solr.analysis.RemoveDuplicatesTokenFilter
923 instances of class org.apache.catalina.loader.ResourceEntry
858 instances of class com.sun.tools.javac.util.List


THIRD TIME I INDEX, WITH 700000 INDEXED DOCS... HERE THE INDEXING PROCESS IS STILL RUNNING
613746 instances of class org.apache.lucene.index.Term
480070 instances of class org.apache.lucene.index.TermInfo
137789 instances of class org.apache.lucene.search.TermQuery
130575 instances of class org.apache.lucene.index.FreqProxTermsWriter$PostingList
89024 instances of class org.apache.lucene.index.BufferedDeletes$Num
23233 instances of class com.sun.tools.javac.zip.ZipFileIndexEntry
13341 instances of class org.apache.solr.update.processor.TextProfileSignature$Token
9557 instances of class org.apache.lucene.document.Field
9118 instances of class org.apache.solr.common.SolrInputField
8927 instances of class org.apache.solr.common.SolrInputField$1
2870 instances of class org.apache.lucene.index.FieldInfo
2211 instances of class org.apache.lucene.analysis.tokenattributes.TermAttribute
2209 instances of class org.apache.solr.analysis.RemoveDuplicatesTokenFilter
1618 instances of class org.apache.lucene.analysis.CharArraySet
1613 instances of class org.apache.lucene.analysis.StopFilter
1613 instances of class org.apache.lucene.analysis.standard.StandardTokenizer
1613 instances of class org.apache.lucene.analysis.standard.StandardTokenizerImpl
1613 instances of class org.apache.lucene.analysis.tokenattributes.TypeAttribute
1569 instances of class com.sun.tools.javac.zip.ZipFileIndex$DirectoryEntry
1292 instances of class org.apache.solr.update.processor.TextProfileSignature$TokenComparator
923 instances of class org.apache.catalina.loader.ResourceEntry
858 instances of class com.sun.tools.javac.util.List

If I keep running full-import from a cron job I will eventually end up with an
OutOfMemoryError (Java heap space), although it will take a lot of indexing
runs for that to happen.
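
To make the growth between runs easier to see, here is a small Python sketch that diffs the two "COMPLETED" snapshots. The counts are copied from the histograms in this message; the growth helper and the dictionary names are only illustrative. Instance counts alone say nothing about bytes retained, so this shows which classes accumulate, not how much heap they hold.

```python
from typing import Dict, List, Tuple


def growth(before: Dict[str, int], after: Dict[str, int]) -> List[Tuple[str, int]]:
    """Return (class, delta) pairs, largest increase first."""
    deltas = [(name, after[name] - before.get(name, 0)) for name in after]
    return sorted(deltas, key=lambda item: item[1], reverse=True)


# Counts taken from the first and second completed full-import snapshots.
run1_done = {
    "org.apache.lucene.index.Term": 552522,
    "org.apache.lucene.index.TermInfo": 505835,
    "org.apache.lucene.index.FreqProxTermsWriter$PostingList": 128937,
    "org.apache.lucene.search.TermQuery": 48645,
    "org.apache.lucene.index.BufferedDeletes$Num": 24065,
}
run2_done = {
    "org.apache.lucene.index.Term": 999753,
    "org.apache.lucene.index.TermInfo": 808190,
    "org.apache.lucene.index.FreqProxTermsWriter$PostingList": 128975,
    "org.apache.lucene.search.TermQuery": 156511,
    "org.apache.lucene.index.BufferedDeletes$Num": 104396,
}

for name, delta in growth(run1_done, run2_done):
    print(f"{delta:>8}  {name}")
```

With these numbers, Term, TermInfo, TermQuery and BufferedDeletes$Num all grow substantially between the two completed runs, while FreqProxTermsWriter$PostingList stays essentially flat.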
-- 
View this message in context: http://www.nabble.com/stress-tests-to-DIH-and-deduplication-patch-tp23295926p23295926.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: stress tests to DIH and deduplication patch

Posted by Marc Sturlese <ma...@gmail.com>.
I have already run out of memory after a cron job ran full-import as many
times as possible over a day.
I will activate GC logging to see what it says...
Thanks!


Shalin Shekhar Mangar wrote:
> 
> Perhaps you should enable GC logging as well. Also, did you actually run
> out of memory, or are you extrapolating and assuming that it might happen?
> 
> -- 
> Regards,
> Shalin Shekhar Mangar.
> 
> 

-- 
View this message in context: http://www.nabble.com/stress-tests-to-DIH-and-deduplication-patch-tp23295926p23314604.html


Re: stress tests to DIH and deduplication patch

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
Perhaps you should enable GC logging as well. Also, did you actually run out
of memory, or are you extrapolating and assuming that it might happen?

-- 
Regards,
Shalin Shekhar Mangar.