You are viewing a plain text version of this content. The canonical link for it is here.

Posted to java-user@lucene.apache.org by Jamie <ja...@stimulussoft.com> on 2010/03/19 12:49:57 UTC

Lucene 3.0 Search Performance Stats

Hi Guys

I just wanted to congratulate the Lucene guys for a fine job on 3.0!!

Since we switched our indexes to using integer based range queries based 
on Date (YYMMHHSS), search speed is lightening fast and memory 
consumption has dropped considerably!

Some stats:

Indexed Docs: 7.2M emails
Index Size: 24 GB (non optimized)
Search Speed: 0.06 - 0.09 seconds (with sort YYMMHHSS date)

Index stored on 4 SAS HDD hitachi RAID 10
16G RAM
2x Xeon 4 core 2.4Gz
OS FreeBSD 7.2
Filesystem UFS2 gjournal

I believe we are using all search performance recommendations now.

Good job!

Jamie


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Lucene 3.0 Search Performance Stats

Posted by Michael McCandless <lu...@mikemccandless.com>.

Looks like the bulk of your RAM usage is from the 370K index terms in
your terms dict...

The flex branch (once it lands) should substantially reduce that...

Mike

On Mon, Mar 22, 2010 at 8:35 AM, Jamie <ja...@stimulussoft.com> wrote:
> Hi Everyone
>
> The stats I sent through earlier were erroneous due to fact the date range
> query selected fewer records than stated.
>
> The correct stats are:
>
> Lucene 3.0 Stats:
>
> Search conducted using Lucene's Realtime search feature (writer.getReader()
> for each search)
> Analyzer: Russian Analyzer
> Total Docs: 26.04M (emails data - all attachments, body content indexed)
> Index Size: 37G
> Query: body: test AND  date: [200901010101 to 201003220225] with descending
> sort on date.
> Lucene Mem Usage: 32 MB (when sorted on date)
> Search Speed: 0.48 (unsorted)
> Search Speed: 0.49 (sorted on YYYYMMDDHHSS date)
>
> Hardware / Software:
>
> Index stored on 4 SAS HDD hitachi RAID 10
> 16G RAM
> 2x Xeon 4 core 2.4Gz
> OS FreeBSD 7.2
> Filesystem UFS2 gjournal
>
> From Yourkit, memory usage is as follows:
>
> Name    Number of Objects       Shallow Size
> org.apache.lucene.index.TermInfo        370610  14824400
> org.apache.lucene.index.Term    370505  11856160
> com.stimulus.archiva.search.LuceneResult        20000   960000
> org.apache.lucene.search.FieldDoc       10000   320000
> org.apache.lucene.search.ScoreDoc       10000   240000
> org.apache.lucene.index.FieldInfo       1027    41080
> org.apache.lucene.index.SegmentReader$Norm      840     67200
> org.apache.lucene.document.Field        578     36992
> org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput  415     49800
> org.apache.lucene.index.CompoundFileReader$CSIndexInput         380
> 36480
> org.apache.lucene.util.UnicodeUtil$UTF8Result   320     10240
> org.apache.lucene.index.TermBuffer      315     17640
> org.apache.lucene.util.UnicodeUtil$UTF16Result  315     12600
> org.apache.lucene.index.CompoundFileReader$FileEntry    280     8960
> org.apache.lucene.util.CloseableThreadLocal     260     8320
> org.apache.lucene.index.FreqProxTermsWriter$PostingList         256
> 12288
> org.apache.lucene.index.SegmentInfo     185     19240
> org.apache.lucene.index.SegmentReader$FieldsReaderLocal         140     5600
> org.apache.lucene.index.ReadOnlySegmentReader   105     12600
> org.apache.lucene.index.SegmentReader$Ref       105     2520
> org.apache.lucene.index.SegmentTermEnum         105     11760
> org.apache.lucene.search.FieldCacheImpl$Entry   105     3360
> org.apache.lucene.index.TermInfosReader$ThreadResources         70      2240
> org.apache.lucene.util.cache.SimpleLRUCache     70      1680
> org.apache.lucene.util.cache.SimpleLRUCache$1   70      6160
> org.apache.lucene.index.FieldsReader    50      4400
> org.apache.lucene.index.IndexFileDeleter$RefCount       42      1344
> org.apache.lucene.document.Document     41      1312
> org.apache.lucene.util.SimpleStringInterner$Entry       41      1640
> org.apache.lucene.index.FieldInfos      37      1480
> org.apache.lucene.index.CompoundFileReader      35      2520
> org.apache.lucene.index.SegmentReader   35      4200
> org.apache.lucene.index.SegmentReader$CoreReaders       35      4480
> org.apache.lucene.index.TermInfo[]      35      2963448
> org.apache.lucene.index.TermInfosReader         35      3360
> org.apache.lucene.index.Term[]  35      2963448
> org.apache.lucene.search.FieldCache$CreationPlaceholder         35      840
> org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor
>     35      2240
> org.apache.lucene.index.RawPostingList[]        30      8184
> org.apache.lucene.index.TermsHashPerField       24      3648
> org.apache.lucene.analysis.WhitespaceAnalyzer   16      512
> org.apache.lucene.document.Fieldable[]  13      416
> org.apache.lucene.index.DocFieldProcessorPerField       12      672
> org.apache.lucene.index.DocInverterPerField     12      768
> org.apache.lucene.index.FreqProxTermsWriterPerField     12      864
> org.apache.lucene.index.NormsWriterPerField     12      864
> org.apache.lucene.index.TermVectorsTermsWriterPerField  12      960
> org.apache.lucene.util.AttributeSource$State    12      384
> org.apache.lucene.util.Version  8       256
> org.apache.lucene.index.SegmentInfos    7       616
> org.apache.lucene.document.FieldSelectorResult  6       192
> org.apache.lucene.analysis.CharArraySet$UnmodifiableCharArraySet        5
>     160
> org.apache.lucene.analysis.tokenattributes.OffsetAttributeImpl  5       120
> org.apache.lucene.analysis.tokenattributes.TermAttributeImpl    5       160
> org.apache.lucene.analysis.tokenattributes.PositionIncrementAttributeImpl 4
>     96
> org.apache.lucene.index.BufferedDeletes         4       224
> org.apache.lucene.index.TermsHash       4       320
> org.apache.lucene.analysis.LowerCaseFilter      3       192
> org.apache.lucene.analysis.PerFieldAnalyzerWrapper      3       144
> org.apache.lucene.analysis.SimpleAnalyzer       3       96
> org.apache.lucene.analysis.StopFilter   3       264
> org.apache.lucene.analysis.ru.RussianAnalyzer   3       144
> org.apache.lucene.analysis.ru.RussianAnalyzer$SavedStreams      3       120
> org.apache.lucene.analysis.ru.RussianLetterTokenizer    3       288
> org.apache.lucene.analysis.ru.RussianStemFilter         3       216
> org.apache.lucene.analysis.ru.RussianStemmer    3       96
> org.apache.lucene.index.IndexReader[]   3       912
> org.apache.lucene.index.ReadOnlyDirectoryReader         3       384
> org.apache.lucene.index.SegmentReader[]         3       912
> org.apache.lucene.search.Sort   3       72
> org.apache.lucene.search.SortField      3       168
> org.apache.lucene.search.SortField[]    3       96
> org.apache.lucene.search.TermQuery      3       96
> org.apache.lucene.util.NamedThreadFactory       3       120
>
>
>
>
>
>
>
>
>
> org.apache.lucene.analysis.CharReader   2       80
> org.apache.lucene.index.ByteBlockPool   2       112
> org.apache.lucene.index.ConcurrentMergeScheduler        2       112
> org.apache.lucene.index.DocFieldProcessor       2       96
> org.apache.lucene.index.DocFieldProcessorPerField[]     2       432
> org.apache.lucene.index.DocInverter     2       80
> org.apache.lucene.index.DocumentsWriter         2       608
> org.apache.lucene.index.DocumentsWriter$ByteBlockAllocator      2       64
> org.apache.lucene.index.DocumentsWriter$DocWriter[]     2       208
> org.apache.lucene.index.DocumentsWriter$SkipDocWriter   2       64
> org.apache.lucene.index.DocumentsWriter$WaitQueue       2       112
> org.apache.lucene.index.DocumentsWriterThreadState[]    2       56
> org.apache.lucene.index.FreqProxTermsWriter     2       80
> org.apache.lucene.index.IndexFileDeleter        2       192
> org.apache.lucene.index.IndexFileDeleter$CommitPoint    2       176
> org.apache.lucene.index.IndexWriter     2       592
> org.apache.lucene.index.IndexWriter$MaxFieldLength      2       64
> org.apache.lucene.index.IndexWriter$ReaderPool  2       64
> org.apache.lucene.index.IntBlockPool    2       112
> org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy        2       32
> org.apache.lucene.index.LogByteSizeMergePolicy  2       112
> org.apache.lucene.index.NormsWriter     2       48
> org.apache.lucene.index.StoredFieldsWriter      2       128
> org.apache.lucene.index.StoredFieldsWriter$PerDoc[]     2       64
> org.apache.lucene.index.TermVectorsTermsWriter  2       176
> org.apache.lucene.index.TermVectorsTermsWriter$PerDoc[]         2       64
> org.apache.lucene.index.TermsHashPerThread      2       176
> org.apache.lucene.queryParser.QueryParser$Operator      2       64
> org.apache.lucene.search.BooleanClause  2       64
> org.apache.lucene.search.NumericRangeQuery      2       160
> org.apache.lucene.search.ParallelMultiSearcher  2       144
> org.apache.lucene.search.QueryWrapperFilter     2       48
> org.apache.lucene.search.ScoreDoc[]     2       48
> org.apache.lucene.search.Searchable[]   2       64
> org.apache.lucene.store.NIOFSDirectory  2       96
> org.apache.lucene.store.NativeFSLock    2       128
> org.apache.lucene.store.NativeFSLockFactory     2       80
> org.apache.lucene.analysis.CharArraySet         1       32
> org.apache.lucene.analysis.LowerCaseTokenizer   1       96
> org.apache.lucene.document.Field$Index$1        1       32
> org.apache.lucene.document.Field$Index$2        1       32
> org.apache.lucene.document.Field$Index$3        1       32
> org.apache.lucene.document.Field$Index$4        1       32
> org.apache.lucene.document.Field$Index$5        1       32
> org.apache.lucene.document.Field$Index[]        1       64
> org.apache.lucene.document.Field$Store$1        1       32
> org.apache.lucene.document.Field$Store$2        1       32
> org.apache.lucene.document.Field$Store[]        1       40
> org.apache.lucene.document.Field$TermVector$1   1       32
> org.apache.lucene.document.Field$TermVector$2   1       32
> org.apache.lucene.document.Field$TermVector$3   1       32
> org.apache.lucene.document.Field$TermVector$4   1       32
> org.apache.lucene.document.Field$TermVector$5   1       32
> org.apache.lucene.document.Field$TermVector[]   1       64
> org.apache.lucene.document.FieldSelectorResult[]        1       72
> org.apache.lucene.document.Field[]      1       24
> org.apache.lucene.index.ByteSliceReader         1       80
> org.apache.lucene.index.CharBlockPool   1       56
> org.apache.lucene.index.DocFieldProcessorPerThread      1       112
> org.apache.lucene.index.DocFieldProcessorPerThread$PerDoc[]     1       32
> org.apache.lucene.index.DocInverterPerThread    1       72
> org.apache.lucene.index.DocInverterPerThread$SingleTokenAttributeSource
>     1       64
> org.apache.lucene.index.DocumentsWriter$1       1       16
> org.apache.lucene.index.DocumentsWriter$DocState        1       72
> org.apache.lucene.index.DocumentsWriterThreadState      1       48
> org.apache.lucene.index.FieldInvertState        1       48
> org.apache.lucene.index.FieldsWriter    1       48
> org.apache.lucene.index.FreqProxTermsWriterPerThread    1       32
> org.apache.lucene.index.IndexFileNameFilter     1       32
> org.apache.lucene.index.NormsWriterPerThread    1       32
> org.apache.lucene.index.ReusableStringReader    1       48
> org.apache.lucene.index.SegmentWriteState       1       72
> org.apache.lucene.index.StoredFieldsWriter$PerDoc       1       56
> org.apache.lucene.index.StoredFieldsWriterPerThread     1       48
> org.apache.lucene.index.TermVectorsTermsWriterPerThread         1       72
> org.apache.lucene.queryParser.QueryParser$Operator[]    1       40
> org.apache.lucene.search.BooleanClause$Occur$1  1       32
> org.apache.lucene.search.BooleanClause$Occur$2  1       32
> org.apache.lucene.search.BooleanClause$Occur$3  1       32
> org.apache.lucene.search.BooleanClause$Occur[]  1       48
> org.apache.lucene.search.BooleanQuery   1       40
> org.apache.lucene.search.DefaultSimilarity      1       24
> org.apache.lucene.search.DocIdSet$1     1       24
> org.apache.lucene.search.DocIdSet$1$1   1       32
> org.apache.lucene.search.FieldCache$1   1       16
> org.apache.lucene.search.FieldCache$10  1       16
> org.apache.lucene.search.FieldCache$2   1       16
> org.apache.lucene.search.FieldCache$3   1       16
> org.apache.lucene.search.FieldCache$4   1       16
> org.apache.lucene.search.FieldCache$5   1       16
> org.apache.lucene.search.FieldCache$6   1       16
> org.apache.lucene.search.FieldCache$7   1       16
> org.apache.lucene.search.FieldCache$8   1       16
> org.apache.lucene.search.FieldCache$9   1       16
> org.apache.lucene.search.FieldCacheImpl         1       32
> org.apache.lucene.search.FieldCacheImpl$ByteCache       1       32
> org.apache.lucene.search.FieldCacheImpl$DoubleCache     1       32
> org.apache.lucene.search.FieldCacheImpl$FloatCache      1       32
> org.apache.lucene.search.FieldCacheImpl$IntCache        1       32
> org.apache.lucene.search.FieldCacheImpl$LongCache       1       32
> org.apache.lucene.search.FieldCacheImpl$ShortCache      1       32
> org.apache.lucene.search.FieldCacheImpl$StringCache     1       32
> org.apache.lucene.search.FieldCacheImpl$StringIndexCache        1       32
> org.apache.lucene.search.MultiTermQuery$1       1       32
> org.apache.lucene.search.MultiTermQuery$ConstantScoreBooleanQueryRewrite
>    1       16
> org.apache.lucene.search.MultiTermQuery$ConstantScoreFilterRewrite      1
>     16
> org.apache.lucene.search.MultiTermQuery$ScoringBooleanQueryRewrite      1
>     16
> org.apache.lucene.search.TopDocs        1       32
> org.apache.lucene.store.RAMFile         1       56
> org.apache.lucene.store.RAMOutputStream         1       72
> org.apache.lucene.util.AttributeSource$AttributeFactory$DefaultAttributeFactory
>         1       16
> org.apache.lucene.util.SimpleStringInterner     1       32
> org.apache.lucene.util.SimpleStringInterner$Entry[]     1       8216
> org.apache.lucene.util.UnicodeUtil$UTF8Result[]         1       40
> org.apache.lucene.util.Version[]        1       88
>
>
>        34562928 <mailto:=@SUM%28C1:C193%29>
>
>
>        32.96
>
>
>
>
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Lucene 3.0 Search Performance Stats

Posted by Jamie <ja...@stimulussoft.com>.

Hi Everyone

The stats I sent through earlier were erroneous due to fact the date 
range query selected fewer records than stated.

The correct stats are:

Lucene 3.0 Stats:

Search conducted using Lucene's Realtime search feature 
(writer.getReader() for each search)
Analyzer: Russian Analyzer
Total Docs: 26.04M (emails data - all attachments, body content indexed)
Index Size: 37G
Query: body: test AND  date: [200901010101 to 201003220225] with 
descending sort on date.
Lucene Mem Usage: 32 MB (when sorted on date)
Search Speed: 0.48 (unsorted)
Search Speed: 0.49 (sorted on YYYYMMDDHHSS date)

Hardware / Software:

Index stored on 4 SAS HDD hitachi RAID 10
16G RAM
2x Xeon 4 core 2.4Gz
OS FreeBSD 7.2
Filesystem UFS2 gjournal

 From Yourkit, memory usage is as follows:

Name 	Number of Objects 	Shallow Size
org.apache.lucene.index.TermInfo 	370610 	14824400
org.apache.lucene.index.Term 	370505 	11856160
com.stimulus.archiva.search.LuceneResult 	20000 	960000
org.apache.lucene.search.FieldDoc 	10000 	320000
org.apache.lucene.search.ScoreDoc 	10000 	240000
org.apache.lucene.index.FieldInfo 	1027 	41080
org.apache.lucene.index.SegmentReader$Norm 	840 	67200
org.apache.lucene.document.Field 	578 	36992
org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput 	415 	49800
org.apache.lucene.index.CompoundFileReader$CSIndexInput 	380 	36480
org.apache.lucene.util.UnicodeUtil$UTF8Result 	320 	10240
org.apache.lucene.index.TermBuffer 	315 	17640
org.apache.lucene.util.UnicodeUtil$UTF16Result 	315 	12600
org.apache.lucene.index.CompoundFileReader$FileEntry 	280 	8960
org.apache.lucene.util.CloseableThreadLocal 	260 	8320
org.apache.lucene.index.FreqProxTermsWriter$PostingList 	256 	12288
org.apache.lucene.index.SegmentInfo 	185 	19240
org.apache.lucene.index.SegmentReader$FieldsReaderLocal 	140 	5600
org.apache.lucene.index.ReadOnlySegmentReader 	105 	12600
org.apache.lucene.index.SegmentReader$Ref 	105 	2520
org.apache.lucene.index.SegmentTermEnum 	105 	11760
org.apache.lucene.search.FieldCacheImpl$Entry 	105 	3360
org.apache.lucene.index.TermInfosReader$ThreadResources 	70 	2240
org.apache.lucene.util.cache.SimpleLRUCache 	70 	1680
org.apache.lucene.util.cache.SimpleLRUCache$1 	70 	6160
org.apache.lucene.index.FieldsReader 	50 	4400
org.apache.lucene.index.IndexFileDeleter$RefCount 	42 	1344
org.apache.lucene.document.Document 	41 	1312
org.apache.lucene.util.SimpleStringInterner$Entry 	41 	1640
org.apache.lucene.index.FieldInfos 	37 	1480
org.apache.lucene.index.CompoundFileReader 	35 	2520
org.apache.lucene.index.SegmentReader 	35 	4200
org.apache.lucene.index.SegmentReader$CoreReaders 	35 	4480
org.apache.lucene.index.TermInfo[] 	35 	2963448
org.apache.lucene.index.TermInfosReader 	35 	3360
org.apache.lucene.index.Term[] 	35 	2963448
org.apache.lucene.search.FieldCache$CreationPlaceholder 	35 	840
org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor 	35 
	2240
org.apache.lucene.index.RawPostingList[] 	30 	8184
org.apache.lucene.index.TermsHashPerField 	24 	3648
org.apache.lucene.analysis.WhitespaceAnalyzer 	16 	512
org.apache.lucene.document.Fieldable[] 	13 	416
org.apache.lucene.index.DocFieldProcessorPerField 	12 	672
org.apache.lucene.index.DocInverterPerField 	12 	768
org.apache.lucene.index.FreqProxTermsWriterPerField 	12 	864
org.apache.lucene.index.NormsWriterPerField 	12 	864
org.apache.lucene.index.TermVectorsTermsWriterPerField 	12 	960
org.apache.lucene.util.AttributeSource$State 	12 	384
org.apache.lucene.util.Version 	8 	256
org.apache.lucene.index.SegmentInfos 	7 	616
org.apache.lucene.document.FieldSelectorResult 	6 	192
org.apache.lucene.analysis.CharArraySet$UnmodifiableCharArraySet 	5 	160
org.apache.lucene.analysis.tokenattributes.OffsetAttributeImpl 	5 	120
org.apache.lucene.analysis.tokenattributes.TermAttributeImpl 	5 	160
org.apache.lucene.analysis.tokenattributes.PositionIncrementAttributeImpl 
4 	96
org.apache.lucene.index.BufferedDeletes 	4 	224
org.apache.lucene.index.TermsHash 	4 	320
org.apache.lucene.analysis.LowerCaseFilter 	3 	192
org.apache.lucene.analysis.PerFieldAnalyzerWrapper 	3 	144
org.apache.lucene.analysis.SimpleAnalyzer 	3 	96
org.apache.lucene.analysis.StopFilter 	3 	264
org.apache.lucene.analysis.ru.RussianAnalyzer 	3 	144
org.apache.lucene.analysis.ru.RussianAnalyzer$SavedStreams 	3 	120
org.apache.lucene.analysis.ru.RussianLetterTokenizer 	3 	288
org.apache.lucene.analysis.ru.RussianStemFilter 	3 	216
org.apache.lucene.analysis.ru.RussianStemmer 	3 	96
org.apache.lucene.index.IndexReader[] 	3 	912
org.apache.lucene.index.ReadOnlyDirectoryReader 	3 	384
org.apache.lucene.index.SegmentReader[] 	3 	912
org.apache.lucene.search.Sort 	3 	72
org.apache.lucene.search.SortField 	3 	168
org.apache.lucene.search.SortField[] 	3 	96
org.apache.lucene.search.TermQuery 	3 	96
org.apache.lucene.util.NamedThreadFactory 	3 	120

	
	

	
	

	
	
org.apache.lucene.analysis.CharReader 	2 	80
org.apache.lucene.index.ByteBlockPool 	2 	112
org.apache.lucene.index.ConcurrentMergeScheduler 	2 	112
org.apache.lucene.index.DocFieldProcessor 	2 	96
org.apache.lucene.index.DocFieldProcessorPerField[] 	2 	432
org.apache.lucene.index.DocInverter 	2 	80
org.apache.lucene.index.DocumentsWriter 	2 	608
org.apache.lucene.index.DocumentsWriter$ByteBlockAllocator 	2 	64
org.apache.lucene.index.DocumentsWriter$DocWriter[] 	2 	208
org.apache.lucene.index.DocumentsWriter$SkipDocWriter 	2 	64
org.apache.lucene.index.DocumentsWriter$WaitQueue 	2 	112
org.apache.lucene.index.DocumentsWriterThreadState[] 	2 	56
org.apache.lucene.index.FreqProxTermsWriter 	2 	80
org.apache.lucene.index.IndexFileDeleter 	2 	192
org.apache.lucene.index.IndexFileDeleter$CommitPoint 	2 	176
org.apache.lucene.index.IndexWriter 	2 	592
org.apache.lucene.index.IndexWriter$MaxFieldLength 	2 	64
org.apache.lucene.index.IndexWriter$ReaderPool 	2 	64
org.apache.lucene.index.IntBlockPool 	2 	112
org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy 	2 	32
org.apache.lucene.index.LogByteSizeMergePolicy 	2 	112
org.apache.lucene.index.NormsWriter 	2 	48
org.apache.lucene.index.StoredFieldsWriter 	2 	128
org.apache.lucene.index.StoredFieldsWriter$PerDoc[] 	2 	64
org.apache.lucene.index.TermVectorsTermsWriter 	2 	176
org.apache.lucene.index.TermVectorsTermsWriter$PerDoc[] 	2 	64
org.apache.lucene.index.TermsHashPerThread 	2 	176
org.apache.lucene.queryParser.QueryParser$Operator 	2 	64
org.apache.lucene.search.BooleanClause 	2 	64
org.apache.lucene.search.NumericRangeQuery 	2 	160
org.apache.lucene.search.ParallelMultiSearcher 	2 	144
org.apache.lucene.search.QueryWrapperFilter 	2 	48
org.apache.lucene.search.ScoreDoc[] 	2 	48
org.apache.lucene.search.Searchable[] 	2 	64
org.apache.lucene.store.NIOFSDirectory 	2 	96
org.apache.lucene.store.NativeFSLock 	2 	128
org.apache.lucene.store.NativeFSLockFactory 	2 	80
org.apache.lucene.analysis.CharArraySet 	1 	32
org.apache.lucene.analysis.LowerCaseTokenizer 	1 	96
org.apache.lucene.document.Field$Index$1 	1 	32
org.apache.lucene.document.Field$Index$2 	1 	32
org.apache.lucene.document.Field$Index$3 	1 	32
org.apache.lucene.document.Field$Index$4 	1 	32
org.apache.lucene.document.Field$Index$5 	1 	32
org.apache.lucene.document.Field$Index[] 	1 	64
org.apache.lucene.document.Field$Store$1 	1 	32
org.apache.lucene.document.Field$Store$2 	1 	32
org.apache.lucene.document.Field$Store[] 	1 	40
org.apache.lucene.document.Field$TermVector$1 	1 	32
org.apache.lucene.document.Field$TermVector$2 	1 	32
org.apache.lucene.document.Field$TermVector$3 	1 	32
org.apache.lucene.document.Field$TermVector$4 	1 	32
org.apache.lucene.document.Field$TermVector$5 	1 	32
org.apache.lucene.document.Field$TermVector[] 	1 	64
org.apache.lucene.document.FieldSelectorResult[] 	1 	72
org.apache.lucene.document.Field[] 	1 	24
org.apache.lucene.index.ByteSliceReader 	1 	80
org.apache.lucene.index.CharBlockPool 	1 	56
org.apache.lucene.index.DocFieldProcessorPerThread 	1 	112
org.apache.lucene.index.DocFieldProcessorPerThread$PerDoc[] 	1 	32
org.apache.lucene.index.DocInverterPerThread 	1 	72
org.apache.lucene.index.DocInverterPerThread$SingleTokenAttributeSource 	1 
	64
org.apache.lucene.index.DocumentsWriter$1 	1 	16
org.apache.lucene.index.DocumentsWriter$DocState 	1 	72
org.apache.lucene.index.DocumentsWriterThreadState 	1 	48
org.apache.lucene.index.FieldInvertState 	1 	48
org.apache.lucene.index.FieldsWriter 	1 	48
org.apache.lucene.index.FreqProxTermsWriterPerThread 	1 	32
org.apache.lucene.index.IndexFileNameFilter 	1 	32
org.apache.lucene.index.NormsWriterPerThread 	1 	32
org.apache.lucene.index.ReusableStringReader 	1 	48
org.apache.lucene.index.SegmentWriteState 	1 	72
org.apache.lucene.index.StoredFieldsWriter$PerDoc 	1 	56
org.apache.lucene.index.StoredFieldsWriterPerThread 	1 	48
org.apache.lucene.index.TermVectorsTermsWriterPerThread 	1 	72
org.apache.lucene.queryParser.QueryParser$Operator[] 	1 	40
org.apache.lucene.search.BooleanClause$Occur$1 	1 	32
org.apache.lucene.search.BooleanClause$Occur$2 	1 	32
org.apache.lucene.search.BooleanClause$Occur$3 	1 	32
org.apache.lucene.search.BooleanClause$Occur[] 	1 	48
org.apache.lucene.search.BooleanQuery 	1 	40
org.apache.lucene.search.DefaultSimilarity 	1 	24
org.apache.lucene.search.DocIdSet$1 	1 	24
org.apache.lucene.search.DocIdSet$1$1 	1 	32
org.apache.lucene.search.FieldCache$1 	1 	16
org.apache.lucene.search.FieldCache$10 	1 	16
org.apache.lucene.search.FieldCache$2 	1 	16
org.apache.lucene.search.FieldCache$3 	1 	16
org.apache.lucene.search.FieldCache$4 	1 	16
org.apache.lucene.search.FieldCache$5 	1 	16
org.apache.lucene.search.FieldCache$6 	1 	16
org.apache.lucene.search.FieldCache$7 	1 	16
org.apache.lucene.search.FieldCache$8 	1 	16
org.apache.lucene.search.FieldCache$9 	1 	16
org.apache.lucene.search.FieldCacheImpl 	1 	32
org.apache.lucene.search.FieldCacheImpl$ByteCache 	1 	32
org.apache.lucene.search.FieldCacheImpl$DoubleCache 	1 	32
org.apache.lucene.search.FieldCacheImpl$FloatCache 	1 	32
org.apache.lucene.search.FieldCacheImpl$IntCache 	1 	32
org.apache.lucene.search.FieldCacheImpl$LongCache 	1 	32
org.apache.lucene.search.FieldCacheImpl$ShortCache 	1 	32
org.apache.lucene.search.FieldCacheImpl$StringCache 	1 	32
org.apache.lucene.search.FieldCacheImpl$StringIndexCache 	1 	32
org.apache.lucene.search.MultiTermQuery$1 	1 	32
org.apache.lucene.search.MultiTermQuery$ConstantScoreBooleanQueryRewrite 	1 
	16
org.apache.lucene.search.MultiTermQuery$ConstantScoreFilterRewrite 	1 	16
org.apache.lucene.search.MultiTermQuery$ScoringBooleanQueryRewrite 	1 	16
org.apache.lucene.search.TopDocs 	1 	32
org.apache.lucene.store.RAMFile 	1 	56
org.apache.lucene.store.RAMOutputStream 	1 	72
org.apache.lucene.util.AttributeSource$AttributeFactory$DefaultAttributeFactory 
	1 	16
org.apache.lucene.util.SimpleStringInterner 	1 	32
org.apache.lucene.util.SimpleStringInterner$Entry[] 	1 	8216
org.apache.lucene.util.UnicodeUtil$UTF8Result[] 	1 	40
org.apache.lucene.util.Version[] 	1 	88

	
	34562928 <mailto:=@SUM%28C1:C193%29>

	
	32.96

RE: Lucene 3.0 Search Performance Stats

Posted by Uwe Schindler <uw...@thetaphi.de>.

Hi Jamie,

thanks for reporting back the numbers about your usage of NumericField and NumericRangeQuery! I am glad to hear about it.

> Sure. As soon as I get access to the server again, I'll get the mem
> stats for you. I will say that Lucene was consuming a large amount of
> memory before we moved over to using Numerics. The reason for this is
> that we were encoding dates as strings. Our date time strings were
> unique, so as the number of records exploded, so too did the number of
> terms in the index. As I understand it (and I am no Lucene expert),
> Lucene's sorting mechanisms need to load up all the terms in the index
> in memory during a sort. Thus, if you execute a sorted search, Lucene's
> memory consumption goes through the roof. Using Numerics avoids this
> problem.

Thats true, as you only need to store an integer per document in the cache (and not a String). Also performance for FieldCache warmup and sorting is higher.

> There are lot of other strategies we used to reduce memory consumption.
> Like, making sure that you are caching Searchers and IndexReaders, etc.
> 
> Regards,
> 
> Jamie
> 
> 
> 
> On 2010/03/19 07:04 PM, Monique Monteiro wrote:
> > Hi Jamie,
> >
> >    could you please tell us how much memory does your application
> consume
> > with Lucene? I'm asking it because we are having memory consumption
> problems
> > with a 32GB index and 1.5GB od RAM allocated to our web application.
> At the
> > momento, we use textual search.
> >   Thanks in advance,
> > Monique
> > On Fri, Mar 19, 2010 at 8:49 AM, Jamie<ja...@stimulussoft.com>
> wrote:
> >
> >
> >> Hi Guys
> >>
> >> I just wanted to congratulate the Lucene guys for a fine job on
> 3.0!!
> >>
> >> Since we switched our indexes to using integer based range queries
> based on
> >> Date (YYMMHHSS), search speed is lightening fast and memory
> consumption has
> >> dropped considerably!
> >>
> >> Some stats:
> >>
> >> Indexed Docs: 7.2M emails
> >> Index Size: 24 GB (non optimized)
> >> Search Speed: 0.06 - 0.09 seconds (with sort YYMMHHSS date)
> >>
> >> Index stored on 4 SAS HDD hitachi RAID 10
> >> 16G RAM
> >> 2x Xeon 4 core 2.4Gz
> >> OS FreeBSD 7.2
> >> Filesystem UFS2 gjournal
> >>
> >> I believe we are using all search performance recommendations now.
> >>
> >> Good job!
> >>
> >> Jamie
> >>
> >>
> >>
> >> --------------------------------------------------------------------
> -
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >>
> >>
> >
> >
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Lucene 3.0 Search Performance Stats

Posted by Jamie <ja...@stimulussoft.com>.

Hi Monique

Sure. As soon as I get access to the server again, I'll get the mem 
stats for you. I will say that Lucene was consuming a large amount of 
memory before we moved over to using Numerics. The reason for this is 
that we were encoding dates as strings. Our date time strings were 
unique, so as the number of records exploded, so too did the number of 
terms in the index. As I understand it (and I am no Lucene expert), 
Lucene's sorting mechanisms need to load up all the terms in the index 
in memory during a sort. Thus, if you execute a sorted search, Lucene's 
memory consumption goes through the roof. Using Numerics avoids this 
problem.

There are lot of other strategies we used to reduce memory consumption. 
Like, making sure that you are caching Searchers and IndexReaders, etc.

Regards,

Jamie

On 2010/03/19 07:04 PM, Monique Monteiro wrote:
> Hi Jamie,
>
>    could you please tell us how much memory does your application consume
> with Lucene? I'm asking it because we are having memory consumption problems
> with a 32GB index and 1.5GB od RAM allocated to our web application. At the
> momento, we use textual search.
>   Thanks in advance,
> Monique
> On Fri, Mar 19, 2010 at 8:49 AM, Jamie<ja...@stimulussoft.com>  wrote:
>
>    
>> Hi Guys
>>
>> I just wanted to congratulate the Lucene guys for a fine job on 3.0!!
>>
>> Since we switched our indexes to using integer based range queries based on
>> Date (YYMMHHSS), search speed is lightening fast and memory consumption has
>> dropped considerably!
>>
>> Some stats:
>>
>> Indexed Docs: 7.2M emails
>> Index Size: 24 GB (non optimized)
>> Search Speed: 0.06 - 0.09 seconds (with sort YYMMHHSS date)
>>
>> Index stored on 4 SAS HDD hitachi RAID 10
>> 16G RAM
>> 2x Xeon 4 core 2.4Gz
>> OS FreeBSD 7.2
>> Filesystem UFS2 gjournal
>>
>> I believe we are using all search performance recommendations now.
>>
>> Good job!
>>
>> Jamie
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>>      
>
>    

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Lucene 3.0 Search Performance Stats

Posted by Monique Monteiro <mo...@gmail.com>.

Hi Jamie,

  could you please tell us how much memory does your application consume
with Lucene? I'm asking it because we are having memory consumption problems
with a 32GB index and 1.5GB od RAM allocated to our web application. At the
momento, we use textual search.
 Thanks in advance,
Monique
On Fri, Mar 19, 2010 at 8:49 AM, Jamie <ja...@stimulussoft.com> wrote:

> Hi Guys
>
> I just wanted to congratulate the Lucene guys for a fine job on 3.0!!
>
> Since we switched our indexes to using integer based range queries based on
> Date (YYMMHHSS), search speed is lightening fast and memory consumption has
> dropped considerably!
>
> Some stats:
>
> Indexed Docs: 7.2M emails
> Index Size: 24 GB (non optimized)
> Search Speed: 0.06 - 0.09 seconds (with sort YYMMHHSS date)
>
> Index stored on 4 SAS HDD hitachi RAID 10
> 16G RAM
> 2x Xeon 4 core 2.4Gz
> OS FreeBSD 7.2
> Filesystem UFS2 gjournal
>
> I believe we are using all search performance recommendations now.
>
> Good job!
>
> Jamie
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>


-- 
Monique Monteiro, MSc
Auditora Federal de Controle Externo - Tribunal de Contas da União (TCU)
IBM OOAD / SCJP / MCTS Web
Blog: http://moniquelouise.spaces.live.com/
Twitter: http://twitter.com/monilouise
MSN: monique_louise@msn.com
GTalk: monique.louise@gmail.com

Re: Lucene 3.0 Search Performance Stats

Posted by Michael McCandless <lu...@mikemccandless.com>.

Very nice!  Thanks for sharing :)

Mike

On Fri, Mar 19, 2010 at 6:53 AM, Jamie <ja...@stimulussoft.com> wrote:
> I forgot to point out, this is a search using the Lucene realtime search
> feature. We get the reader from indexwriter.getReader() for each search.
>
> On 2010/03/19 01:49 PM, Jamie wrote:
>>
>> Hi Guys
>>
>> I just wanted to congratulate the Lucene guys for a fine job on 3.0!!
>>
>> Since we switched our indexes to using integer based range queries based
>> on Date (YYMMHHSS), search speed is lightening fast and memory consumption
>> has dropped considerably!
>>
>> Some stats:
>>
>> Indexed Docs: 7.2M emails
>> Index Size: 24 GB (non optimized)
>> Search Speed: 0.06 - 0.09 seconds (with sort YYMMHHSS date)
>>
>> Index stored on 4 SAS HDD hitachi RAID 10
>> 16G RAM
>> 2x Xeon 4 core 2.4Gz
>> OS FreeBSD 7.2
>> Filesystem UFS2 gjournal
>>
>> I believe we are using all search performance recommendations now.
>>
>> Good job!
>>
>> Jamie
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Lucene 3.0 Search Performance Stats

Posted by Jamie <ja...@stimulussoft.com>.

I forgot to point out, this is a search using the Lucene realtime search 
feature. We get the reader from indexwriter.getReader() for each search.

On 2010/03/19 01:49 PM, Jamie wrote:
> Hi Guys
>
> I just wanted to congratulate the Lucene guys for a fine job on 3.0!!
>
> Since we switched our indexes to using integer based range queries 
> based on Date (YYMMHHSS), search speed is lightening fast and memory 
> consumption has dropped considerably!
>
> Some stats:
>
> Indexed Docs: 7.2M emails
> Index Size: 24 GB (non optimized)
> Search Speed: 0.06 - 0.09 seconds (with sort YYMMHHSS date)
>
> Index stored on 4 SAS HDD hitachi RAID 10
> 16G RAM
> 2x Xeon 4 core 2.4Gz
> OS FreeBSD 7.2
> Filesystem UFS2 gjournal
>
> I believe we are using all search performance recommendations now.
>
> Good job!
>
> Jamie
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org