You are viewing a plain text version of this content. The canonical link for it is here.

Posted to java-user@lucene.apache.org by Thomas Becker <th...@net-m.de> on 2009/09/15 15:30:14 UTC

lucene 2.9.0RC4 slower than 2.4.1?

Hi all,

I'm experiencing a performance degradation after migrating to 2.9 and running
some tests. I'm getting out of ideas and any help to identify the reasons why
2.9 is slower than 2.4 are highly appreciated.

We've had some issues with custom sorting in lucene 2.4.1. We worked around them
by sorting the resultsets manually and caching the results after sorting (memory
consuming but fast).

I now migrated to lucene 2.9.0RC4. Build some new FieldComparatorSource
implementation for sorting and refactored all deprecated api calls to the new
lucene 2.9 api.

Everything works fine from a functional perspective. But performance severly is
(negatively) affected by lucene 2.9.

I profiled the application for a couple of hours, build a jmeter load test and
compared the following scenarios:

1. lucene 2.9 - new api
2. lucene 2.9 - old api and custom sorting after lucene
3. lucene 2.4.1 - old api and custom sorting after lucene (what we had up2now)

Please find attached an rrd graph showing the results. The lighter the color the
faster the request has been served. y=# requests, x=time.

Most interestingly simply switching the lucene jars between 2.4 and 2.9 degraded
response times and therefore throughput (see results of testcase 2 and 3).
Adapting to the new api decreased performance again. The difference between
testcase 1 and 2 is most probably due to precached custom sorted results.

The application under test is a dedicated lucene search engine doing nothing
else, but serving search requests. We're running a cluster of them in prd and
it's incredibly fast. With the old implementation and prd traffic we've above
98% of the requests served in 200ms.
The index under test contains about 3 million documents (with lots of fields),
consumes about 2,5gig disk space and is stored on a tmpfs RAMDISK provided by
the linux kernel.

Most interesting methods used for searching are:

getHitsCount (is there a way to speed this up?):

	public int getHitsCount(String query, Filter filter) throws
LuceneServiceException {
		log.debug("getHitsCount('{}, {}')", query, filter);
		if (StringUtils.isBlank(query)) {
			log.warn("getHitsCount: empty lucene query");
			return 0;
		}
		long startTimeMillis = System.currentTimeMillis();
		int count = 0;

		if (indexSearcher == null) {
			return 0;
		}

		BooleanQuery.setMaxClauseCount(MAXCLAUSECOUNT);
		Query q = null;
		try {
			q = createQuery(query);
			TopScoreDocCollector tsdc = TopScoreDocCollector.create(1, true);
			indexSearcher.search(q, filter, tsdc);
			count = tsdc.getTotalHits();
			log.info("getHitsCount: count = {}",count);
		} catch (ParseException ex) {
			throw new LuceneServiceException("invalid lucene query:" + query, ex);
		} catch (IOException e) {
			throw new LuceneServiceException(" indexSearcher could be corrupted", e);
		} finally {
			long durationMillis = System.currentTimeMillis() - startTimeMillis;
			if (durationMillis > slowQueryLimit) {
				log.warn("getHitsCount: Slow query: {} ms, query={}", durationMillis, query);
			}
			log.debug("getHitsCount: query took {} ms", durationMillis);
		}
		return count;
	}

search:
	public List<Document> search(String query, Filter filter, Sort sort, int from,
int size) throws LuceneServiceException {
		log.debug("{} search('{}', {}, {}, {}, {})", new Object[] { indexAlias, query,
filter, sort, from, size });
		long startTimeMillis = System.currentTimeMillis();

		List<Document> docs = new ArrayList<Document>();
		if (indexSearcher == null) {
			return docs;
		}
		Query q = null;
		try {
			if (query == null) {
				log.warn("search: lucene query is null...");
				return docs;
			}
			q = createQuery(query);
			BooleanQuery.setMaxClauseCount(MAXCLAUSECOUNT);
			if (size < 0 || size > maxNumHits) {
				// set hard limit for numHits
				size = maxNumHits;
				if (log.isDebugEnabled())
					log.debug("search: Size set to hardlimit: {} for query: {} with filter:
{}", new Object[] { size, query, filter });
			}
			TopFieldCollector collector = TopFieldCollector.create(sort, size + from,
true, false, false, true);
			indexSearcher.search(q, filter, collector);
			if(size > collector.getTotalHits())
				size = collector.getTotalHits();
			if (size > 100000)
				log.info("search: size: {} bigger than 100.000 for query: {} with filter:
{}", new Object[] { size, query, filter });
			TopDocs td = collector.topDocs(from, size);
			ScoreDoc[] scoreDocs = td.scoreDocs;
			for (ScoreDoc scoreDoc : scoreDocs) {
				docs.add(indexSearcher.doc(scoreDoc.doc));
			}
		} catch (ParseException e) {
			log.warn("search: ParseException: {}", e.getMessage());
			if (log.isDebugEnabled())
				log.warn("search: ParseException: ", e);
			return Collections.emptyList();
		} catch (IOException e) {
			log.warn("search: IOException: ", e);
			return Collections.emptyList();
		} finally {
			long durationMillis = System.currentTimeMillis() - startTimeMillis;
			if (durationMillis > slowQueryLimit) {
				log.warn("search: Slow query: {} ms, query={}, indexUsed={}",
						new Object[] { durationMillis, query,
indexSearcher.getIndexReader().directory() });
			}
			log.debug("search: query took {} ms", durationMillis);
		}
		return docs;
	}

I'm wondering why others are experiencing better performance with 2.9 and why
our implementations performance is going bad. Maybe our way of using the 2.9 api
is not the best and sorting is definetly expensive.

Any ideas are appreciated. I'm torning out my hair since hours and days to find
the root cause. Also hints how I could find the bottlenecks myself are appreciated.

Cheers,
Thomas

-- 
Thomas Becker
Senior JEE Developer

net mobile AG
Zollhof 17
40221 Düsseldorf
GERMANY

Phone:    +49 211 97020-195
Fax:      +49 211 97020-949
Mobile:   +49 173 5146567 (private)
E-Mail:   mailto:thomas.becker@net-m.de
Internet: http://www.net-m.de

Registergericht:  Amtsgericht Düsseldorf, HRB 48022
Vorstand:         Theodor Niehues (Vorsitzender), Frank Hartmann,
                 Kai Markus Kulas, Dieter Plassmann
Vorsitzender des
Aufsichtsrates:   Dr. Michael Briem


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: lucene 2.9.0RC4 slower than 2.4.1?

Posted by Yonik Seeley <yo...@lucidimagination.com>.

OK, I see the issue - SingleFile doesn't have it's own filepointer.
I'll update the original issue.  (for large files, this shouldn't
change the times any).

-Yonik
http://www.lucidimagination.com

On Tue, Sep 15, 2009 at 4:13 PM, Yonik Seeley
<yo...@lucidimagination.com> wrote:
> On Tue, Sep 15, 2009 at 4:12 PM, Yonik Seeley
> <yo...@lucidimagination.com> wrote:
>> Note that when nthreads>1 I sometimes get wrong answers for SimpleFile...
>
> s/SimpleFile/SingleFile/g
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: lucene 2.9.0RC4 slower than 2.4.1?

Posted by Yonik Seeley <yo...@lucidimagination.com>.

On Tue, Sep 15, 2009 at 4:12 PM, Yonik Seeley
<yo...@lucidimagination.com> wrote:
> Note that when nthreads>1 I sometimes get wrong answers for SimpleFile...

s/SimpleFile/SingleFile/g

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: lucene 2.9.0RC4 slower than 2.4.1?

Posted by Yonik Seeley <yo...@lucidimagination.com>.

Note that when nthreads>1 I sometimes get wrong answers for SimpleFile...
hopefully it's just a bug in the test... I'll look into it a little.

-Yonik
http://www.lucidimagination.com



On Tue, Sep 15, 2009 at 4:00 PM, Mark Miller <ma...@gmail.com> wrote:
> I'm jealous of your 4 3.0Ghz to my 2.0Ghz.
>
> I was on dynamic scaling frequency and switched to 2.0Ghz hard.
>
> On ramdisk, my puny 2.0's almost catch you and get a bit over 1800MB/s
> with SeparateFile.
>
> I'm smoked on PooledPread and ChannelPread though. Still sub 500 for
> both, even
> on the ramdisk.
>
> Its an absurd comparison though - everyone knows a jackalope is faster
> than a koala.
>
> - Mark
>
> Yonik Seeley wrote:
>> Here's my results in my quad core phenom, with ondemand CPU freq
>> scaling disabled (clocks locked at 3GHz)
>>
>> Ubuntu 9.04, filesystem=ext4 on 7200RPM IDE drive, testfile=95MB fully cached.
>>
>> Linux odin 2.6.28-15-generic #49-Ubuntu SMP Tue Aug 18 19:25:34 UTC
>> 2009 x86_64 GNU/Linux
>> Java(TM) SE Runtime Environment (build 1.6.0_14-b08)
>> Java HotSpot(TM) 64-Bit Server VM (build 14.0-b16, mixed mode)
>>
>>
>> config: impl=ClassicFile serial=false nThreads=4 iterations=20
>> bufsize=1024 poolsize=2 filelen=95610240
>> answer=1165427971, ms=15610, MB/sec=489.99482383087764
>>
>> config: impl=SeparateFile serial=false nThreads=4 iterations=20
>> bufsize=1024 poolsize=2 filelen=95610240
>> answer=1165427672, ms=4115, MB/sec=1858.7652976913728
>>
>> config: impl=PooledPread serial=false nThreads=4 iterations=20
>> bufsize=1024 poolsize=2 filelen=95610240
>> answer=1165427971, ms=6352, MB/sec=1204.15919395466
>>
>> config: impl=ChannelFile serial=false nThreads=4 iterations=20
>> bufsize=1024 poolsize=2 filelen=95610240
>> answer=1165427971, ms=20347, MB/sec=375.91876935174713
>>
>> config: impl=ChannelPread serial=false nThreads=4 iterations=20
>> bufsize=1024 poolsize=2 filelen=95610240
>> answer=1165427971, ms=5189, MB/sec=1474.0449412218154
>>
>> config: impl=ChannelTransfer serial=false nThreads=4 iterations=20
>> bufsize=1024 poolsize=2 filelen=95610240
>> answer=1165427971, ms=14794, MB/sec=517.021711504664
>>
>> -Yonik
>> http://www.lucidimagination.com
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: lucene 2.9.0RC4 slower than 2.4.1?

Posted by Mark Miller <ma...@gmail.com>.

Okay - using a smaller file, I get better results. I had about 2+ gig
available to cache the 700mb file, but I probably had fragmentation
issues - I just grabbed the first big file I had.

So its gets a little better for ChannelPread with the smaller file
(approx 160mb vs approx 700mb for the old tests). It goes up a bit over
400MB/sec - but oddly, I lose approx 400MB/sec from my SeparateFile score:


ramdisk results:

config: impl=ChannelPread serial=false nThreads=4 iterations=100
bufsize=1024 poolsize=2 filelen=164956707
answer=-31115427, ms=68940, MB/sec=957.1030287206268

config: impl=SeparateFile serial=false nThreads=4 iterations=100
bufsize=1024 poolsize=2 filelen=164956707
answer=-31115729, ms=45691, MB/sec=1444.106778140115


Mark Miller wrote:
> I'm jealous of your 4 3.0Ghz to my 2.0Ghz.
>
> I was on dynamic scaling frequency and switched to 2.0Ghz hard.
>
> On ramdisk, my puny 2.0's almost catch you and get a bit over 1800MB/s
> with SeparateFile.
>
> I'm smoked on PooledPread and ChannelPread though. Still sub 500 for
> both, even
> on the ramdisk.
>
> Its an absurd comparison though - everyone knows a jackalope is faster
> than a koala.
>
> - Mark
>
> Yonik Seeley wrote:
>   
>> Here's my results in my quad core phenom, with ondemand CPU freq
>> scaling disabled (clocks locked at 3GHz)
>>
>> Ubuntu 9.04, filesystem=ext4 on 7200RPM IDE drive, testfile=95MB fully cached.
>>
>> Linux odin 2.6.28-15-generic #49-Ubuntu SMP Tue Aug 18 19:25:34 UTC
>> 2009 x86_64 GNU/Linux
>> Java(TM) SE Runtime Environment (build 1.6.0_14-b08)
>> Java HotSpot(TM) 64-Bit Server VM (build 14.0-b16, mixed mode)
>>
>>
>> config: impl=ClassicFile serial=false nThreads=4 iterations=20
>> bufsize=1024 poolsize=2 filelen=95610240
>> answer=1165427971, ms=15610, MB/sec=489.99482383087764
>>
>> config: impl=SeparateFile serial=false nThreads=4 iterations=20
>> bufsize=1024 poolsize=2 filelen=95610240
>> answer=1165427672, ms=4115, MB/sec=1858.7652976913728
>>
>> config: impl=PooledPread serial=false nThreads=4 iterations=20
>> bufsize=1024 poolsize=2 filelen=95610240
>> answer=1165427971, ms=6352, MB/sec=1204.15919395466
>>
>> config: impl=ChannelFile serial=false nThreads=4 iterations=20
>> bufsize=1024 poolsize=2 filelen=95610240
>> answer=1165427971, ms=20347, MB/sec=375.91876935174713
>>
>> config: impl=ChannelPread serial=false nThreads=4 iterations=20
>> bufsize=1024 poolsize=2 filelen=95610240
>> answer=1165427971, ms=5189, MB/sec=1474.0449412218154
>>
>> config: impl=ChannelTransfer serial=false nThreads=4 iterations=20
>> bufsize=1024 poolsize=2 filelen=95610240
>> answer=1165427971, ms=14794, MB/sec=517.021711504664
>>
>> -Yonik
>> http://www.lucidimagination.com
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>   
>>     
>
>
>   


-- 
- Mark

http://www.lucidimagination.com




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: lucene 2.9.0RC4 slower than 2.4.1?

Posted by Mark Miller <ma...@gmail.com>.

I'm jealous of your 4 3.0Ghz to my 2.0Ghz.

I was on dynamic scaling frequency and switched to 2.0Ghz hard.

On ramdisk, my puny 2.0's almost catch you and get a bit over 1800MB/s
with SeparateFile.

I'm smoked on PooledPread and ChannelPread though. Still sub 500 for
both, even
on the ramdisk.

Its an absurd comparison though - everyone knows a jackalope is faster
than a koala.

- Mark

Yonik Seeley wrote:
> Here's my results in my quad core phenom, with ondemand CPU freq
> scaling disabled (clocks locked at 3GHz)
>
> Ubuntu 9.04, filesystem=ext4 on 7200RPM IDE drive, testfile=95MB fully cached.
>
> Linux odin 2.6.28-15-generic #49-Ubuntu SMP Tue Aug 18 19:25:34 UTC
> 2009 x86_64 GNU/Linux
> Java(TM) SE Runtime Environment (build 1.6.0_14-b08)
> Java HotSpot(TM) 64-Bit Server VM (build 14.0-b16, mixed mode)
>
>
> config: impl=ClassicFile serial=false nThreads=4 iterations=20
> bufsize=1024 poolsize=2 filelen=95610240
> answer=1165427971, ms=15610, MB/sec=489.99482383087764
>
> config: impl=SeparateFile serial=false nThreads=4 iterations=20
> bufsize=1024 poolsize=2 filelen=95610240
> answer=1165427672, ms=4115, MB/sec=1858.7652976913728
>
> config: impl=PooledPread serial=false nThreads=4 iterations=20
> bufsize=1024 poolsize=2 filelen=95610240
> answer=1165427971, ms=6352, MB/sec=1204.15919395466
>
> config: impl=ChannelFile serial=false nThreads=4 iterations=20
> bufsize=1024 poolsize=2 filelen=95610240
> answer=1165427971, ms=20347, MB/sec=375.91876935174713
>
> config: impl=ChannelPread serial=false nThreads=4 iterations=20
> bufsize=1024 poolsize=2 filelen=95610240
> answer=1165427971, ms=5189, MB/sec=1474.0449412218154
>
> config: impl=ChannelTransfer serial=false nThreads=4 iterations=20
> bufsize=1024 poolsize=2 filelen=95610240
> answer=1165427971, ms=14794, MB/sec=517.021711504664
>
> -Yonik
> http://www.lucidimagination.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>   


-- 
- Mark

http://www.lucidimagination.com




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: lucene 2.9.0RC4 slower than 2.4.1?

Posted by Yonik Seeley <yo...@lucidimagination.com>.

Here's my results in my quad core phenom, with ondemand CPU freq
scaling disabled (clocks locked at 3GHz)

Ubuntu 9.04, filesystem=ext4 on 7200RPM IDE drive, testfile=95MB fully cached.

Linux odin 2.6.28-15-generic #49-Ubuntu SMP Tue Aug 18 19:25:34 UTC
2009 x86_64 GNU/Linux
Java(TM) SE Runtime Environment (build 1.6.0_14-b08)
Java HotSpot(TM) 64-Bit Server VM (build 14.0-b16, mixed mode)


config: impl=ClassicFile serial=false nThreads=4 iterations=20
bufsize=1024 poolsize=2 filelen=95610240
answer=1165427971, ms=15610, MB/sec=489.99482383087764

config: impl=SeparateFile serial=false nThreads=4 iterations=20
bufsize=1024 poolsize=2 filelen=95610240
answer=1165427672, ms=4115, MB/sec=1858.7652976913728

config: impl=PooledPread serial=false nThreads=4 iterations=20
bufsize=1024 poolsize=2 filelen=95610240
answer=1165427971, ms=6352, MB/sec=1204.15919395466

config: impl=ChannelFile serial=false nThreads=4 iterations=20
bufsize=1024 poolsize=2 filelen=95610240
answer=1165427971, ms=20347, MB/sec=375.91876935174713

config: impl=ChannelPread serial=false nThreads=4 iterations=20
bufsize=1024 poolsize=2 filelen=95610240
answer=1165427971, ms=5189, MB/sec=1474.0449412218154

config: impl=ChannelTransfer serial=false nThreads=4 iterations=20
bufsize=1024 poolsize=2 filelen=95610240
answer=1165427971, ms=14794, MB/sec=517.021711504664

-Yonik
http://www.lucidimagination.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: lucene 2.9.0RC4 slower than 2.4.1?

Posted by Mark Miller <ma...@gmail.com>.

Disturbed reminds me of the owl from sword in the stone ;)

Thats a great one liner - now I am completely disturbed.

Sorry - I've been known to do that -

The two results that I say specifically are from the harddisk - those
are from the harddisk and are ext4. They are a tad slower than the
ramdisk results.

Everything else is the ramdisk - and I wasn't sure if thats ext4 or ext3
(and my mind hasn't wandered towards finding the answer yet). I didn't
specify when creating it.

- Mark

Uwe Schindler wrote:
> Now I am completely disturbed. Which numbers come from which filesystem?
> Ext4 on HDD, tmpfs (which is a filesystem of its own), ext3 on HDD,...
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>   
>> -----Original Message-----
>> From: yseeley@gmail.com [mailto:yseeley@gmail.com] On Behalf Of Yonik
>> Seeley
>> Sent: Tuesday, September 15, 2009 9:33 PM
>> To: java-user@lucene.apache.org
>> Subject: Re: lucene 2.9.0RC4 slower than 2.4.1?
>>
>> Remember to disable CPU frequency scaling when benchmarking... some
>> things with IO cause the freq to drop, and when it's CPU bound again
>> it takes a while for Linux to scale up the freq again.
>>
>> For example, on my ubuntu box, ChannelFile went from 100MB/sec to
>> 388MB/sec.  This effect probably won't be uniform across
>> implementations either.
>>
>> -Yonik
>> http://www.lucidimagination.com
>>
>> On Tue, Sep 15, 2009 at 3:25 PM, Mark Miller <ma...@gmail.com>
>> wrote:
>>     
>>> I just really I hadn't sent this one. Here are results from the
>>>       
>> harddrive:
>>     
>>> It looks like its closer to the same speed on the hardrive once
>>> everything is loaded in the system cache (as you'd expect). SeparateFile
>>> was 1200 vs almost 1700 on RAMDISK. ChannelPread looked a lot closer
>>>       
>> though.
>>     
>>> - Mark
>>>
>>> Tests from harddisk (filesystem cache warmed):
>>>
>>> config: impl=SeparateFile serial=false nThreads=4 iterations=100
>>> bufsize=1024 poolsize=2 filelen=730554368
>>> answer=-282293977, ms=238230, MB/sec=1226.6370616630988
>>>
>>> config: impl=ChannelPread serial=false nThreads=4 iterations=100
>>> bufsize=1024 poolsize=2 filelen=730554368
>>> answer=-282295361, ms=766340, MB/sec=381.3212767179059
>>>
>>>
>>> Mark Miller wrote:
>>>       
>>>> Michael McCandless wrote:
>>>>
>>>>         
>>>>> I don't like that the answer is different... but it's really really
>>>>> odd that it's different-yet-almost-the-same.
>>>>>
>>>>> Mark, were these 4 results on a normal (ext4) filesystem, or tmpfs?
>>>>> (Because the top 2 entries of your 4 results match the first set of 2
>>>>> entries you sent... so I'm thinking these 4 were actually tmpfs not
>>>>> ext4).
>>>>>
>>>>>
>>>>>           
>>>> Those 4 were tmpfs - I mention ext4 at the end because I had just given
>>>> a feel for the hardrive tests and wanted to note it was from ext4 - the
>>>> results are def ramdisk though.
>>>>
>>>>         
>>>>> What JRE/OS, linux, kernel versions, and hardware, are you running on?
>>>>>
>>>>>
>>>>>           
>>>> These are on:
>>>> Ubuntu Karmic Koala 9.10, currently updated
>>>> java-1.5.0-sun-1.5.0.20
>>>> 2.6.31-10-generic
>>>>
>>>> RAM: 3.9 Gig
>>>> 4 core Intel Core2 duo 2.0GHz
>>>>
>>>> Slow 5200 rpm laptop drives.
>>>>
>>>>
>>>>         
>>>>> The gains of SeparateFile over all else are stunning.  And, quite
>>>>> different from the linux tests I had run under LUCENE-753.  Maybe we
>>>>> need to revert FSDir.open to return SimpleFSDir again, on non-Windows
>>>>> hosts.  But then we don't have good concurrency...
>>>>>
>>>>> Mike
>>>>>
>>>>> On Tue, Sep 15, 2009 at 2:59 PM, Yonik Seeley
>>>>> <yo...@lucidimagination.com> wrote:
>>>>>
>>>>>
>>>>>           
>>>>>> It's been a while since I wrote that benchmarker... is it OK that the
>>>>>> answer is different?  Did you use the same test file?
>>>>>>
>>>>>> -Yonik
>>>>>> http://www.lucidimagination.com
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Sep 15, 2009 at 2:18 PM, Mark Miller <ma...@gmail.com>
>>>>>>             
>> wrote:
>>     
>>>>>>             
>>>>>>> The results:
>>>>>>>
>>>>>>> config: impl=SeparateFile serial=false nThreads=4 iterations=100
>>>>>>> bufsize=1024 poolsize=2 filelen=730554368
>>>>>>> answer=-282295611, ms=173550, MB/sec=1683.7899579371938
>>>>>>>
>>>>>>> config: impl=ChannelFile serial=false nThreads=4 iterations=100
>>>>>>> bufsize=1024 poolsize=2 filelen=730554368
>>>>>>> answer=-282295361, ms=1377768, MB/sec=212.09793463050383
>>>>>>>
>>>>>>> config: impl=ChannelPread serial=false nThreads=4 iterations=100
>>>>>>> bufsize=1024 poolsize=2 filelen=730554368
>>>>>>> answer=-282295361, ms=632253, MB/sec=462.19115955163517
>>>>>>>
>>>>>>> config: impl=PooledPread serial=false nThreads=4 iterations=100
>>>>>>> bufsize=1024 poolsize=2 filelen=730554368
>>>>>>> answer=-282295361, ms=774664, MB/sec=377.2238637654518
>>>>>>>
>>>>>>> ClassicFile was heading for the same fate as ChannelFile.
>>>>>>>
>>>>>>>
>>>>>>> I'll have to check what its like on the file system - but it appears
>>>>>>> just ridiculously slower. Even with SeparateFile, All 4 cores are
>>>>>>>               
>> bouncing
>>     
>>>>>>> from 0-12% independently and really favoring the low end of that.
>>>>>>> ChannelPread appears no better.
>>>>>>>
>>>>>>> There are results from other OS's/setups in the JIRA issue.
>>>>>>>
>>>>>>> I'm using ext4.
>>>>>>>
>>>>>>> Uwe Schindler wrote:
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>>>> How does a conventional file system compare?
>>>>>>>>
>>>>>>>> -----
>>>>>>>> Uwe Schindler
>>>>>>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>>>>>>> http://www.thetaphi.de
>>>>>>>> eMail: uwe@thetaphi.de
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Mark Miller [mailto:markrmiller@gmail.com]
>>>>>>>>> Sent: Tuesday, September 15, 2009 7:15 PM
>>>>>>>>> To: java-user@lucene.apache.org
>>>>>>>>> Subject: Re: lucene 2.9.0RC4 slower than 2.4.1?
>>>>>>>>>
>>>>>>>>> Mark Miller wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>                   
>>>>>>>>>> Indeed - I just ran the FileReaderTest on a Linux tmpfs ramdisk -
>>>>>>>>>>                     
>> with
>>     
>>>>>>>>>> SeparateFile all 4 of my cores are immediately pinned and remain
>>>>>>>>>>                     
>> so.
>>     
>>>>>>>>>> With ChannelFile, all 4 cores hover 20-30%.
>>>>>>>>>>
>>>>>>>>>> It would appear it may not be a good idea to use NIOFSDirectory
>>>>>>>>>>                     
>> on
>>     
>>>>>>>>>>
>>>>>>>>>>                     
>>>>>>>>> ramdisks.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>                   
>>>>>>>>>> Even still though - it looks like you have a further issue - your
>>>>>>>>>>                     
>> Lucene
>>     
>>>>>>>>>> 2.9 old-api results don't use it, and are still not good.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>                     
>>>>>>>>> The quick results:
>>>>>>>>>
>>>>>>>>> ramdisk: sudo mount -t tmpfs tmpfs /tmp/space -o
>>>>>>>>> size=1G,nr_inodes=200k,mode=01777
>>>>>>>>>
>>>>>>>>> config: impl=SeparateFile serial=false nThreads=4 iterations=100
>>>>>>>>> bufsize=1024 poolsize=2 filelen=730554368
>>>>>>>>> answer=-282295611, ms=173550, MB/sec=1683.7899579371938
>>>>>>>>>
>>>>>>>>> config: impl=ChannelFile serial=false nThreads=4 iterations=100
>>>>>>>>> bufsize=1024 poolsize=2 filelen=730554368
>>>>>>>>> answer=-282295361, ms=1377768, MB/sec=212.09793463050383
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> - Mark
>>>>>>>>>
>>>>>>>>> http://www.lucidimagination.com
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ------------------------------------------------------------------
>>>>>>>>>                   
>> ---
>>     
>>>>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>                   
>>>>>>>> -------------------------------------------------------------------
>>>>>>>>                 
>> --
>>     
>>>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>> --
>>>>>>> - Mark
>>>>>>>
>>>>>>> http://www.lucidimagination.com
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --------------------------------------------------------------------
>>>>>>>               
>> -
>>     
>>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>             
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>
>>>>>
>>>>>
>>>>>           
>>>>
>>>>         
>>> --
>>> - Mark
>>>
>>> http://www.lucidimagination.com
>>>
>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>>       
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>     
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>   


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

RE: lucene 2.9.0RC4 slower than 2.4.1?

Posted by Uwe Schindler <uw...@thetaphi.de>.

Now I am completely disturbed. Which numbers come from which filesystem?
Ext4 on HDD, tmpfs (which is a filesystem of its own), ext3 on HDD,...

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: yseeley@gmail.com [mailto:yseeley@gmail.com] On Behalf Of Yonik
> Seeley
> Sent: Tuesday, September 15, 2009 9:33 PM
> To: java-user@lucene.apache.org
> Subject: Re: lucene 2.9.0RC4 slower than 2.4.1?
> 
> Remember to disable CPU frequency scaling when benchmarking... some
> things with IO cause the freq to drop, and when it's CPU bound again
> it takes a while for Linux to scale up the freq again.
> 
> For example, on my ubuntu box, ChannelFile went from 100MB/sec to
> 388MB/sec.  This effect probably won't be uniform across
> implementations either.
> 
> -Yonik
> http://www.lucidimagination.com
> 
> On Tue, Sep 15, 2009 at 3:25 PM, Mark Miller <ma...@gmail.com>
> wrote:
> > I just really I hadn't sent this one. Here are results from the
> harddrive:
> >
> > It looks like its closer to the same speed on the hardrive once
> > everything is loaded in the system cache (as you'd expect). SeparateFile
> > was 1200 vs almost 1700 on RAMDISK. ChannelPread looked a lot closer
> though.
> >
> >
> > - Mark
> >
> > Tests from harddisk (filesystem cache warmed):
> >
> > config: impl=SeparateFile serial=false nThreads=4 iterations=100
> > bufsize=1024 poolsize=2 filelen=730554368
> > answer=-282293977, ms=238230, MB/sec=1226.6370616630988
> >
> > config: impl=ChannelPread serial=false nThreads=4 iterations=100
> > bufsize=1024 poolsize=2 filelen=730554368
> > answer=-282295361, ms=766340, MB/sec=381.3212767179059
> >
> >
> > Mark Miller wrote:
> >> Michael McCandless wrote:
> >>
> >>> I don't like that the answer is different... but it's really really
> >>> odd that it's different-yet-almost-the-same.
> >>>
> >>> Mark, were these 4 results on a normal (ext4) filesystem, or tmpfs?
> >>> (Because the top 2 entries of your 4 results match the first set of 2
> >>> entries you sent... so I'm thinking these 4 were actually tmpfs not
> >>> ext4).
> >>>
> >>>
> >> Those 4 were tmpfs - I mention ext4 at the end because I had just given
> >> a feel for the hardrive tests and wanted to note it was from ext4 - the
> >> results are def ramdisk though.
> >>
> >>> What JRE/OS, linux, kernel versions, and hardware, are you running on?
> >>>
> >>>
> >> These are on:
> >> Ubuntu Karmic Koala 9.10, currently updated
> >> java-1.5.0-sun-1.5.0.20
> >> 2.6.31-10-generic
> >>
> >> RAM: 3.9 Gig
> >> 4 core Intel Core2 duo 2.0GHz
> >>
> >> Slow 5200 rpm laptop drives.
> >>
> >>
> >>> The gains of SeparateFile over all else are stunning.  And, quite
> >>> different from the linux tests I had run under LUCENE-753.  Maybe we
> >>> need to revert FSDir.open to return SimpleFSDir again, on non-Windows
> >>> hosts.  But then we don't have good concurrency...
> >>>
> >>> Mike
> >>>
> >>> On Tue, Sep 15, 2009 at 2:59 PM, Yonik Seeley
> >>> <yo...@lucidimagination.com> wrote:
> >>>
> >>>
> >>>> It's been a while since I wrote that benchmarker... is it OK that the
> >>>> answer is different?  Did you use the same test file?
> >>>>
> >>>> -Yonik
> >>>> http://www.lucidimagination.com
> >>>>
> >>>>
> >>>>
> >>>> On Tue, Sep 15, 2009 at 2:18 PM, Mark Miller <ma...@gmail.com>
> wrote:
> >>>>
> >>>>
> >>>>> The results:
> >>>>>
> >>>>> config: impl=SeparateFile serial=false nThreads=4 iterations=100
> >>>>> bufsize=1024 poolsize=2 filelen=730554368
> >>>>> answer=-282295611, ms=173550, MB/sec=1683.7899579371938
> >>>>>
> >>>>> config: impl=ChannelFile serial=false nThreads=4 iterations=100
> >>>>> bufsize=1024 poolsize=2 filelen=730554368
> >>>>> answer=-282295361, ms=1377768, MB/sec=212.09793463050383
> >>>>>
> >>>>> config: impl=ChannelPread serial=false nThreads=4 iterations=100
> >>>>> bufsize=1024 poolsize=2 filelen=730554368
> >>>>> answer=-282295361, ms=632253, MB/sec=462.19115955163517
> >>>>>
> >>>>> config: impl=PooledPread serial=false nThreads=4 iterations=100
> >>>>> bufsize=1024 poolsize=2 filelen=730554368
> >>>>> answer=-282295361, ms=774664, MB/sec=377.2238637654518
> >>>>>
> >>>>> ClassicFile was heading for the same fate as ChannelFile.
> >>>>>
> >>>>>
> >>>>> I'll have to check what its like on the file system - but it appears
> >>>>> just ridiculously slower. Even with SeparateFile, All 4 cores are
> bouncing
> >>>>> from 0-12% independently and really favoring the low end of that.
> >>>>> ChannelPread appears no better.
> >>>>>
> >>>>> There are results from other OS's/setups in the JIRA issue.
> >>>>>
> >>>>> I'm using ext4.
> >>>>>
> >>>>> Uwe Schindler wrote:
> >>>>>
> >>>>>
> >>>>>> How does a conventional file system compare?
> >>>>>>
> >>>>>> -----
> >>>>>> Uwe Schindler
> >>>>>> H.-H.-Meier-Allee 63, D-28213 Bremen
> >>>>>> http://www.thetaphi.de
> >>>>>> eMail: uwe@thetaphi.de
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: Mark Miller [mailto:markrmiller@gmail.com]
> >>>>>>> Sent: Tuesday, September 15, 2009 7:15 PM
> >>>>>>> To: java-user@lucene.apache.org
> >>>>>>> Subject: Re: lucene 2.9.0RC4 slower than 2.4.1?
> >>>>>>>
> >>>>>>> Mark Miller wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>> Indeed - I just ran the FileReaderTest on a Linux tmpfs ramdisk -
> with
> >>>>>>>> SeparateFile all 4 of my cores are immediately pinned and remain
> so.
> >>>>>>>> With ChannelFile, all 4 cores hover 20-30%.
> >>>>>>>>
> >>>>>>>> It would appear it may not be a good idea to use NIOFSDirectory
> on
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>> ramdisks.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>> Even still though - it looks like you have a further issue - your
> Lucene
> >>>>>>>> 2.9 old-api results don't use it, and are still not good.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>> The quick results:
> >>>>>>>
> >>>>>>> ramdisk: sudo mount -t tmpfs tmpfs /tmp/space -o
> >>>>>>> size=1G,nr_inodes=200k,mode=01777
> >>>>>>>
> >>>>>>> config: impl=SeparateFile serial=false nThreads=4 iterations=100
> >>>>>>> bufsize=1024 poolsize=2 filelen=730554368
> >>>>>>> answer=-282295611, ms=173550, MB/sec=1683.7899579371938
> >>>>>>>
> >>>>>>> config: impl=ChannelFile serial=false nThreads=4 iterations=100
> >>>>>>> bufsize=1024 poolsize=2 filelen=730554368
> >>>>>>> answer=-282295361, ms=1377768, MB/sec=212.09793463050383
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> - Mark
> >>>>>>>
> >>>>>>> http://www.lucidimagination.com
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> ------------------------------------------------------------------
> ---
> >>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>> -------------------------------------------------------------------
> --
> >>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>> --
> >>>>> - Mark
> >>>>>
> >>>>> http://www.lucidimagination.com
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> --------------------------------------------------------------------
> -
> >>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >>>> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>>>
> >>>>
> >>>>
> >>>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >>> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>>
> >>>
> >>>
> >>
> >>
> >>
> >
> >
> > --
> > - Mark
> >
> > http://www.lucidimagination.com
> >
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: lucene 2.9.0RC4 slower than 2.4.1?

Posted by Yonik Seeley <yo...@lucidimagination.com>.

Remember to disable CPU frequency scaling when benchmarking... some
things with IO cause the freq to drop, and when it's CPU bound again
it takes a while for Linux to scale up the freq again.

For example, on my ubuntu box, ChannelFile went from 100MB/sec to
388MB/sec.  This effect probably won't be uniform across
implementations either.

-Yonik
http://www.lucidimagination.com

On Tue, Sep 15, 2009 at 3:25 PM, Mark Miller <ma...@gmail.com> wrote:
> I just really I hadn't sent this one. Here are results from the harddrive:
>
> It looks like its closer to the same speed on the hardrive once
> everything is loaded in the system cache (as you'd expect). SeparateFile
> was 1200 vs almost 1700 on RAMDISK. ChannelPread looked a lot closer though.
>
>
> - Mark
>
> Tests from harddisk (filesystem cache warmed):
>
> config: impl=SeparateFile serial=false nThreads=4 iterations=100
> bufsize=1024 poolsize=2 filelen=730554368
> answer=-282293977, ms=238230, MB/sec=1226.6370616630988
>
> config: impl=ChannelPread serial=false nThreads=4 iterations=100
> bufsize=1024 poolsize=2 filelen=730554368
> answer=-282295361, ms=766340, MB/sec=381.3212767179059
>
>
> Mark Miller wrote:
>> Michael McCandless wrote:
>>
>>> I don't like that the answer is different... but it's really really
>>> odd that it's different-yet-almost-the-same.
>>>
>>> Mark, were these 4 results on a normal (ext4) filesystem, or tmpfs?
>>> (Because the top 2 entries of your 4 results match the first set of 2
>>> entries you sent... so I'm thinking these 4 were actually tmpfs not
>>> ext4).
>>>
>>>
>> Those 4 were tmpfs - I mention ext4 at the end because I had just given
>> a feel for the hardrive tests and wanted to note it was from ext4 - the
>> results are def ramdisk though.
>>
>>> What JRE/OS, linux, kernel versions, and hardware, are you running on?
>>>
>>>
>> These are on:
>> Ubuntu Karmic Koala 9.10, currently updated
>> java-1.5.0-sun-1.5.0.20
>> 2.6.31-10-generic
>>
>> RAM: 3.9 Gig
>> 4 core Intel Core2 duo 2.0GHz
>>
>> Slow 5200 rpm laptop drives.
>>
>>
>>> The gains of SeparateFile over all else are stunning.  And, quite
>>> different from the linux tests I had run under LUCENE-753.  Maybe we
>>> need to revert FSDir.open to return SimpleFSDir again, on non-Windows
>>> hosts.  But then we don't have good concurrency...
>>>
>>> Mike
>>>
>>> On Tue, Sep 15, 2009 at 2:59 PM, Yonik Seeley
>>> <yo...@lucidimagination.com> wrote:
>>>
>>>
>>>> It's been a while since I wrote that benchmarker... is it OK that the
>>>> answer is different?  Did you use the same test file?
>>>>
>>>> -Yonik
>>>> http://www.lucidimagination.com
>>>>
>>>>
>>>>
>>>> On Tue, Sep 15, 2009 at 2:18 PM, Mark Miller <ma...@gmail.com> wrote:
>>>>
>>>>
>>>>> The results:
>>>>>
>>>>> config: impl=SeparateFile serial=false nThreads=4 iterations=100
>>>>> bufsize=1024 poolsize=2 filelen=730554368
>>>>> answer=-282295611, ms=173550, MB/sec=1683.7899579371938
>>>>>
>>>>> config: impl=ChannelFile serial=false nThreads=4 iterations=100
>>>>> bufsize=1024 poolsize=2 filelen=730554368
>>>>> answer=-282295361, ms=1377768, MB/sec=212.09793463050383
>>>>>
>>>>> config: impl=ChannelPread serial=false nThreads=4 iterations=100
>>>>> bufsize=1024 poolsize=2 filelen=730554368
>>>>> answer=-282295361, ms=632253, MB/sec=462.19115955163517
>>>>>
>>>>> config: impl=PooledPread serial=false nThreads=4 iterations=100
>>>>> bufsize=1024 poolsize=2 filelen=730554368
>>>>> answer=-282295361, ms=774664, MB/sec=377.2238637654518
>>>>>
>>>>> ClassicFile was heading for the same fate as ChannelFile.
>>>>>
>>>>>
>>>>> I'll have to check what its like on the file system - but it appears
>>>>> just ridiculously slower. Even with SeparateFile, All 4 cores are bouncing
>>>>> from 0-12% independently and really favoring the low end of that.
>>>>> ChannelPread appears no better.
>>>>>
>>>>> There are results from other OS's/setups in the JIRA issue.
>>>>>
>>>>> I'm using ext4.
>>>>>
>>>>> Uwe Schindler wrote:
>>>>>
>>>>>
>>>>>> How does a conventional file system compare?
>>>>>>
>>>>>> -----
>>>>>> Uwe Schindler
>>>>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>>>>> http://www.thetaphi.de
>>>>>> eMail: uwe@thetaphi.de
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Mark Miller [mailto:markrmiller@gmail.com]
>>>>>>> Sent: Tuesday, September 15, 2009 7:15 PM
>>>>>>> To: java-user@lucene.apache.org
>>>>>>> Subject: Re: lucene 2.9.0RC4 slower than 2.4.1?
>>>>>>>
>>>>>>> Mark Miller wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Indeed - I just ran the FileReaderTest on a Linux tmpfs ramdisk - with
>>>>>>>> SeparateFile all 4 of my cores are immediately pinned and remain so.
>>>>>>>> With ChannelFile, all 4 cores hover 20-30%.
>>>>>>>>
>>>>>>>> It would appear it may not be a good idea to use NIOFSDirectory on
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> ramdisks.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Even still though - it looks like you have a further issue - your Lucene
>>>>>>>> 2.9 old-api results don't use it, and are still not good.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> The quick results:
>>>>>>>
>>>>>>> ramdisk: sudo mount -t tmpfs tmpfs /tmp/space -o
>>>>>>> size=1G,nr_inodes=200k,mode=01777
>>>>>>>
>>>>>>> config: impl=SeparateFile serial=false nThreads=4 iterations=100
>>>>>>> bufsize=1024 poolsize=2 filelen=730554368
>>>>>>> answer=-282295611, ms=173550, MB/sec=1683.7899579371938
>>>>>>>
>>>>>>> config: impl=ChannelFile serial=false nThreads=4 iterations=100
>>>>>>> bufsize=1024 poolsize=2 filelen=730554368
>>>>>>> answer=-282295361, ms=1377768, MB/sec=212.09793463050383
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> - Mark
>>>>>>>
>>>>>>> http://www.lucidimagination.com
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>> --
>>>>> - Mark
>>>>>
>>>>> http://www.lucidimagination.com
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>
>>>>>
>>>>>
>>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>>
>>>>
>>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>>
>>
>>
>>
>
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: lucene 2.9.0RC4 slower than 2.4.1?

Posted by Mark Miller <ma...@gmail.com>.

I just really I hadn't sent this one. Here are results from the harddrive:

It looks like its closer to the same speed on the hardrive once
everything is loaded in the system cache (as you'd expect). SeparateFile
was 1200 vs almost 1700 on RAMDISK. ChannelPread looked a lot closer though.


- Mark

Tests from harddisk (filesystem cache warmed):

config: impl=SeparateFile serial=false nThreads=4 iterations=100
bufsize=1024 poolsize=2 filelen=730554368
answer=-282293977, ms=238230, MB/sec=1226.6370616630988

config: impl=ChannelPread serial=false nThreads=4 iterations=100
bufsize=1024 poolsize=2 filelen=730554368
answer=-282295361, ms=766340, MB/sec=381.3212767179059


Mark Miller wrote:
> Michael McCandless wrote:
>   
>> I don't like that the answer is different... but it's really really
>> odd that it's different-yet-almost-the-same.
>>
>> Mark, were these 4 results on a normal (ext4) filesystem, or tmpfs?
>> (Because the top 2 entries of your 4 results match the first set of 2
>> entries you sent... so I'm thinking these 4 were actually tmpfs not
>> ext4).
>>   
>>     
> Those 4 were tmpfs - I mention ext4 at the end because I had just given
> a feel for the hardrive tests and wanted to note it was from ext4 - the
> results are def ramdisk though.
>   
>> What JRE/OS, linux, kernel versions, and hardware, are you running on?
>>   
>>     
> These are on:
> Ubuntu Karmic Koala 9.10, currently updated
> java-1.5.0-sun-1.5.0.20
> 2.6.31-10-generic
>
> RAM: 3.9 Gig
> 4 core Intel Core2 duo 2.0GHz
>
> Slow 5200 rpm laptop drives.
>
>   
>> The gains of SeparateFile over all else are stunning.  And, quite
>> different from the linux tests I had run under LUCENE-753.  Maybe we
>> need to revert FSDir.open to return SimpleFSDir again, on non-Windows
>> hosts.  But then we don't have good concurrency...
>>
>> Mike
>>
>> On Tue, Sep 15, 2009 at 2:59 PM, Yonik Seeley
>> <yo...@lucidimagination.com> wrote:
>>   
>>     
>>> It's been a while since I wrote that benchmarker... is it OK that the
>>> answer is different?  Did you use the same test file?
>>>
>>> -Yonik
>>> http://www.lucidimagination.com
>>>
>>>
>>>
>>> On Tue, Sep 15, 2009 at 2:18 PM, Mark Miller <ma...@gmail.com> wrote:
>>>     
>>>       
>>>> The results:
>>>>
>>>> config: impl=SeparateFile serial=false nThreads=4 iterations=100
>>>> bufsize=1024 poolsize=2 filelen=730554368
>>>> answer=-282295611, ms=173550, MB/sec=1683.7899579371938
>>>>
>>>> config: impl=ChannelFile serial=false nThreads=4 iterations=100
>>>> bufsize=1024 poolsize=2 filelen=730554368
>>>> answer=-282295361, ms=1377768, MB/sec=212.09793463050383
>>>>
>>>> config: impl=ChannelPread serial=false nThreads=4 iterations=100
>>>> bufsize=1024 poolsize=2 filelen=730554368
>>>> answer=-282295361, ms=632253, MB/sec=462.19115955163517
>>>>
>>>> config: impl=PooledPread serial=false nThreads=4 iterations=100
>>>> bufsize=1024 poolsize=2 filelen=730554368
>>>> answer=-282295361, ms=774664, MB/sec=377.2238637654518
>>>>
>>>> ClassicFile was heading for the same fate as ChannelFile.
>>>>
>>>>
>>>> I'll have to check what its like on the file system - but it appears
>>>> just ridiculously slower. Even with SeparateFile, All 4 cores are bouncing
>>>> from 0-12% independently and really favoring the low end of that.
>>>> ChannelPread appears no better.
>>>>
>>>> There are results from other OS's/setups in the JIRA issue.
>>>>
>>>> I'm using ext4.
>>>>
>>>> Uwe Schindler wrote:
>>>>       
>>>>         
>>>>> How does a conventional file system compare?
>>>>>
>>>>> -----
>>>>> Uwe Schindler
>>>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>>>> http://www.thetaphi.de
>>>>> eMail: uwe@thetaphi.de
>>>>>
>>>>>
>>>>>         
>>>>>           
>>>>>> -----Original Message-----
>>>>>> From: Mark Miller [mailto:markrmiller@gmail.com]
>>>>>> Sent: Tuesday, September 15, 2009 7:15 PM
>>>>>> To: java-user@lucene.apache.org
>>>>>> Subject: Re: lucene 2.9.0RC4 slower than 2.4.1?
>>>>>>
>>>>>> Mark Miller wrote:
>>>>>>
>>>>>>           
>>>>>>             
>>>>>>> Indeed - I just ran the FileReaderTest on a Linux tmpfs ramdisk - with
>>>>>>> SeparateFile all 4 of my cores are immediately pinned and remain so.
>>>>>>> With ChannelFile, all 4 cores hover 20-30%.
>>>>>>>
>>>>>>> It would appear it may not be a good idea to use NIOFSDirectory on
>>>>>>>
>>>>>>>             
>>>>>>>               
>>>>>> ramdisks.
>>>>>>
>>>>>>           
>>>>>>             
>>>>>>> Even still though - it looks like you have a further issue - your Lucene
>>>>>>> 2.9 old-api results don't use it, and are still not good.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>             
>>>>>>>               
>>>>>> The quick results:
>>>>>>
>>>>>> ramdisk: sudo mount -t tmpfs tmpfs /tmp/space -o
>>>>>> size=1G,nr_inodes=200k,mode=01777
>>>>>>
>>>>>> config: impl=SeparateFile serial=false nThreads=4 iterations=100
>>>>>> bufsize=1024 poolsize=2 filelen=730554368
>>>>>> answer=-282295611, ms=173550, MB/sec=1683.7899579371938
>>>>>>
>>>>>> config: impl=ChannelFile serial=false nThreads=4 iterations=100
>>>>>> bufsize=1024 poolsize=2 filelen=730554368
>>>>>> answer=-282295361, ms=1377768, MB/sec=212.09793463050383
>>>>>>
>>>>>>
>>>>>> --
>>>>>> - Mark
>>>>>>
>>>>>> http://www.lucidimagination.com
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>
>>>>>>           
>>>>>>             
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>
>>>>>
>>>>>         
>>>>>           
>>>> --
>>>> - Mark
>>>>
>>>> http://www.lucidimagination.com
>>>>
>>>>
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>>
>>>>       
>>>>         
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>>     
>>>       
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>   
>>     
>
>
>   


-- 
- Mark

http://www.lucidimagination.com




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: lucene 2.9.0RC4 slower than 2.4.1?

Posted by Mark Miller <ma...@gmail.com>.

Michael McCandless wrote:
> I don't like that the answer is different... but it's really really
> odd that it's different-yet-almost-the-same.
>
> Mark, were these 4 results on a normal (ext4) filesystem, or tmpfs?
> (Because the top 2 entries of your 4 results match the first set of 2
> entries you sent... so I'm thinking these 4 were actually tmpfs not
> ext4).
>   
Those 4 were tmpfs - I mention ext4 at the end because I had just given
a feel for the hardrive tests and wanted to note it was from ext4 - the
results are def ramdisk though.
> What JRE/OS, linux, kernel versions, and hardware, are you running on?
>   
These are on:
Ubuntu Karmic Koala 9.10, currently updated
java-1.5.0-sun-1.5.0.20
2.6.31-10-generic

RAM: 3.9 Gig
4 core Intel Core2 duo 2.0GHz

Slow 5200 rpm laptop drives.

> The gains of SeparateFile over all else are stunning.  And, quite
> different from the linux tests I had run under LUCENE-753.  Maybe we
> need to revert FSDir.open to return SimpleFSDir again, on non-Windows
> hosts.  But then we don't have good concurrency...
>
> Mike
>
> On Tue, Sep 15, 2009 at 2:59 PM, Yonik Seeley
> <yo...@lucidimagination.com> wrote:
>   
>> It's been a while since I wrote that benchmarker... is it OK that the
>> answer is different?  Did you use the same test file?
>>
>> -Yonik
>> http://www.lucidimagination.com
>>
>>
>>
>> On Tue, Sep 15, 2009 at 2:18 PM, Mark Miller <ma...@gmail.com> wrote:
>>     
>>> The results:
>>>
>>> config: impl=SeparateFile serial=false nThreads=4 iterations=100
>>> bufsize=1024 poolsize=2 filelen=730554368
>>> answer=-282295611, ms=173550, MB/sec=1683.7899579371938
>>>
>>> config: impl=ChannelFile serial=false nThreads=4 iterations=100
>>> bufsize=1024 poolsize=2 filelen=730554368
>>> answer=-282295361, ms=1377768, MB/sec=212.09793463050383
>>>
>>> config: impl=ChannelPread serial=false nThreads=4 iterations=100
>>> bufsize=1024 poolsize=2 filelen=730554368
>>> answer=-282295361, ms=632253, MB/sec=462.19115955163517
>>>
>>> config: impl=PooledPread serial=false nThreads=4 iterations=100
>>> bufsize=1024 poolsize=2 filelen=730554368
>>> answer=-282295361, ms=774664, MB/sec=377.2238637654518
>>>
>>> ClassicFile was heading for the same fate as ChannelFile.
>>>
>>>
>>> I'll have to check what its like on the file system - but it appears
>>> just ridiculously slower. Even with SeparateFile, All 4 cores are bouncing
>>> from 0-12% independently and really favoring the low end of that.
>>> ChannelPread appears no better.
>>>
>>> There are results from other OS's/setups in the JIRA issue.
>>>
>>> I'm using ext4.
>>>
>>> Uwe Schindler wrote:
>>>       
>>>> How does a conventional file system compare?
>>>>
>>>> -----
>>>> Uwe Schindler
>>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>>> http://www.thetaphi.de
>>>> eMail: uwe@thetaphi.de
>>>>
>>>>
>>>>         
>>>>> -----Original Message-----
>>>>> From: Mark Miller [mailto:markrmiller@gmail.com]
>>>>> Sent: Tuesday, September 15, 2009 7:15 PM
>>>>> To: java-user@lucene.apache.org
>>>>> Subject: Re: lucene 2.9.0RC4 slower than 2.4.1?
>>>>>
>>>>> Mark Miller wrote:
>>>>>
>>>>>           
>>>>>> Indeed - I just ran the FileReaderTest on a Linux tmpfs ramdisk - with
>>>>>> SeparateFile all 4 of my cores are immediately pinned and remain so.
>>>>>> With ChannelFile, all 4 cores hover 20-30%.
>>>>>>
>>>>>> It would appear it may not be a good idea to use NIOFSDirectory on
>>>>>>
>>>>>>             
>>>>> ramdisks.
>>>>>
>>>>>           
>>>>>> Even still though - it looks like you have a further issue - your Lucene
>>>>>> 2.9 old-api results don't use it, and are still not good.
>>>>>>
>>>>>>
>>>>>>
>>>>>>             
>>>>> The quick results:
>>>>>
>>>>> ramdisk: sudo mount -t tmpfs tmpfs /tmp/space -o
>>>>> size=1G,nr_inodes=200k,mode=01777
>>>>>
>>>>> config: impl=SeparateFile serial=false nThreads=4 iterations=100
>>>>> bufsize=1024 poolsize=2 filelen=730554368
>>>>> answer=-282295611, ms=173550, MB/sec=1683.7899579371938
>>>>>
>>>>> config: impl=ChannelFile serial=false nThreads=4 iterations=100
>>>>> bufsize=1024 poolsize=2 filelen=730554368
>>>>> answer=-282295361, ms=1377768, MB/sec=212.09793463050383
>>>>>
>>>>>
>>>>> --
>>>>> - Mark
>>>>>
>>>>> http://www.lucidimagination.com
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>
>>>>>           
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>>
>>>>         
>>> --
>>> - Mark
>>>
>>> http://www.lucidimagination.com
>>>
>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>>       
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>>     
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>   


-- 
- Mark

http://www.lucidimagination.com




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: lucene 2.9.0RC4 slower than 2.4.1?

Posted by Michael McCandless <lu...@mikemccandless.com>.

I don't like that the answer is different... but it's really really
odd that it's different-yet-almost-the-same.

Mark, were these 4 results on a normal (ext4) filesystem, or tmpfs?
(Because the top 2 entries of your 4 results match the first set of 2
entries you sent... so I'm thinking these 4 were actually tmpfs not
ext4).

What JRE/OS, linux, kernel versions, and hardware, are you running on?

The gains of SeparateFile over all else are stunning.  And, quite
different from the linux tests I had run under LUCENE-753.  Maybe we
need to revert FSDir.open to return SimpleFSDir again, on non-Windows
hosts.  But then we don't have good concurrency...

Mike

On Tue, Sep 15, 2009 at 2:59 PM, Yonik Seeley
<yo...@lucidimagination.com> wrote:
> It's been a while since I wrote that benchmarker... is it OK that the
> answer is different?  Did you use the same test file?
>
> -Yonik
> http://www.lucidimagination.com
>
>
>
> On Tue, Sep 15, 2009 at 2:18 PM, Mark Miller <ma...@gmail.com> wrote:
>> The results:
>>
>> config: impl=SeparateFile serial=false nThreads=4 iterations=100
>> bufsize=1024 poolsize=2 filelen=730554368
>> answer=-282295611, ms=173550, MB/sec=1683.7899579371938
>>
>> config: impl=ChannelFile serial=false nThreads=4 iterations=100
>> bufsize=1024 poolsize=2 filelen=730554368
>> answer=-282295361, ms=1377768, MB/sec=212.09793463050383
>>
>> config: impl=ChannelPread serial=false nThreads=4 iterations=100
>> bufsize=1024 poolsize=2 filelen=730554368
>> answer=-282295361, ms=632253, MB/sec=462.19115955163517
>>
>> config: impl=PooledPread serial=false nThreads=4 iterations=100
>> bufsize=1024 poolsize=2 filelen=730554368
>> answer=-282295361, ms=774664, MB/sec=377.2238637654518
>>
>> ClassicFile was heading for the same fate as ChannelFile.
>>
>>
>> I'll have to check what its like on the file system - but it appears
>> just ridiculously slower. Even with SeparateFile, All 4 cores are bouncing
>> from 0-12% independently and really favoring the low end of that.
>> ChannelPread appears no better.
>>
>> There are results from other OS's/setups in the JIRA issue.
>>
>> I'm using ext4.
>>
>> Uwe Schindler wrote:
>>> How does a conventional file system compare?
>>>
>>> -----
>>> Uwe Schindler
>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>> http://www.thetaphi.de
>>> eMail: uwe@thetaphi.de
>>>
>>>
>>>> -----Original Message-----
>>>> From: Mark Miller [mailto:markrmiller@gmail.com]
>>>> Sent: Tuesday, September 15, 2009 7:15 PM
>>>> To: java-user@lucene.apache.org
>>>> Subject: Re: lucene 2.9.0RC4 slower than 2.4.1?
>>>>
>>>> Mark Miller wrote:
>>>>
>>>>> Indeed - I just ran the FileReaderTest on a Linux tmpfs ramdisk - with
>>>>> SeparateFile all 4 of my cores are immediately pinned and remain so.
>>>>> With ChannelFile, all 4 cores hover 20-30%.
>>>>>
>>>>> It would appear it may not be a good idea to use NIOFSDirectory on
>>>>>
>>>> ramdisks.
>>>>
>>>>> Even still though - it looks like you have a further issue - your Lucene
>>>>> 2.9 old-api results don't use it, and are still not good.
>>>>>
>>>>>
>>>>>
>>>> The quick results:
>>>>
>>>> ramdisk: sudo mount -t tmpfs tmpfs /tmp/space -o
>>>> size=1G,nr_inodes=200k,mode=01777
>>>>
>>>> config: impl=SeparateFile serial=false nThreads=4 iterations=100
>>>> bufsize=1024 poolsize=2 filelen=730554368
>>>> answer=-282295611, ms=173550, MB/sec=1683.7899579371938
>>>>
>>>> config: impl=ChannelFile serial=false nThreads=4 iterations=100
>>>> bufsize=1024 poolsize=2 filelen=730554368
>>>> answer=-282295361, ms=1377768, MB/sec=212.09793463050383
>>>>
>>>>
>>>> --
>>>> - Mark
>>>>
>>>> http://www.lucidimagination.com
>>>>
>>>>
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>
>>
>> --
>> - Mark
>>
>> http://www.lucidimagination.com
>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: lucene 2.9.0RC4 slower than 2.4.1?

Posted by Mark Miller <ma...@gmail.com>.

Its the same test file - everything the same except one file is on local
ext4 hd and the copy is on ramdisk.

I havn't yet looked into what the answer corresponds to. I wonder if the
RAM disk is getting made as ext3?

note:also I give the JVM RAM a bit larger than the file size, and the OS
has plenty of RAM beyond the file size.

Here are two runs on the hardrive/cached side (not ramdisk). Again, a
bit slower than the RAM disk:

config: impl=ChannelPread serial=false nThreads=4 iterations=100
bufsize=1024 poolsize=2 filelen=730554368
answer=-282295361, ms=766340, MB/sec=381.3212767179059


config: impl=ClassicFile serial=false nThreads=4 iterations=100
bufsize=1024 poolsize=2 filelen=730554368
answer=-282295361, ms=852010, MB/sec=342.9792457835002



Yonik Seeley wrote:
> It's been a while since I wrote that benchmarker... is it OK that the
> answer is different?  Did you use the same test file?
>
> -Yonik
> http://www.lucidimagination.com
>
>
>
> On Tue, Sep 15, 2009 at 2:18 PM, Mark Miller <ma...@gmail.com> wrote:
>   
>> The results:
>>
>> config: impl=SeparateFile serial=false nThreads=4 iterations=100
>> bufsize=1024 poolsize=2 filelen=730554368
>> answer=-282295611, ms=173550, MB/sec=1683.7899579371938
>>
>> config: impl=ChannelFile serial=false nThreads=4 iterations=100
>> bufsize=1024 poolsize=2 filelen=730554368
>> answer=-282295361, ms=1377768, MB/sec=212.09793463050383
>>
>> config: impl=ChannelPread serial=false nThreads=4 iterations=100
>> bufsize=1024 poolsize=2 filelen=730554368
>> answer=-282295361, ms=632253, MB/sec=462.19115955163517
>>
>> config: impl=PooledPread serial=false nThreads=4 iterations=100
>> bufsize=1024 poolsize=2 filelen=730554368
>> answer=-282295361, ms=774664, MB/sec=377.2238637654518
>>
>> ClassicFile was heading for the same fate as ChannelFile.
>>
>>
>> I'll have to check what its like on the file system - but it appears
>> just ridiculously slower. Even with SeparateFile, All 4 cores are bouncing
>> from 0-12% independently and really favoring the low end of that.
>> ChannelPread appears no better.
>>
>> There are results from other OS's/setups in the JIRA issue.
>>
>> I'm using ext4.
>>
>> Uwe Schindler wrote:
>>     
>>> How does a conventional file system compare?
>>>
>>> -----
>>> Uwe Schindler
>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>> http://www.thetaphi.de
>>> eMail: uwe@thetaphi.de
>>>
>>>
>>>       
>>>> -----Original Message-----
>>>> From: Mark Miller [mailto:markrmiller@gmail.com]
>>>> Sent: Tuesday, September 15, 2009 7:15 PM
>>>> To: java-user@lucene.apache.org
>>>> Subject: Re: lucene 2.9.0RC4 slower than 2.4.1?
>>>>
>>>> Mark Miller wrote:
>>>>
>>>>         
>>>>> Indeed - I just ran the FileReaderTest on a Linux tmpfs ramdisk - with
>>>>> SeparateFile all 4 of my cores are immediately pinned and remain so.
>>>>> With ChannelFile, all 4 cores hover 20-30%.
>>>>>
>>>>> It would appear it may not be a good idea to use NIOFSDirectory on
>>>>>
>>>>>           
>>>> ramdisks.
>>>>
>>>>         
>>>>> Even still though - it looks like you have a further issue - your Lucene
>>>>> 2.9 old-api results don't use it, and are still not good.
>>>>>
>>>>>
>>>>>
>>>>>           
>>>> The quick results:
>>>>
>>>> ramdisk: sudo mount -t tmpfs tmpfs /tmp/space -o
>>>> size=1G,nr_inodes=200k,mode=01777
>>>>
>>>> config: impl=SeparateFile serial=false nThreads=4 iterations=100
>>>> bufsize=1024 poolsize=2 filelen=730554368
>>>> answer=-282295611, ms=173550, MB/sec=1683.7899579371938
>>>>
>>>> config: impl=ChannelFile serial=false nThreads=4 iterations=100
>>>> bufsize=1024 poolsize=2 filelen=730554368
>>>> answer=-282295361, ms=1377768, MB/sec=212.09793463050383
>>>>
>>>>
>>>> --
>>>> - Mark
>>>>
>>>> http://www.lucidimagination.com
>>>>
>>>>
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>>         
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>>       
>> --
>> - Mark
>>
>> http://www.lucidimagination.com
>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>>     
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>   


-- 
- Mark

http://www.lucidimagination.com




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: lucene 2.9.0RC4 slower than 2.4.1?

Posted by Yonik Seeley <yo...@lucidimagination.com>.

It's been a while since I wrote that benchmarker... is it OK that the
answer is different?  Did you use the same test file?

-Yonik
http://www.lucidimagination.com



On Tue, Sep 15, 2009 at 2:18 PM, Mark Miller <ma...@gmail.com> wrote:
> The results:
>
> config: impl=SeparateFile serial=false nThreads=4 iterations=100
> bufsize=1024 poolsize=2 filelen=730554368
> answer=-282295611, ms=173550, MB/sec=1683.7899579371938
>
> config: impl=ChannelFile serial=false nThreads=4 iterations=100
> bufsize=1024 poolsize=2 filelen=730554368
> answer=-282295361, ms=1377768, MB/sec=212.09793463050383
>
> config: impl=ChannelPread serial=false nThreads=4 iterations=100
> bufsize=1024 poolsize=2 filelen=730554368
> answer=-282295361, ms=632253, MB/sec=462.19115955163517
>
> config: impl=PooledPread serial=false nThreads=4 iterations=100
> bufsize=1024 poolsize=2 filelen=730554368
> answer=-282295361, ms=774664, MB/sec=377.2238637654518
>
> ClassicFile was heading for the same fate as ChannelFile.
>
>
> I'll have to check what its like on the file system - but it appears
> just ridiculously slower. Even with SeparateFile, All 4 cores are bouncing
> from 0-12% independently and really favoring the low end of that.
> ChannelPread appears no better.
>
> There are results from other OS's/setups in the JIRA issue.
>
> I'm using ext4.
>
> Uwe Schindler wrote:
>> How does a conventional file system compare?
>>
>> -----
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: uwe@thetaphi.de
>>
>>
>>> -----Original Message-----
>>> From: Mark Miller [mailto:markrmiller@gmail.com]
>>> Sent: Tuesday, September 15, 2009 7:15 PM
>>> To: java-user@lucene.apache.org
>>> Subject: Re: lucene 2.9.0RC4 slower than 2.4.1?
>>>
>>> Mark Miller wrote:
>>>
>>>> Indeed - I just ran the FileReaderTest on a Linux tmpfs ramdisk - with
>>>> SeparateFile all 4 of my cores are immediately pinned and remain so.
>>>> With ChannelFile, all 4 cores hover 20-30%.
>>>>
>>>> It would appear it may not be a good idea to use NIOFSDirectory on
>>>>
>>> ramdisks.
>>>
>>>> Even still though - it looks like you have a further issue - your Lucene
>>>> 2.9 old-api results don't use it, and are still not good.
>>>>
>>>>
>>>>
>>> The quick results:
>>>
>>> ramdisk: sudo mount -t tmpfs tmpfs /tmp/space -o
>>> size=1G,nr_inodes=200k,mode=01777
>>>
>>> config: impl=SeparateFile serial=false nThreads=4 iterations=100
>>> bufsize=1024 poolsize=2 filelen=730554368
>>> answer=-282295611, ms=173550, MB/sec=1683.7899579371938
>>>
>>> config: impl=ChannelFile serial=false nThreads=4 iterations=100
>>> bufsize=1024 poolsize=2 filelen=730554368
>>> answer=-282295361, ms=1377768, MB/sec=212.09793463050383
>>>
>>>
>>> --
>>> - Mark
>>>
>>> http://www.lucidimagination.com
>>>
>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: lucene 2.9.0RC4 slower than 2.4.1?

Posted by Thomas Becker <th...@net-m.de>.


Mark Miller wrote:
> Thomas Becker wrote:
>> Hey Mark,
>>
>> yes. I'm running the app on unix. You see the difference between 2.9 and 2.4 here:
>>
>> http://ankeschwarzer.de/tmp/graph.jpg
>>   
> Right - I know your measurements showed a difference (and will keep that
> in mind) - but the profiling results then seem
> oddly similar.
Ok, got your point and agree.
> 
>> 2.4 responds much quicker thus increasing throughput severly. I'm having a
>> single segment only:
>>
>> -rw-r--r-- 1 asuser asgroup         20 Sep  9 16:40 segments.gen
>> -rw-r--r-- 1 asuser asgroup        294 Sep  9 16:44 segments_1o
>> -rw-r--r-- 1 asuser asgroup 2810603184 Sep  9 16:44 _7c.cfs
>>
>> The FileChannel.read hotspot is indeed strange. Especially if taking into
>> account that the index is lying on a tmpfs (in memory). And 2.4 should be using
>> NIOFSDirectory as well?! Will check that.
>>   
> 2.4 did not use NIOFSDirectory by default - you would have had to
> manually specified it. Now its used by default if its detects a non
> Windows OS.
Yes, I've checked that in the sources. I'm right now building the app and will
try it with SimpleFSDirectory. FSDirectory.open is replaced with new
SimpleFSDirectory. Will collect and post the results tomorrow. Seems to be a tad
faster. Details and new profiling data tomorrow. Have to leave now, sorry.
> 
> Any particular reason your profiling output does not show invocations?
> Its not essential most likely, but I've found it to be helpful in
> comparisons.
Yes, I reduced the profiler's impact on performance. It got horribly slow with
my usual profiling settings as soon as I included the org.apache.lucene files.
> 
> We are about to release 2.9, and its been such a long haul, I'd hate to
> see a release with an unknown performance trap.
> 
And me too. Doing my best to find the issue. :)
-- 
Thomas Becker
Senior JEE Developer

net mobile AG
Zollhof 17
40221 Düsseldorf
GERMANY

Phone:    +49 211 97020-195
Fax:      +49 211 97020-949
Mobile:   +49 173 5146567 (private)
E-Mail:   mailto:thomas.becker@net-m.de
Internet: http://www.net-m.de

Registergericht:  Amtsgericht Düsseldorf, HRB 48022
Vorstand:         Theodor Niehues (Vorsitzender), Frank Hartmann,
                 Kai Markus Kulas, Dieter Plassmann
Vorsitzender des
Aufsichtsrates:   Dr. Michael Briem

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: lucene 2.9.0RC4 slower than 2.4.1?

Posted by Yonik Seeley <yo...@lucidimagination.com>.

On Wed, Sep 16, 2009 at 12:33 PM, Uwe Schindler <uw...@thetaphi.de> wrote:
> How should we proceed? Stop the final artifact build and voting or proceed
> with the release of 2.9? We waited so long and for most people it is faster
> than slower!

I think we know that 2.9 will not be faster for everyone:
  - Per-segment searching and the new comparatores are a general win,
but will be slower for some people.
  - Query parsing and small document indexing will be somewhat slower
due to the new token APIs (the workarounds for back compatibility) if
token streams aren't reused.

I don't see any indication of any bugs in Lucene in this thread either.

-Yonik

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: lucene 2.9.0RC4 slower than 2.4.1?

Posted by Mark Miller <ma...@gmail.com>.

I agree - best to find the cause before making a decision. There are
enough smart people in the wings, I can't imagine this should take us
that long. We have solved a good chunk of it already, and have only just
begun chunk two.

-- 
- Mark

http://www.lucidimagination.com



Thomas Becker wrote:
> I suggest to find the root cause and then decide about the release. Tomorrow I
> will spent the whole working day on the issue if no prio1 pops up.
>
> Sadly I've to leave early today, since I'm moving to a new flat... :(
>
> Uwe Schindler wrote:
>   
>> How should we proceed? Stop the final artifact build and voting or proceed
>> with the release of 2.9? We waited so long and for most people it is faster
>> than slower!
>>
>> -----
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: uwe@thetaphi.de
>>
>>
>>     
>>> -----Original Message-----
>>> From: Mark Miller [mailto:markrmiller@gmail.com]
>>> Sent: Wednesday, September 16, 2009 6:23 PM
>>> To: java-user@lucene.apache.org
>>> Subject: Re: lucene 2.9.0RC4 slower than 2.4.1?
>>>
>>> bq. I'll do some profiling now again and let you know the results.
>>>
>>> Great - it will be interesting to see the results. My guess, based on
>>> the 2.9 new api profiling, is that your queries may not be agreeing with
>>> some of the changes somehow. Along with the profiling, can you fill us
>>> in on the query types you are using as well? (eg qualities)
>>>
>>> And grab invocations if its possible.
>>>
>>> --
>>> - Mark
>>>
>>> http://www.lucidimagination.com
>>>
>>>
>>>
>>> Thomas Becker wrote:
>>>       
>>>> Tests run on tmpfs:
>>>> config: impl=SeparateFile serial=false nThreads=4 iterations=100
>>>>         
>>> bufsize=1024
>>>       
>>>> poolsize=2 filelen=18920301
>>>> answer=850258539, ms=8090, MB/sec=935.4907787391842
>>>> config: impl=ChannelFile serial=false nThreads=4 iterations=100
>>>>         
>>> bufsize=1024
>>>       
>>>> poolsize=2 filelen=18920301
>>>> answer=850258903, ms=39444, MB/sec=191.8700030422878
>>>> config: impl=ChannelPread serial=false nThreads=4 iterations=100
>>>>         
>>> bufsize=1024
>>>       
>>>> poolsize=2 filelen=18920301
>>>> answer=850258903, ms=8504, MB/sec=889.9483066792098
>>>> config: impl=PooledPread serial=false nThreads=4 iterations=100
>>>>         
>>> bufsize=1024
>>>       
>>>> poolsize=2 filelen=18920301
>>>> answer=850258903, ms=9585, MB/sec=789.5795931142409
>>>>
>>>> I did run some tests now with SimpleFSDirectory and MMapDirectory. Both
>>>>         
>>> are
>>>       
>>>> faster than NIOFS and the response times improved. But it's still slower
>>>>         
>>> than 2.4.
>>>       
>>>> I'll do some profiling now again and let you know the results.
>>>>
>>>> Thanks again for all the great support to all who've answered.
>>>>
>>>>
>>>> Mark Miller wrote:
>>>>
>>>>         
>>>>> Can you run the following test on your RAMDISK?
>>>>>
>>>>> http://people.apache.org/~markrmiller/FileReadTest.java
>>>>>
>>>>> I've taken it from the following issue (in which NIOFSDirectory was
>>>>> developed):
>>>>> https://issues.apache.org/jira/browse/LUCENE-753
>>>>>
>>>>>
>>>>>           
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>       
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>     
>
>   


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: lucene 2.9.0RC4 slower than 2.4.1?

Posted by Thomas Becker <th...@net-m.de>.

I suggest to find the root cause and then decide about the release. Tomorrow I
will spent the whole working day on the issue if no prio1 pops up.

Sadly I've to leave early today, since I'm moving to a new flat... :(

Uwe Schindler wrote:
> How should we proceed? Stop the final artifact build and voting or proceed
> with the release of 2.9? We waited so long and for most people it is faster
> than slower!
> 
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
> 
> 
>> -----Original Message-----
>> From: Mark Miller [mailto:markrmiller@gmail.com]
>> Sent: Wednesday, September 16, 2009 6:23 PM
>> To: java-user@lucene.apache.org
>> Subject: Re: lucene 2.9.0RC4 slower than 2.4.1?
>>
>> bq. I'll do some profiling now again and let you know the results.
>>
>> Great - it will be interesting to see the results. My guess, based on
>> the 2.9 new api profiling, is that your queries may not be agreeing with
>> some of the changes somehow. Along with the profiling, can you fill us
>> in on the query types you are using as well? (eg qualities)
>>
>> And grab invocations if its possible.
>>
>> --
>> - Mark
>>
>> http://www.lucidimagination.com
>>
>>
>>
>> Thomas Becker wrote:
>>> Tests run on tmpfs:
>>> config: impl=SeparateFile serial=false nThreads=4 iterations=100
>> bufsize=1024
>>> poolsize=2 filelen=18920301
>>> answer=850258539, ms=8090, MB/sec=935.4907787391842
>>> config: impl=ChannelFile serial=false nThreads=4 iterations=100
>> bufsize=1024
>>> poolsize=2 filelen=18920301
>>> answer=850258903, ms=39444, MB/sec=191.8700030422878
>>> config: impl=ChannelPread serial=false nThreads=4 iterations=100
>> bufsize=1024
>>> poolsize=2 filelen=18920301
>>> answer=850258903, ms=8504, MB/sec=889.9483066792098
>>> config: impl=PooledPread serial=false nThreads=4 iterations=100
>> bufsize=1024
>>> poolsize=2 filelen=18920301
>>> answer=850258903, ms=9585, MB/sec=789.5795931142409
>>>
>>> I did run some tests now with SimpleFSDirectory and MMapDirectory. Both
>> are
>>> faster than NIOFS and the response times improved. But it's still slower
>> than 2.4.
>>> I'll do some profiling now again and let you know the results.
>>>
>>> Thanks again for all the great support to all who've answered.
>>>
>>>
>>> Mark Miller wrote:
>>>
>>>> Can you run the following test on your RAMDISK?
>>>>
>>>> http://people.apache.org/~markrmiller/FileReadTest.java
>>>>
>>>> I've taken it from the following issue (in which NIOFSDirectory was
>>>> developed):
>>>> https://issues.apache.org/jira/browse/LUCENE-753
>>>>
>>>>
>>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 

-- 
Thomas Becker
Senior JEE Developer

net mobile AG
Zollhof 17
40221 Düsseldorf
GERMANY

Phone:    +49 211 97020-195
Fax:      +49 211 97020-949
Mobile:   +49 173 5146567 (private)
E-Mail:   mailto:thomas.becker@net-m.de
Internet: http://www.net-m.de

Registergericht:  Amtsgericht Düsseldorf, HRB 48022
Vorstand:         Theodor Niehues (Vorsitzender), Frank Hartmann,
                 Kai Markus Kulas, Dieter Plassmann
Vorsitzender des
Aufsichtsrates:   Dr. Michael Briem

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

RE: lucene 2.9.0RC4 slower than 2.4.1?

Posted by Uwe Schindler <uw...@thetaphi.de>.

How should we proceed? Stop the final artifact build and voting or proceed
with the release of 2.9? We waited so long and for most people it is faster
than slower!

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Mark Miller [mailto:markrmiller@gmail.com]
> Sent: Wednesday, September 16, 2009 6:23 PM
> To: java-user@lucene.apache.org
> Subject: Re: lucene 2.9.0RC4 slower than 2.4.1?
> 
> bq. I'll do some profiling now again and let you know the results.
> 
> Great - it will be interesting to see the results. My guess, based on
> the 2.9 new api profiling, is that your queries may not be agreeing with
> some of the changes somehow. Along with the profiling, can you fill us
> in on the query types you are using as well? (eg qualities)
> 
> And grab invocations if its possible.
> 
> --
> - Mark
> 
> http://www.lucidimagination.com
> 
> 
> 
> Thomas Becker wrote:
> > Tests run on tmpfs:
> > config: impl=SeparateFile serial=false nThreads=4 iterations=100
> bufsize=1024
> > poolsize=2 filelen=18920301
> > answer=850258539, ms=8090, MB/sec=935.4907787391842
> > config: impl=ChannelFile serial=false nThreads=4 iterations=100
> bufsize=1024
> > poolsize=2 filelen=18920301
> > answer=850258903, ms=39444, MB/sec=191.8700030422878
> > config: impl=ChannelPread serial=false nThreads=4 iterations=100
> bufsize=1024
> > poolsize=2 filelen=18920301
> > answer=850258903, ms=8504, MB/sec=889.9483066792098
> > config: impl=PooledPread serial=false nThreads=4 iterations=100
> bufsize=1024
> > poolsize=2 filelen=18920301
> > answer=850258903, ms=9585, MB/sec=789.5795931142409
> >
> > I did run some tests now with SimpleFSDirectory and MMapDirectory. Both
> are
> > faster than NIOFS and the response times improved. But it's still slower
> than 2.4.
> >
> > I'll do some profiling now again and let you know the results.
> >
> > Thanks again for all the great support to all who've answered.
> >
> >
> > Mark Miller wrote:
> >
> >> Can you run the following test on your RAMDISK?
> >>
> >> http://people.apache.org/~markrmiller/FileReadTest.java
> >>
> >> I've taken it from the following issue (in which NIOFSDirectory was
> >> developed):
> >> https://issues.apache.org/jira/browse/LUCENE-753
> >>
> >>
> >
> >
> 
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

RE: lucene 2.9.0RC4 slower than 2.4.1?

Posted by Uwe Schindler <uw...@thetaphi.de>.

I found one thing in your debug output:

You are using a lot of CachingWrapperFilters around QueryWrapperFilter.
According to http://issues.apache.org/jira/browse/LUCENE-1427,
QueryWrapperFilter does not copy the scorer's doc ids into a OpenBitSet, it
instead returns the scorer itself as DocIdSet (which is perfectly legal ant
very good, if no caching is involved). The CachingWrapperFilter just caches
this DocIdSet, but the result is that the scorer's next/advance methods are
executed even if the cache had cached the filter before.

Maybe we need some change to CachingWrapperFilter that caches the DocIdSets
as before, but optionally would wrap it into an OpenBitSet, if it is not an
instance of OBS.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Thomas Becker [mailto:thomas.becker@net-m.de]
> Sent: Wednesday, September 16, 2009 6:56 PM
> To: java-user@lucene.apache.org
> Subject: Re: lucene 2.9.0RC4 slower than 2.4.1?
> 
> So here's a debug message showing the query:
> 
> 2009-09-16 18:53:59,642 [DEBUG] [http-8440-2] [] [2144122] []
> service.impl.LuceneBaseService: items search('viewable:(FINDALL 0 1 2 )',
> BooleanFilter( +CachingWrapperFilter(QueryWrapperFilter(+issalesallo
> wed:true))
> +CachingWrapperFilter(QueryWrapperFilter(licenseGroupKeys:lablxrox))
> +CachingWrapperFilter(QueryWrapperFilter(+(viewable:1 viewable:2)))
> +CachingWrapperFilter(QueryWrapperFilter(+contentReposit
> oryIds:146))
> +CachingWrapperFilter(QueryWrapperFilter(+contentGroupIds:14554))
> -
> CachingWrapperFilter(QueryWrapperFilter(digitalReleaseDate:{lablxrox200909
> 16 TO
> lablxrox99999999} digitalDeleteDate:[lablxro
> x19000101 TO lablxrox20090916]))
> -CachingWrapperFilter(QueryWrapperFilter(+excludedContractIds:all))
> -CachingWrapperFilter(QueryWrapperFilter(+excludedContractIds:2144122))),
> <custom:"position": null>, 0,
>  16)
> 
> By the way, will it make sense to move this into a jira issue? It'll get
> quite
> tough to find the right information already since this thread is growing
> quite fast.
> 
> Mark Miller wrote:
> > bq. I'll do some profiling now again and let you know the results.
> >
> > Great - it will be interesting to see the results. My guess, based on
> > the 2.9 new api profiling, is that your queries may not be agreeing with
> > some of the changes somehow. Along with the profiling, can you fill us
> > in on the query types you are using as well? (eg qualities)
> >
> > And grab invocations if its possible.
> >
> 
> --
> Thomas Becker
> Senior JEE Developer
> 
> net mobile AG
> Zollhof 17
> 40221 Düsseldorf
> GERMANY
> 
> Phone:    +49 211 97020-195
> Fax:      +49 211 97020-949
> Mobile:   +49 173 5146567 (private)
> E-Mail:   mailto:thomas.becker@net-m.de
> Internet: http://www.net-m.de
> 
> Registergericht:  Amtsgericht Düsseldorf, HRB 48022
> Vorstand:         Theodor Niehues (Vorsitzender), Frank Hartmann,
>                  Kai Markus Kulas, Dieter Plassmann
> Vorsitzender des
> Aufsichtsrates:   Dr. Michael Briem
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

RE: lucene 2.9.0RC4 slower than 2.4.1?

Posted by Uwe Schindler <uw...@thetaphi.de>.

Hi Thomas,

I think we found the root of the problem. We opened
https://issues.apache.org/jira/browse/LUCENE-1911 .

Could you please try the attached patch, if it solves your problems? It has
to do with the work of CachingWrapperFilter and QueryWrapperFilter together,
which changed in 2.9.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Thomas Becker [mailto:thomas.becker@net-m.de]
> Sent: Wednesday, September 16, 2009 6:56 PM
> To: java-user@lucene.apache.org
> Subject: Re: lucene 2.9.0RC4 slower than 2.4.1?
> 
> So here's a debug message showing the query:
> 
> 2009-09-16 18:53:59,642 [DEBUG] [http-8440-2] [] [2144122] []
> service.impl.LuceneBaseService: items search('viewable:(FINDALL 0 1 2 )',
> BooleanFilter( +CachingWrapperFilter(QueryWrapperFilter(+issalesallo
> wed:true))
> +CachingWrapperFilter(QueryWrapperFilter(licenseGroupKeys:lablxrox))
> +CachingWrapperFilter(QueryWrapperFilter(+(viewable:1 viewable:2)))
> +CachingWrapperFilter(QueryWrapperFilter(+contentReposit
> oryIds:146))
> +CachingWrapperFilter(QueryWrapperFilter(+contentGroupIds:14554))
> -
> CachingWrapperFilter(QueryWrapperFilter(digitalReleaseDate:{lablxrox200909
> 16 TO
> lablxrox99999999} digitalDeleteDate:[lablxro
> x19000101 TO lablxrox20090916]))
> -CachingWrapperFilter(QueryWrapperFilter(+excludedContractIds:all))
> -CachingWrapperFilter(QueryWrapperFilter(+excludedContractIds:2144122))),
> <custom:"position": null>, 0,
>  16)
> 
> By the way, will it make sense to move this into a jira issue? It'll get
> quite
> tough to find the right information already since this thread is growing
> quite fast.
> 
> Mark Miller wrote:
> > bq. I'll do some profiling now again and let you know the results.
> >
> > Great - it will be interesting to see the results. My guess, based on
> > the 2.9 new api profiling, is that your queries may not be agreeing with
> > some of the changes somehow. Along with the profiling, can you fill us
> > in on the query types you are using as well? (eg qualities)
> >
> > And grab invocations if its possible.
> >
> 
> --
> Thomas Becker
> Senior JEE Developer
> 
> net mobile AG
> Zollhof 17
> 40221 Düsseldorf
> GERMANY
> 
> Phone:    +49 211 97020-195
> Fax:      +49 211 97020-949
> Mobile:   +49 173 5146567 (private)
> E-Mail:   mailto:thomas.becker@net-m.de
> Internet: http://www.net-m.de
> 
> Registergericht:  Amtsgericht Düsseldorf, HRB 48022
> Vorstand:         Theodor Niehues (Vorsitzender), Frank Hartmann,
>                  Kai Markus Kulas, Dieter Plassmann
> Vorsitzender des
> Aufsichtsrates:   Dr. Michael Briem
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: lucene 2.9.0RC4 slower than 2.4.1?

Posted by Thomas Becker <th...@net-m.de>.

So here's a debug message showing the query:

2009-09-16 18:53:59,642 [DEBUG] [http-8440-2] [] [2144122] []
service.impl.LuceneBaseService: items search('viewable:(FINDALL 0 1 2 )',
BooleanFilter( +CachingWrapperFilter(QueryWrapperFilter(+issalesallo
wed:true)) +CachingWrapperFilter(QueryWrapperFilter(licenseGroupKeys:lablxrox))
+CachingWrapperFilter(QueryWrapperFilter(+(viewable:1 viewable:2)))
+CachingWrapperFilter(QueryWrapperFilter(+contentReposit
oryIds:146)) +CachingWrapperFilter(QueryWrapperFilter(+contentGroupIds:14554))
-CachingWrapperFilter(QueryWrapperFilter(digitalReleaseDate:{lablxrox20090916 TO
lablxrox99999999} digitalDeleteDate:[lablxro
x19000101 TO lablxrox20090916]))
-CachingWrapperFilter(QueryWrapperFilter(+excludedContractIds:all))
-CachingWrapperFilter(QueryWrapperFilter(+excludedContractIds:2144122))),
<custom:"position": null>, 0,
 16)

By the way, will it make sense to move this into a jira issue? It'll get quite
tough to find the right information already since this thread is growing quite fast.

Mark Miller wrote:
> bq. I'll do some profiling now again and let you know the results.
> 
> Great - it will be interesting to see the results. My guess, based on
> the 2.9 new api profiling, is that your queries may not be agreeing with
> some of the changes somehow. Along with the profiling, can you fill us
> in on the query types you are using as well? (eg qualities)
> 
> And grab invocations if its possible.
> 

-- 
Thomas Becker
Senior JEE Developer

net mobile AG
Zollhof 17
40221 Düsseldorf
GERMANY

Phone:    +49 211 97020-195
Fax:      +49 211 97020-949
Mobile:   +49 173 5146567 (private)
E-Mail:   mailto:thomas.becker@net-m.de
Internet: http://www.net-m.de

Registergericht:  Amtsgericht Düsseldorf, HRB 48022
Vorstand:         Theodor Niehues (Vorsitzender), Frank Hartmann,
                 Kai Markus Kulas, Dieter Plassmann
Vorsitzender des
Aufsichtsrates:   Dr. Michael Briem

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: lucene 2.9.0RC4 slower than 2.4.1?

Posted by Mark Miller <ma...@gmail.com>.

Very interesting. Something can't be going right here. You are searching
against a single segment, yet, just for example,
while before, DisjunctionSumScorer.advanceAfterCurrent was being called
154,000 times, now its being called 1.3 million times.
Other scoring methods have similar crazy jumps.

Guys that worked on that stuff - any ideas on whats up here?


They both appear to be using BooleanScorer2.

-- 
- Mark

http://www.lucidimagination.com




Thomas Becker wrote:
> New Profiling sessions with invocation counts. A single lucene search request
> with huge resultset (169k items).
>
> Quite interesting results though and there's definetly something wrong with luc
> 2.9 and the way I'm using it. But see yourself:
>
> http://ankeschwarzer.de/tmp/lucene_24_oldapi_singlereq.png
> http://ankeschwarzer.de/tmp/lucene_29_oldapi_singlereq.png
> and even worse:
> http://ankeschwarzer.de/tmp/lucene_29_newapi_mmap_singlereq.png
>
> Have to verify that the last one is not by accident more than one request. Will
> do the run again and then post the required info.
>
> Mark Miller wrote:
>   
>> bq. I'll do some profiling now again and let you know the results.
>>
>> Great - it will be interesting to see the results. My guess, based on
>> the 2.9 new api profiling, is that your queries may not be agreeing with
>> some of the changes somehow. Along with the profiling, can you fill us
>> in on the query types you are using as well? (eg qualities)
>>
>> And grab invocations if its possible.
>>
>>     
>
>   


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: lucene 2.9.0RC4 slower than 2.4.1?

Posted by Thomas Becker <th...@net-m.de>.

This might be a change in my getHitCounts method. I will dig into that by tomorrow.

I'm really sorry, but I've to leave now. Otherwise I'll have other issues.

Uwe Schindler wrote:
>> http://ankeschwarzer.de/tmp/lucene_29_newapi_mmap_singlereq.png
>>
>> Have to verify that the last one is not by accident more than one request.
>> Will
>> do the run again and then post the required info.
> 
> The last figure shows, that IndexSearcher.searchWithFilter was called twice
> in contrast to the first figure, where IndexSearcher.search was called only
> once.
> 
> Uwe
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 

-- 
Thomas Becker
Senior JEE Developer

net mobile AG
Zollhof 17
40221 Düsseldorf
GERMANY

Phone:    +49 211 97020-195
Fax:      +49 211 97020-949
Mobile:   +49 173 5146567 (private)
E-Mail:   mailto:thomas.becker@net-m.de
Internet: http://www.net-m.de

Registergericht:  Amtsgericht Düsseldorf, HRB 48022
Vorstand:         Theodor Niehues (Vorsitzender), Frank Hartmann,
                 Kai Markus Kulas, Dieter Plassmann
Vorsitzender des
Aufsichtsrates:   Dr. Michael Briem

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

RE: lucene 2.9.0RC4 slower than 2.4.1?

Posted by Uwe Schindler <uw...@thetaphi.de>.

I am currently preparing a patch for CachingWrapperFilter that has a boolean
ctor parameter (useOpenBitSetCache) and the method getDocIdSet will then do
what QueryWrapperFilter did before LUCENE-1427.

I would not do this in QueryWrapperFilter like before, because it would
slowdown MultiTermQuery if no caching is used and the constantScore/Boolean
rewrite mode is active.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Mark Miller [mailto:markrmiller@gmail.com]
> Sent: Wednesday, September 16, 2009 7:33 PM
> To: java-user@lucene.apache.org
> Subject: Re: lucene 2.9.0RC4 slower than 2.4.1?
> 
> Yeah, thanks Uwe - I had read it quickly, but was just rereading and it
> was sinking in. I hadn't cross correlated the issues yet.
> 
> Makes perfect sense. Very nice catch.
> 
> >Maybe we need some change to CachingWrapperFilter that caches the
> DocIdSets
> >as before, but optionally would wrap it into an OpenBitSet, if it is not
> an
> >instance of OBS.
> 
> So - do we address this for 2.9. This is a nasty performance trap, is it
> not?
> 
> We recommend that you use both of these classes together - if it can
> work like this though ...
> 
> Seems like a bug to me.
> 
> - mark
> 
> Uwe Schindler wrote:
> > See my mail about the CachingWrapperFilter and QueryWrapperFilter, I
> think
> > it explains this behaviour (and Thomas ran some warming queries before).
> >
> > -----
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: uwe@thetaphi.de
> >
> >
> >
> >> -----Original Message-----
> >> From: Mark Miller [mailto:markrmiller@gmail.com]
> >> Sent: Wednesday, September 16, 2009 7:23 PM
> >> To: java-user@lucene.apache.org
> >> Subject: Re: lucene 2.9.0RC4 slower than 2.4.1?
> >>
> >> Nevermind. I see advance wasn't around in 2.4. This is part of the
> >> DocIdSetIterator changes.
> >>
> >> Anyway - either these are just not comparable runs, or there is a major
> >> bug (which seems unlikely).
> >>
> >> Just to keep pointing out the obvious:
> >>
> >> 2.4 calls doc 195,000 times
> >> 2.9 calls docId 1.4 million times
> >>
> >> That just doesn't jive.
> >>
> >> Mark Miller wrote:
> >>
> >>> Notice that while DisjunctionScorer.advance and
> >>> DisjuntionScorer.advanceAfterCurrent appear to be called
> >>> in 2.9, in 2.4, I am only seeing DisjuntionScorer.advanceAfterCurrent
> >>> called.
> >>>
> >>> Can someone explain that?
> >>>
> >>> Mark Miller wrote:
> >>>
> >>>
> >>>> Something is very odd about this if they both cover the same search
> and
> >>>> the environ for both is identical. Even if one search was done twice,
> >>>> and we divide the numbers for the new api by 2 - its still *very*
> odd.
> >>>>
> >>>> With 2.4, ScorerDocQueue.topDoc is called half a million times.
> >>>> With 2.9, its called over 4 million times.
> >>>>
> >>>> Huh?
> >>>>
> >>>> Thomas Becker wrote:
> >>>>
> >>>>
> >>>>
> >>>>> No it's only a single segment. But two calls. One doing a
> getHitsCount
> >>>>>
> >> first and
> >>
> >>>>> the other doing the actual search. I'll paste both methods below if
> >>>>>
> >> someone's
> >>
> >>>>> interested.
> >>>>>
> >>>>> Will dig into lucene's sources and compare 2.4 search behaviour for
> my
> >>>>>
> >> case with
> >>
> >>>>> 2.9 tomorrow. It was about time to get more into lucene-core sources
> >>>>>
> >> anyhow. :)
> >>
> >>>>> See you tomorrow guys and thanks a lot again! It's a pleasure.
> >>>>>
> >>>>> 	public int getHitsCount(String query, Filter filter) throws
> >>>>> LuceneServiceException {
> >>>>> 		log.debug("getHitsCount('{}, {}')", query, filter);
> >>>>> 		if (StringUtils.isBlank(query)) {
> >>>>> 			log.warn("getHitsCount: empty lucene
query");
> >>>>> 			return 0;
> >>>>> 		}
> >>>>> 		long startTimeMillis = System.currentTimeMillis();
> >>>>> 		int count = 0;
> >>>>>
> >>>>> 		if (indexSearcher == null) {
> >>>>> 			return 0;
> >>>>> 		}
> >>>>>
> >>>>> 		BooleanQuery.setMaxClauseCount(MAXCLAUSECOUNT);
> >>>>> 		Query q = null;
> >>>>> 		try {
> >>>>> 			q = createQuery(query);
> >>>>> 			TopScoreDocCollector tsdc =
> >>>>>
> >> TopScoreDocCollector.create(1, true);
> >>
> >>>>> 			indexSearcher.search(q, filter, tsdc);
> >>>>> 			count = tsdc.getTotalHits();
> >>>>> 			log.info("getHitsCount: count = {}",count);
> >>>>> 		} catch (ParseException ex) {
> >>>>> 			throw new LuceneServiceException("invalid
lucene
> >>>>>
> > query:"
> >
> >> + query, ex);
> >>
> >>>>> 		} catch (IOException e) {
> >>>>> 			throw new LuceneServiceException("
indexSearcher
> >>>>>
> > could
> >
> >> be corrupted", e);
> >>
> >>>>> 		} finally {
> >>>>> 			long durationMillis =
System.currentTimeMillis() -
> >>>>>
> >> startTimeMillis;
> >>
> >>>>> 			if (durationMillis > slowQueryLimit) {
> >>>>> 				log.warn("getHitsCount: Slow query:
{} ms,
> >>>>>
> >> query={}", durationMillis, query);
> >>
> >>>>> 			}
> >>>>> 			log.debug("getHitsCount: query took {} ms",
> >>>>>
> >> durationMillis);
> >>
> >>>>> 		}
> >>>>> 		return count;
> >>>>> 	}
> >>>>>
> >>>>> 	public List<Document> search(String query, Filter filter,
Sort
> sort,
> >>>>>
> >> int from,
> >>
> >>>>> int size) throws LuceneServiceException {
> >>>>> 		log.debug("{} search('{}', {}, {}, {}, {})", new
> Object[] {
> >>>>>
> >> indexAlias, query,
> >>
> >>>>> filter, sort, from, size });
> >>>>> 		long startTimeMillis = System.currentTimeMillis();
> >>>>>
> >>>>> 		List<Document> docs = new ArrayList<Document>();
> >>>>> 		if (indexSearcher == null) {
> >>>>> 			return docs;
> >>>>> 		}
> >>>>> 		Query q = null;
> >>>>> 		try {
> >>>>> 			if (query == null) {
> >>>>> 				log.warn("search: lucene query is
null...");
> >>>>> 				return docs;
> >>>>> 			}
> >>>>> 			q = createQuery(query);
> >>>>>
BooleanQuery.setMaxClauseCount(MAXCLAUSECOUNT);
> >>>>> 			if (size < 0 || size > maxNumHits) {
> >>>>> 				// set hard limit for numHits
> >>>>> 				size = maxNumHits;
> >>>>> 				if (log.isDebugEnabled())
> >>>>> 					log.debug("search: Size set
to
> >>>>>
> > hardlimit: {}
> >
> >> for query: {} with filter:
> >>
> >>>>> {}", new Object[] { size, query, filter });
> >>>>> 			}
> >>>>> 			TopFieldCollector collector =
> >>>>>
> >> TopFieldCollector.create(sort, size + from,
> >>
> >>>>> true, false, false, true);
> >>>>> 			indexSearcher.search(q, filter, collector);
> >>>>> 			if(size > collector.getTotalHits())
> >>>>> 				size = collector.getTotalHits();
> >>>>> 			if (size > 100000)
> >>>>> 				log.info("search: size: {} bigger
than
> >>>>>
> > 100.000 for
> >
> >> query: {} with filter:
> >>
> >>>>> {}", new Object[] { size, query, filter });
> >>>>> 			TopDocs td = collector.topDocs(from, size);
> >>>>> 			ScoreDoc[] scoreDocs = td.scoreDocs;
> >>>>> 			for (ScoreDoc scoreDoc : scoreDocs) {
> >>>>>
docs.add(indexSearcher.doc(scoreDoc.doc));
> >>>>> 			}
> >>>>> 		} catch (ParseException e) {
> >>>>> 			log.warn("search: ParseException: {}",
> >>>>>
> > e.getMessage());
> >
> >>>>> 			if (log.isDebugEnabled())
> >>>>> 				log.warn("search: ParseException: ",
e);
> >>>>> 			return Collections.emptyList();
> >>>>> 		} catch (IOException e) {
> >>>>> 			log.warn("search: IOException: ", e);
> >>>>> 			return Collections.emptyList();
> >>>>> 		} finally {
> >>>>> 			long durationMillis =
System.currentTimeMillis() -
> >>>>>
> >> startTimeMillis;
> >>
> >>>>> 			if (durationMillis > slowQueryLimit) {
> >>>>> 				log.warn("search: Slow query: {} ms,
> >>>>>
> > query={},
> >
> >> indexUsed={}",
> >>
> >>>>> 						new Object[] {
> >>>>>
> > durationMillis, query,
> >
> >>>>> indexSearcher.getIndexReader().directory() });
> >>>>> 			}
> >>>>> 			log.debug("search: query took {} ms",
> >>>>>
> > durationMillis);
> >
> >>>>> 		}
> >>>>> 		return docs;
> >>>>> 	}
> >>>>>
> >>>>>
> >>>>> Uwe Schindler wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>>>> http://ankeschwarzer.de/tmp/lucene_29_newapi_mmap_singlereq.png
> >>>>>>>>
> >>>>>>>> Have to verify that the last one is not by accident more than one
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>> request.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>> Will
> >>>>>>>> do the run again and then post the required info.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>> The last figure shows, that IndexSearcher.searchWithFilter was
> >>>>>>>
> >> called
> >>
> >>>>>>> twice
> >>>>>>> in contrast to the first figure, where IndexSearcher.search was
> >>>>>>>
> >> called
> >>
> >>>>>>> only
> >>>>>>> once.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>> I forgot, searchWithFilter it is called per segment in 2.9. If it
> was
> >>>>>>
> >> only
> >>
> >>>>>> one search, you must have two segments and therefore no optimized
> >>>>>>
> >> index for
> >>
> >>>>>> this to be correct?
> >>>>>>
> >>>>>> Uwe
> >>>>>>
> >>>>>>
> >>>>>> -------------------------------------------------------------------
> --
> >>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >> --
> >> - Mark
> >>
> >> http://www.lucidimagination.com
> >>
> >>
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
> 
> 
> --
> - Mark
> 
> http://www.lucidimagination.com
> 
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: lucene 2.9.0RC4 slower than 2.4.1?

Posted by Mark Miller <ma...@gmail.com>.

Yeah, thanks Uwe - I had read it quickly, but was just rereading and it
was sinking in. I hadn't cross correlated the issues yet.

Makes perfect sense. Very nice catch.

>Maybe we need some change to CachingWrapperFilter that caches the DocIdSets
>as before, but optionally would wrap it into an OpenBitSet, if it is not an
>instance of OBS.

So - do we address this for 2.9. This is a nasty performance trap, is it
not?

We recommend that you use both of these classes together - if it can
work like this though ...

Seems like a bug to me.

- mark

Uwe Schindler wrote:
> See my mail about the CachingWrapperFilter and QueryWrapperFilter, I think
> it explains this behaviour (and Thomas ran some warming queries before).
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>
>   
>> -----Original Message-----
>> From: Mark Miller [mailto:markrmiller@gmail.com]
>> Sent: Wednesday, September 16, 2009 7:23 PM
>> To: java-user@lucene.apache.org
>> Subject: Re: lucene 2.9.0RC4 slower than 2.4.1?
>>
>> Nevermind. I see advance wasn't around in 2.4. This is part of the
>> DocIdSetIterator changes.
>>
>> Anyway - either these are just not comparable runs, or there is a major
>> bug (which seems unlikely).
>>
>> Just to keep pointing out the obvious:
>>
>> 2.4 calls doc 195,000 times
>> 2.9 calls docId 1.4 million times
>>
>> That just doesn't jive.
>>
>> Mark Miller wrote:
>>     
>>> Notice that while DisjunctionScorer.advance and
>>> DisjuntionScorer.advanceAfterCurrent appear to be called
>>> in 2.9, in 2.4, I am only seeing DisjuntionScorer.advanceAfterCurrent
>>> called.
>>>
>>> Can someone explain that?
>>>
>>> Mark Miller wrote:
>>>
>>>       
>>>> Something is very odd about this if they both cover the same search and
>>>> the environ for both is identical. Even if one search was done twice,
>>>> and we divide the numbers for the new api by 2 - its still *very* odd.
>>>>
>>>> With 2.4, ScorerDocQueue.topDoc is called half a million times.
>>>> With 2.9, its called over 4 million times.
>>>>
>>>> Huh?
>>>>
>>>> Thomas Becker wrote:
>>>>
>>>>
>>>>         
>>>>> No it's only a single segment. But two calls. One doing a getHitsCount
>>>>>           
>> first and
>>     
>>>>> the other doing the actual search. I'll paste both methods below if
>>>>>           
>> someone's
>>     
>>>>> interested.
>>>>>
>>>>> Will dig into lucene's sources and compare 2.4 search behaviour for my
>>>>>           
>> case with
>>     
>>>>> 2.9 tomorrow. It was about time to get more into lucene-core sources
>>>>>           
>> anyhow. :)
>>     
>>>>> See you tomorrow guys and thanks a lot again! It's a pleasure.
>>>>>
>>>>> 	public int getHitsCount(String query, Filter filter) throws
>>>>> LuceneServiceException {
>>>>> 		log.debug("getHitsCount('{}, {}')", query, filter);
>>>>> 		if (StringUtils.isBlank(query)) {
>>>>> 			log.warn("getHitsCount: empty lucene query");
>>>>> 			return 0;
>>>>> 		}
>>>>> 		long startTimeMillis = System.currentTimeMillis();
>>>>> 		int count = 0;
>>>>>
>>>>> 		if (indexSearcher == null) {
>>>>> 			return 0;
>>>>> 		}
>>>>>
>>>>> 		BooleanQuery.setMaxClauseCount(MAXCLAUSECOUNT);
>>>>> 		Query q = null;
>>>>> 		try {
>>>>> 			q = createQuery(query);
>>>>> 			TopScoreDocCollector tsdc =
>>>>>           
>> TopScoreDocCollector.create(1, true);
>>     
>>>>> 			indexSearcher.search(q, filter, tsdc);
>>>>> 			count = tsdc.getTotalHits();
>>>>> 			log.info("getHitsCount: count = {}",count);
>>>>> 		} catch (ParseException ex) {
>>>>> 			throw new LuceneServiceException("invalid lucene
>>>>>           
> query:"
>   
>> + query, ex);
>>     
>>>>> 		} catch (IOException e) {
>>>>> 			throw new LuceneServiceException(" indexSearcher
>>>>>           
> could
>   
>> be corrupted", e);
>>     
>>>>> 		} finally {
>>>>> 			long durationMillis = System.currentTimeMillis() -
>>>>>           
>> startTimeMillis;
>>     
>>>>> 			if (durationMillis > slowQueryLimit) {
>>>>> 				log.warn("getHitsCount: Slow query: {} ms,
>>>>>           
>> query={}", durationMillis, query);
>>     
>>>>> 			}
>>>>> 			log.debug("getHitsCount: query took {} ms",
>>>>>           
>> durationMillis);
>>     
>>>>> 		}
>>>>> 		return count;
>>>>> 	}
>>>>>
>>>>> 	public List<Document> search(String query, Filter filter, Sort sort,
>>>>>           
>> int from,
>>     
>>>>> int size) throws LuceneServiceException {
>>>>> 		log.debug("{} search('{}', {}, {}, {}, {})", new Object[] {
>>>>>           
>> indexAlias, query,
>>     
>>>>> filter, sort, from, size });
>>>>> 		long startTimeMillis = System.currentTimeMillis();
>>>>>
>>>>> 		List<Document> docs = new ArrayList<Document>();
>>>>> 		if (indexSearcher == null) {
>>>>> 			return docs;
>>>>> 		}
>>>>> 		Query q = null;
>>>>> 		try {
>>>>> 			if (query == null) {
>>>>> 				log.warn("search: lucene query is null...");
>>>>> 				return docs;
>>>>> 			}
>>>>> 			q = createQuery(query);
>>>>> 			BooleanQuery.setMaxClauseCount(MAXCLAUSECOUNT);
>>>>> 			if (size < 0 || size > maxNumHits) {
>>>>> 				// set hard limit for numHits
>>>>> 				size = maxNumHits;
>>>>> 				if (log.isDebugEnabled())
>>>>> 					log.debug("search: Size set to
>>>>>           
> hardlimit: {}
>   
>> for query: {} with filter:
>>     
>>>>> {}", new Object[] { size, query, filter });
>>>>> 			}
>>>>> 			TopFieldCollector collector =
>>>>>           
>> TopFieldCollector.create(sort, size + from,
>>     
>>>>> true, false, false, true);
>>>>> 			indexSearcher.search(q, filter, collector);
>>>>> 			if(size > collector.getTotalHits())
>>>>> 				size = collector.getTotalHits();
>>>>> 			if (size > 100000)
>>>>> 				log.info("search: size: {} bigger than
>>>>>           
> 100.000 for
>   
>> query: {} with filter:
>>     
>>>>> {}", new Object[] { size, query, filter });
>>>>> 			TopDocs td = collector.topDocs(from, size);
>>>>> 			ScoreDoc[] scoreDocs = td.scoreDocs;
>>>>> 			for (ScoreDoc scoreDoc : scoreDocs) {
>>>>> 				docs.add(indexSearcher.doc(scoreDoc.doc));
>>>>> 			}
>>>>> 		} catch (ParseException e) {
>>>>> 			log.warn("search: ParseException: {}",
>>>>>           
> e.getMessage());
>   
>>>>> 			if (log.isDebugEnabled())
>>>>> 				log.warn("search: ParseException: ", e);
>>>>> 			return Collections.emptyList();
>>>>> 		} catch (IOException e) {
>>>>> 			log.warn("search: IOException: ", e);
>>>>> 			return Collections.emptyList();
>>>>> 		} finally {
>>>>> 			long durationMillis = System.currentTimeMillis() -
>>>>>           
>> startTimeMillis;
>>     
>>>>> 			if (durationMillis > slowQueryLimit) {
>>>>> 				log.warn("search: Slow query: {} ms,
>>>>>           
> query={},
>   
>> indexUsed={}",
>>     
>>>>> 						new Object[] {
>>>>>           
> durationMillis, query,
>   
>>>>> indexSearcher.getIndexReader().directory() });
>>>>> 			}
>>>>> 			log.debug("search: query took {} ms",
>>>>>           
> durationMillis);
>   
>>>>> 		}
>>>>> 		return docs;
>>>>> 	}
>>>>>
>>>>>
>>>>> Uwe Schindler wrote:
>>>>>
>>>>>
>>>>>
>>>>>           
>>>>>>>> http://ankeschwarzer.de/tmp/lucene_29_newapi_mmap_singlereq.png
>>>>>>>>
>>>>>>>> Have to verify that the last one is not by accident more than one
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>> request.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>>>> Will
>>>>>>>> do the run again and then post the required info.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>> The last figure shows, that IndexSearcher.searchWithFilter was
>>>>>>>               
>> called
>>     
>>>>>>> twice
>>>>>>> in contrast to the first figure, where IndexSearcher.search was
>>>>>>>               
>> called
>>     
>>>>>>> only
>>>>>>> once.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>> I forgot, searchWithFilter it is called per segment in 2.9. If it was
>>>>>>             
>> only
>>     
>>>>>> one search, you must have two segments and therefore no optimized
>>>>>>             
>> index for
>>     
>>>>>> this to be correct?
>>>>>>
>>>>>> Uwe
>>>>>>
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>             
>>>>>
>>>>>           
>>>>         
>>>
>>>       
>> --
>> - Mark
>>
>> http://www.lucidimagination.com
>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>     
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>   


-- 
- Mark

http://www.lucidimagination.com




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

RE: lucene 2.9.0RC4 slower than 2.4.1?

Posted by Uwe Schindler <uw...@thetaphi.de>.

See my mail about the CachingWrapperFilter and QueryWrapperFilter, I think
it explains this behaviour (and Thomas ran some warming queries before).

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Mark Miller [mailto:markrmiller@gmail.com]
> Sent: Wednesday, September 16, 2009 7:23 PM
> To: java-user@lucene.apache.org
> Subject: Re: lucene 2.9.0RC4 slower than 2.4.1?
> 
> Nevermind. I see advance wasn't around in 2.4. This is part of the
> DocIdSetIterator changes.
> 
> Anyway - either these are just not comparable runs, or there is a major
> bug (which seems unlikely).
> 
> Just to keep pointing out the obvious:
> 
> 2.4 calls doc 195,000 times
> 2.9 calls docId 1.4 million times
> 
> That just doesn't jive.
> 
> Mark Miller wrote:
> > Notice that while DisjunctionScorer.advance and
> > DisjuntionScorer.advanceAfterCurrent appear to be called
> > in 2.9, in 2.4, I am only seeing DisjuntionScorer.advanceAfterCurrent
> > called.
> >
> > Can someone explain that?
> >
> > Mark Miller wrote:
> >
> >> Something is very odd about this if they both cover the same search and
> >> the environ for both is identical. Even if one search was done twice,
> >> and we divide the numbers for the new api by 2 - its still *very* odd.
> >>
> >> With 2.4, ScorerDocQueue.topDoc is called half a million times.
> >> With 2.9, its called over 4 million times.
> >>
> >> Huh?
> >>
> >> Thomas Becker wrote:
> >>
> >>
> >>> No it's only a single segment. But two calls. One doing a getHitsCount
> first and
> >>> the other doing the actual search. I'll paste both methods below if
> someone's
> >>> interested.
> >>>
> >>> Will dig into lucene's sources and compare 2.4 search behaviour for my
> case with
> >>> 2.9 tomorrow. It was about time to get more into lucene-core sources
> anyhow. :)
> >>>
> >>> See you tomorrow guys and thanks a lot again! It's a pleasure.
> >>>
> >>> 	public int getHitsCount(String query, Filter filter) throws
> >>> LuceneServiceException {
> >>> 		log.debug("getHitsCount('{}, {}')", query, filter);
> >>> 		if (StringUtils.isBlank(query)) {
> >>> 			log.warn("getHitsCount: empty lucene query");
> >>> 			return 0;
> >>> 		}
> >>> 		long startTimeMillis = System.currentTimeMillis();
> >>> 		int count = 0;
> >>>
> >>> 		if (indexSearcher == null) {
> >>> 			return 0;
> >>> 		}
> >>>
> >>> 		BooleanQuery.setMaxClauseCount(MAXCLAUSECOUNT);
> >>> 		Query q = null;
> >>> 		try {
> >>> 			q = createQuery(query);
> >>> 			TopScoreDocCollector tsdc =
> TopScoreDocCollector.create(1, true);
> >>> 			indexSearcher.search(q, filter, tsdc);
> >>> 			count = tsdc.getTotalHits();
> >>> 			log.info("getHitsCount: count = {}",count);
> >>> 		} catch (ParseException ex) {
> >>> 			throw new LuceneServiceException("invalid lucene
query:"
> + query, ex);
> >>> 		} catch (IOException e) {
> >>> 			throw new LuceneServiceException(" indexSearcher
could
> be corrupted", e);
> >>> 		} finally {
> >>> 			long durationMillis = System.currentTimeMillis() -
> startTimeMillis;
> >>> 			if (durationMillis > slowQueryLimit) {
> >>> 				log.warn("getHitsCount: Slow query: {} ms,
> query={}", durationMillis, query);
> >>> 			}
> >>> 			log.debug("getHitsCount: query took {} ms",
> durationMillis);
> >>> 		}
> >>> 		return count;
> >>> 	}
> >>>
> >>> 	public List<Document> search(String query, Filter filter, Sort sort,
> int from,
> >>> int size) throws LuceneServiceException {
> >>> 		log.debug("{} search('{}', {}, {}, {}, {})", new Object[] {
> indexAlias, query,
> >>> filter, sort, from, size });
> >>> 		long startTimeMillis = System.currentTimeMillis();
> >>>
> >>> 		List<Document> docs = new ArrayList<Document>();
> >>> 		if (indexSearcher == null) {
> >>> 			return docs;
> >>> 		}
> >>> 		Query q = null;
> >>> 		try {
> >>> 			if (query == null) {
> >>> 				log.warn("search: lucene query is null...");
> >>> 				return docs;
> >>> 			}
> >>> 			q = createQuery(query);
> >>> 			BooleanQuery.setMaxClauseCount(MAXCLAUSECOUNT);
> >>> 			if (size < 0 || size > maxNumHits) {
> >>> 				// set hard limit for numHits
> >>> 				size = maxNumHits;
> >>> 				if (log.isDebugEnabled())
> >>> 					log.debug("search: Size set to
hardlimit: {}
> for query: {} with filter:
> >>> {}", new Object[] { size, query, filter });
> >>> 			}
> >>> 			TopFieldCollector collector =
> TopFieldCollector.create(sort, size + from,
> >>> true, false, false, true);
> >>> 			indexSearcher.search(q, filter, collector);
> >>> 			if(size > collector.getTotalHits())
> >>> 				size = collector.getTotalHits();
> >>> 			if (size > 100000)
> >>> 				log.info("search: size: {} bigger than
100.000 for
> query: {} with filter:
> >>> {}", new Object[] { size, query, filter });
> >>> 			TopDocs td = collector.topDocs(from, size);
> >>> 			ScoreDoc[] scoreDocs = td.scoreDocs;
> >>> 			for (ScoreDoc scoreDoc : scoreDocs) {
> >>> 				docs.add(indexSearcher.doc(scoreDoc.doc));
> >>> 			}
> >>> 		} catch (ParseException e) {
> >>> 			log.warn("search: ParseException: {}",
e.getMessage());
> >>> 			if (log.isDebugEnabled())
> >>> 				log.warn("search: ParseException: ", e);
> >>> 			return Collections.emptyList();
> >>> 		} catch (IOException e) {
> >>> 			log.warn("search: IOException: ", e);
> >>> 			return Collections.emptyList();
> >>> 		} finally {
> >>> 			long durationMillis = System.currentTimeMillis() -
> startTimeMillis;
> >>> 			if (durationMillis > slowQueryLimit) {
> >>> 				log.warn("search: Slow query: {} ms,
query={},
> indexUsed={}",
> >>> 						new Object[] {
durationMillis, query,
> >>> indexSearcher.getIndexReader().directory() });
> >>> 			}
> >>> 			log.debug("search: query took {} ms",
durationMillis);
> >>> 		}
> >>> 		return docs;
> >>> 	}
> >>>
> >>>
> >>> Uwe Schindler wrote:
> >>>
> >>>
> >>>
> >>>>>> http://ankeschwarzer.de/tmp/lucene_29_newapi_mmap_singlereq.png
> >>>>>>
> >>>>>> Have to verify that the last one is not by accident more than one
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>> request.
> >>>>>
> >>>>>
> >>>>>
> >>>>>> Will
> >>>>>> do the run again and then post the required info.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>> The last figure shows, that IndexSearcher.searchWithFilter was
> called
> >>>>> twice
> >>>>> in contrast to the first figure, where IndexSearcher.search was
> called
> >>>>> only
> >>>>> once.
> >>>>>
> >>>>>
> >>>>>
> >>>> I forgot, searchWithFilter it is called per segment in 2.9. If it was
> only
> >>>> one search, you must have two segments and therefore no optimized
> index for
> >>>> this to be correct?
> >>>>
> >>>> Uwe
> >>>>
> >>>>
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >>>> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>>>
> >>>>
> >>>>
> >>>>
> >>>
> >>>
> >>>
> >>
> >>
> >
> >
> >
> 
> 
> --
> - Mark
> 
> http://www.lucidimagination.com
> 
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: lucene 2.9.0RC4 slower than 2.4.1?

Posted by Mark Miller <ma...@gmail.com>.

Nevermind. I see advance wasn't around in 2.4. This is part of the
DocIdSetIterator changes.

Anyway - either these are just not comparable runs, or there is a major
bug (which seems unlikely).

Just to keep pointing out the obvious:

2.4 calls doc 195,000 times
2.9 calls docId 1.4 million times

That just doesn't jive.

Mark Miller wrote:
> Notice that while DisjunctionScorer.advance and
> DisjuntionScorer.advanceAfterCurrent appear to be called
> in 2.9, in 2.4, I am only seeing DisjuntionScorer.advanceAfterCurrent
> called.
>
> Can someone explain that?
>
> Mark Miller wrote:
>   
>> Something is very odd about this if they both cover the same search and
>> the environ for both is identical. Even if one search was done twice,
>> and we divide the numbers for the new api by 2 - its still *very* odd.
>>
>> With 2.4, ScorerDocQueue.topDoc is called half a million times.
>> With 2.9, its called over 4 million times.
>>
>> Huh?
>>
>> Thomas Becker wrote:
>>   
>>     
>>> No it's only a single segment. But two calls. One doing a getHitsCount first and
>>> the other doing the actual search. I'll paste both methods below if someone's
>>> interested.
>>>
>>> Will dig into lucene's sources and compare 2.4 search behaviour for my case with
>>> 2.9 tomorrow. It was about time to get more into lucene-core sources anyhow. :)
>>>
>>> See you tomorrow guys and thanks a lot again! It's a pleasure.
>>>
>>> 	public int getHitsCount(String query, Filter filter) throws
>>> LuceneServiceException {
>>> 		log.debug("getHitsCount('{}, {}')", query, filter);
>>> 		if (StringUtils.isBlank(query)) {
>>> 			log.warn("getHitsCount: empty lucene query");
>>> 			return 0;
>>> 		}
>>> 		long startTimeMillis = System.currentTimeMillis();
>>> 		int count = 0;
>>>
>>> 		if (indexSearcher == null) {
>>> 			return 0;
>>> 		}
>>>
>>> 		BooleanQuery.setMaxClauseCount(MAXCLAUSECOUNT);
>>> 		Query q = null;
>>> 		try {
>>> 			q = createQuery(query);
>>> 			TopScoreDocCollector tsdc = TopScoreDocCollector.create(1, true);
>>> 			indexSearcher.search(q, filter, tsdc);
>>> 			count = tsdc.getTotalHits();
>>> 			log.info("getHitsCount: count = {}",count);
>>> 		} catch (ParseException ex) {
>>> 			throw new LuceneServiceException("invalid lucene query:" + query, ex);
>>> 		} catch (IOException e) {
>>> 			throw new LuceneServiceException(" indexSearcher could be corrupted", e);
>>> 		} finally {
>>> 			long durationMillis = System.currentTimeMillis() - startTimeMillis;
>>> 			if (durationMillis > slowQueryLimit) {
>>> 				log.warn("getHitsCount: Slow query: {} ms, query={}", durationMillis, query);
>>> 			}
>>> 			log.debug("getHitsCount: query took {} ms", durationMillis);
>>> 		}
>>> 		return count;
>>> 	}
>>>
>>> 	public List<Document> search(String query, Filter filter, Sort sort, int from,
>>> int size) throws LuceneServiceException {
>>> 		log.debug("{} search('{}', {}, {}, {}, {})", new Object[] { indexAlias, query,
>>> filter, sort, from, size });
>>> 		long startTimeMillis = System.currentTimeMillis();
>>>
>>> 		List<Document> docs = new ArrayList<Document>();
>>> 		if (indexSearcher == null) {
>>> 			return docs;
>>> 		}
>>> 		Query q = null;
>>> 		try {
>>> 			if (query == null) {
>>> 				log.warn("search: lucene query is null...");
>>> 				return docs;
>>> 			}
>>> 			q = createQuery(query);
>>> 			BooleanQuery.setMaxClauseCount(MAXCLAUSECOUNT);
>>> 			if (size < 0 || size > maxNumHits) {
>>> 				// set hard limit for numHits
>>> 				size = maxNumHits;
>>> 				if (log.isDebugEnabled())
>>> 					log.debug("search: Size set to hardlimit: {} for query: {} with filter:
>>> {}", new Object[] { size, query, filter });
>>> 			}
>>> 			TopFieldCollector collector = TopFieldCollector.create(sort, size + from,
>>> true, false, false, true);
>>> 			indexSearcher.search(q, filter, collector);
>>> 			if(size > collector.getTotalHits())
>>> 				size = collector.getTotalHits();
>>> 			if (size > 100000)
>>> 				log.info("search: size: {} bigger than 100.000 for query: {} with filter:
>>> {}", new Object[] { size, query, filter });
>>> 			TopDocs td = collector.topDocs(from, size);
>>> 			ScoreDoc[] scoreDocs = td.scoreDocs;
>>> 			for (ScoreDoc scoreDoc : scoreDocs) {
>>> 				docs.add(indexSearcher.doc(scoreDoc.doc));
>>> 			}
>>> 		} catch (ParseException e) {
>>> 			log.warn("search: ParseException: {}", e.getMessage());
>>> 			if (log.isDebugEnabled())
>>> 				log.warn("search: ParseException: ", e);
>>> 			return Collections.emptyList();
>>> 		} catch (IOException e) {
>>> 			log.warn("search: IOException: ", e);
>>> 			return Collections.emptyList();
>>> 		} finally {
>>> 			long durationMillis = System.currentTimeMillis() - startTimeMillis;
>>> 			if (durationMillis > slowQueryLimit) {
>>> 				log.warn("search: Slow query: {} ms, query={}, indexUsed={}",
>>> 						new Object[] { durationMillis, query,
>>> indexSearcher.getIndexReader().directory() });
>>> 			}
>>> 			log.debug("search: query took {} ms", durationMillis);
>>> 		}
>>> 		return docs;
>>> 	}
>>>
>>>
>>> Uwe Schindler wrote:
>>>   
>>>     
>>>       
>>>>>> http://ankeschwarzer.de/tmp/lucene_29_newapi_mmap_singlereq.png
>>>>>>
>>>>>> Have to verify that the last one is not by accident more than one
>>>>>>         
>>>>>>           
>>>>>>             
>>>>> request.
>>>>>       
>>>>>         
>>>>>           
>>>>>> Will
>>>>>> do the run again and then post the required info.
>>>>>>         
>>>>>>           
>>>>>>             
>>>>> The last figure shows, that IndexSearcher.searchWithFilter was called
>>>>> twice
>>>>> in contrast to the first figure, where IndexSearcher.search was called
>>>>> only
>>>>> once.
>>>>>       
>>>>>         
>>>>>           
>>>> I forgot, searchWithFilter it is called per segment in 2.9. If it was only
>>>> one search, you must have two segments and therefore no optimized index for
>>>> this to be correct?
>>>>
>>>> Uwe
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>>     
>>>>       
>>>>         
>>>   
>>>     
>>>       
>>   
>>     
>
>
>   


-- 
- Mark

http://www.lucidimagination.com




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: lucene 2.9.0RC4 slower than 2.4.1?

Posted by Mark Miller <ma...@gmail.com>.

Notice that while DisjunctionScorer.advance and
DisjuntionScorer.advanceAfterCurrent appear to be called
in 2.9, in 2.4, I am only seeing DisjuntionScorer.advanceAfterCurrent
called.

Can someone explain that?

Mark Miller wrote:
> Something is very odd about this if they both cover the same search and
> the environ for both is identical. Even if one search was done twice,
> and we divide the numbers for the new api by 2 - its still *very* odd.
>
> With 2.4, ScorerDocQueue.topDoc is called half a million times.
> With 2.9, its called over 4 million times.
>
> Huh?
>
> Thomas Becker wrote:
>   
>> No it's only a single segment. But two calls. One doing a getHitsCount first and
>> the other doing the actual search. I'll paste both methods below if someone's
>> interested.
>>
>> Will dig into lucene's sources and compare 2.4 search behaviour for my case with
>> 2.9 tomorrow. It was about time to get more into lucene-core sources anyhow. :)
>>
>> See you tomorrow guys and thanks a lot again! It's a pleasure.
>>
>> 	public int getHitsCount(String query, Filter filter) throws
>> LuceneServiceException {
>> 		log.debug("getHitsCount('{}, {}')", query, filter);
>> 		if (StringUtils.isBlank(query)) {
>> 			log.warn("getHitsCount: empty lucene query");
>> 			return 0;
>> 		}
>> 		long startTimeMillis = System.currentTimeMillis();
>> 		int count = 0;
>>
>> 		if (indexSearcher == null) {
>> 			return 0;
>> 		}
>>
>> 		BooleanQuery.setMaxClauseCount(MAXCLAUSECOUNT);
>> 		Query q = null;
>> 		try {
>> 			q = createQuery(query);
>> 			TopScoreDocCollector tsdc = TopScoreDocCollector.create(1, true);
>> 			indexSearcher.search(q, filter, tsdc);
>> 			count = tsdc.getTotalHits();
>> 			log.info("getHitsCount: count = {}",count);
>> 		} catch (ParseException ex) {
>> 			throw new LuceneServiceException("invalid lucene query:" + query, ex);
>> 		} catch (IOException e) {
>> 			throw new LuceneServiceException(" indexSearcher could be corrupted", e);
>> 		} finally {
>> 			long durationMillis = System.currentTimeMillis() - startTimeMillis;
>> 			if (durationMillis > slowQueryLimit) {
>> 				log.warn("getHitsCount: Slow query: {} ms, query={}", durationMillis, query);
>> 			}
>> 			log.debug("getHitsCount: query took {} ms", durationMillis);
>> 		}
>> 		return count;
>> 	}
>>
>> 	public List<Document> search(String query, Filter filter, Sort sort, int from,
>> int size) throws LuceneServiceException {
>> 		log.debug("{} search('{}', {}, {}, {}, {})", new Object[] { indexAlias, query,
>> filter, sort, from, size });
>> 		long startTimeMillis = System.currentTimeMillis();
>>
>> 		List<Document> docs = new ArrayList<Document>();
>> 		if (indexSearcher == null) {
>> 			return docs;
>> 		}
>> 		Query q = null;
>> 		try {
>> 			if (query == null) {
>> 				log.warn("search: lucene query is null...");
>> 				return docs;
>> 			}
>> 			q = createQuery(query);
>> 			BooleanQuery.setMaxClauseCount(MAXCLAUSECOUNT);
>> 			if (size < 0 || size > maxNumHits) {
>> 				// set hard limit for numHits
>> 				size = maxNumHits;
>> 				if (log.isDebugEnabled())
>> 					log.debug("search: Size set to hardlimit: {} for query: {} with filter:
>> {}", new Object[] { size, query, filter });
>> 			}
>> 			TopFieldCollector collector = TopFieldCollector.create(sort, size + from,
>> true, false, false, true);
>> 			indexSearcher.search(q, filter, collector);
>> 			if(size > collector.getTotalHits())
>> 				size = collector.getTotalHits();
>> 			if (size > 100000)
>> 				log.info("search: size: {} bigger than 100.000 for query: {} with filter:
>> {}", new Object[] { size, query, filter });
>> 			TopDocs td = collector.topDocs(from, size);
>> 			ScoreDoc[] scoreDocs = td.scoreDocs;
>> 			for (ScoreDoc scoreDoc : scoreDocs) {
>> 				docs.add(indexSearcher.doc(scoreDoc.doc));
>> 			}
>> 		} catch (ParseException e) {
>> 			log.warn("search: ParseException: {}", e.getMessage());
>> 			if (log.isDebugEnabled())
>> 				log.warn("search: ParseException: ", e);
>> 			return Collections.emptyList();
>> 		} catch (IOException e) {
>> 			log.warn("search: IOException: ", e);
>> 			return Collections.emptyList();
>> 		} finally {
>> 			long durationMillis = System.currentTimeMillis() - startTimeMillis;
>> 			if (durationMillis > slowQueryLimit) {
>> 				log.warn("search: Slow query: {} ms, query={}, indexUsed={}",
>> 						new Object[] { durationMillis, query,
>> indexSearcher.getIndexReader().directory() });
>> 			}
>> 			log.debug("search: query took {} ms", durationMillis);
>> 		}
>> 		return docs;
>> 	}
>>
>>
>> Uwe Schindler wrote:
>>   
>>     
>>>>> http://ankeschwarzer.de/tmp/lucene_29_newapi_mmap_singlereq.png
>>>>>
>>>>> Have to verify that the last one is not by accident more than one
>>>>>         
>>>>>           
>>>> request.
>>>>       
>>>>         
>>>>> Will
>>>>> do the run again and then post the required info.
>>>>>         
>>>>>           
>>>> The last figure shows, that IndexSearcher.searchWithFilter was called
>>>> twice
>>>> in contrast to the first figure, where IndexSearcher.search was called
>>>> only
>>>> once.
>>>>       
>>>>         
>>> I forgot, searchWithFilter it is called per segment in 2.9. If it was only
>>> one search, you must have two segments and therefore no optimized index for
>>> this to be correct?
>>>
>>> Uwe
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>     
>>>       
>>   
>>     
>
>
>   


-- 
- Mark

http://www.lucidimagination.com




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: lucene 2.9.0RC4 slower than 2.4.1?

Posted by Mark Miller <ma...@gmail.com>.

Something is very odd about this if they both cover the same search and
the environ for both is identical. Even if one search was done twice,
and we divide the numbers for the new api by 2 - its still *very* odd.

With 2.4, ScorerDocQueue.topDoc is called half a million times.
With 2.9, its called over 4 million times.

Huh?

Thomas Becker wrote:
> No it's only a single segment. But two calls. One doing a getHitsCount first and
> the other doing the actual search. I'll paste both methods below if someone's
> interested.
>
> Will dig into lucene's sources and compare 2.4 search behaviour for my case with
> 2.9 tomorrow. It was about time to get more into lucene-core sources anyhow. :)
>
> See you tomorrow guys and thanks a lot again! It's a pleasure.
>
> 	public int getHitsCount(String query, Filter filter) throws
> LuceneServiceException {
> 		log.debug("getHitsCount('{}, {}')", query, filter);
> 		if (StringUtils.isBlank(query)) {
> 			log.warn("getHitsCount: empty lucene query");
> 			return 0;
> 		}
> 		long startTimeMillis = System.currentTimeMillis();
> 		int count = 0;
>
> 		if (indexSearcher == null) {
> 			return 0;
> 		}
>
> 		BooleanQuery.setMaxClauseCount(MAXCLAUSECOUNT);
> 		Query q = null;
> 		try {
> 			q = createQuery(query);
> 			TopScoreDocCollector tsdc = TopScoreDocCollector.create(1, true);
> 			indexSearcher.search(q, filter, tsdc);
> 			count = tsdc.getTotalHits();
> 			log.info("getHitsCount: count = {}",count);
> 		} catch (ParseException ex) {
> 			throw new LuceneServiceException("invalid lucene query:" + query, ex);
> 		} catch (IOException e) {
> 			throw new LuceneServiceException(" indexSearcher could be corrupted", e);
> 		} finally {
> 			long durationMillis = System.currentTimeMillis() - startTimeMillis;
> 			if (durationMillis > slowQueryLimit) {
> 				log.warn("getHitsCount: Slow query: {} ms, query={}", durationMillis, query);
> 			}
> 			log.debug("getHitsCount: query took {} ms", durationMillis);
> 		}
> 		return count;
> 	}
>
> 	public List<Document> search(String query, Filter filter, Sort sort, int from,
> int size) throws LuceneServiceException {
> 		log.debug("{} search('{}', {}, {}, {}, {})", new Object[] { indexAlias, query,
> filter, sort, from, size });
> 		long startTimeMillis = System.currentTimeMillis();
>
> 		List<Document> docs = new ArrayList<Document>();
> 		if (indexSearcher == null) {
> 			return docs;
> 		}
> 		Query q = null;
> 		try {
> 			if (query == null) {
> 				log.warn("search: lucene query is null...");
> 				return docs;
> 			}
> 			q = createQuery(query);
> 			BooleanQuery.setMaxClauseCount(MAXCLAUSECOUNT);
> 			if (size < 0 || size > maxNumHits) {
> 				// set hard limit for numHits
> 				size = maxNumHits;
> 				if (log.isDebugEnabled())
> 					log.debug("search: Size set to hardlimit: {} for query: {} with filter:
> {}", new Object[] { size, query, filter });
> 			}
> 			TopFieldCollector collector = TopFieldCollector.create(sort, size + from,
> true, false, false, true);
> 			indexSearcher.search(q, filter, collector);
> 			if(size > collector.getTotalHits())
> 				size = collector.getTotalHits();
> 			if (size > 100000)
> 				log.info("search: size: {} bigger than 100.000 for query: {} with filter:
> {}", new Object[] { size, query, filter });
> 			TopDocs td = collector.topDocs(from, size);
> 			ScoreDoc[] scoreDocs = td.scoreDocs;
> 			for (ScoreDoc scoreDoc : scoreDocs) {
> 				docs.add(indexSearcher.doc(scoreDoc.doc));
> 			}
> 		} catch (ParseException e) {
> 			log.warn("search: ParseException: {}", e.getMessage());
> 			if (log.isDebugEnabled())
> 				log.warn("search: ParseException: ", e);
> 			return Collections.emptyList();
> 		} catch (IOException e) {
> 			log.warn("search: IOException: ", e);
> 			return Collections.emptyList();
> 		} finally {
> 			long durationMillis = System.currentTimeMillis() - startTimeMillis;
> 			if (durationMillis > slowQueryLimit) {
> 				log.warn("search: Slow query: {} ms, query={}, indexUsed={}",
> 						new Object[] { durationMillis, query,
> indexSearcher.getIndexReader().directory() });
> 			}
> 			log.debug("search: query took {} ms", durationMillis);
> 		}
> 		return docs;
> 	}
>
>
> Uwe Schindler wrote:
>   
>>>> http://ankeschwarzer.de/tmp/lucene_29_newapi_mmap_singlereq.png
>>>>
>>>> Have to verify that the last one is not by accident more than one
>>>>         
>>> request.
>>>       
>>>> Will
>>>> do the run again and then post the required info.
>>>>         
>>> The last figure shows, that IndexSearcher.searchWithFilter was called
>>> twice
>>> in contrast to the first figure, where IndexSearcher.search was called
>>> only
>>> once.
>>>       
>> I forgot, searchWithFilter it is called per segment in 2.9. If it was only
>> one search, you must have two segments and therefore no optimized index for
>> this to be correct?
>>
>> Uwe
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>     
>
>   


-- 
- Mark

http://www.lucidimagination.com




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: lucene 2.9.0RC4 slower than 2.4.1?

Posted by Thomas Becker <th...@net-m.de>.

No it's only a single segment. But two calls. One doing a getHitsCount first and
the other doing the actual search. I'll paste both methods below if someone's
interested.

Will dig into lucene's sources and compare 2.4 search behaviour for my case with
2.9 tomorrow. It was about time to get more into lucene-core sources anyhow. :)

See you tomorrow guys and thanks a lot again! It's a pleasure.

	public int getHitsCount(String query, Filter filter) throws
LuceneServiceException {
		log.debug("getHitsCount('{}, {}')", query, filter);
		if (StringUtils.isBlank(query)) {
			log.warn("getHitsCount: empty lucene query");
			return 0;
		}
		long startTimeMillis = System.currentTimeMillis();
		int count = 0;

		if (indexSearcher == null) {
			return 0;
		}

		BooleanQuery.setMaxClauseCount(MAXCLAUSECOUNT);
		Query q = null;
		try {
			q = createQuery(query);
			TopScoreDocCollector tsdc = TopScoreDocCollector.create(1, true);
			indexSearcher.search(q, filter, tsdc);
			count = tsdc.getTotalHits();
			log.info("getHitsCount: count = {}",count);
		} catch (ParseException ex) {
			throw new LuceneServiceException("invalid lucene query:" + query, ex);
		} catch (IOException e) {
			throw new LuceneServiceException(" indexSearcher could be corrupted", e);
		} finally {
			long durationMillis = System.currentTimeMillis() - startTimeMillis;
			if (durationMillis > slowQueryLimit) {
				log.warn("getHitsCount: Slow query: {} ms, query={}", durationMillis, query);
			}
			log.debug("getHitsCount: query took {} ms", durationMillis);
		}
		return count;
	}

	public List<Document> search(String query, Filter filter, Sort sort, int from,
int size) throws LuceneServiceException {
		log.debug("{} search('{}', {}, {}, {}, {})", new Object[] { indexAlias, query,
filter, sort, from, size });
		long startTimeMillis = System.currentTimeMillis();

		List<Document> docs = new ArrayList<Document>();
		if (indexSearcher == null) {
			return docs;
		}
		Query q = null;
		try {
			if (query == null) {
				log.warn("search: lucene query is null...");
				return docs;
			}
			q = createQuery(query);
			BooleanQuery.setMaxClauseCount(MAXCLAUSECOUNT);
			if (size < 0 || size > maxNumHits) {
				// set hard limit for numHits
				size = maxNumHits;
				if (log.isDebugEnabled())
					log.debug("search: Size set to hardlimit: {} for query: {} with filter:
{}", new Object[] { size, query, filter });
			}
			TopFieldCollector collector = TopFieldCollector.create(sort, size + from,
true, false, false, true);
			indexSearcher.search(q, filter, collector);
			if(size > collector.getTotalHits())
				size = collector.getTotalHits();
			if (size > 100000)
				log.info("search: size: {} bigger than 100.000 for query: {} with filter:
{}", new Object[] { size, query, filter });
			TopDocs td = collector.topDocs(from, size);
			ScoreDoc[] scoreDocs = td.scoreDocs;
			for (ScoreDoc scoreDoc : scoreDocs) {
				docs.add(indexSearcher.doc(scoreDoc.doc));
			}
		} catch (ParseException e) {
			log.warn("search: ParseException: {}", e.getMessage());
			if (log.isDebugEnabled())
				log.warn("search: ParseException: ", e);
			return Collections.emptyList();
		} catch (IOException e) {
			log.warn("search: IOException: ", e);
			return Collections.emptyList();
		} finally {
			long durationMillis = System.currentTimeMillis() - startTimeMillis;
			if (durationMillis > slowQueryLimit) {
				log.warn("search: Slow query: {} ms, query={}, indexUsed={}",
						new Object[] { durationMillis, query,
indexSearcher.getIndexReader().directory() });
			}
			log.debug("search: query took {} ms", durationMillis);
		}
		return docs;
	}


Uwe Schindler wrote:
>>> http://ankeschwarzer.de/tmp/lucene_29_newapi_mmap_singlereq.png
>>>
>>> Have to verify that the last one is not by accident more than one
>> request.
>>> Will
>>> do the run again and then post the required info.
>> The last figure shows, that IndexSearcher.searchWithFilter was called
>> twice
>> in contrast to the first figure, where IndexSearcher.search was called
>> only
>> once.
> 
> I forgot, searchWithFilter it is called per segment in 2.9. If it was only
> one search, you must have two segments and therefore no optimized index for
> this to be correct?
> 
> Uwe
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 

-- 
Thomas Becker
Senior JEE Developer

net mobile AG
Zollhof 17
40221 Düsseldorf
GERMANY

Phone:    +49 211 97020-195
Fax:      +49 211 97020-949
Mobile:   +49 173 5146567 (private)
E-Mail:   mailto:thomas.becker@net-m.de
Internet: http://www.net-m.de

Registergericht:  Amtsgericht Düsseldorf, HRB 48022
Vorstand:         Theodor Niehues (Vorsitzender), Frank Hartmann,
                 Kai Markus Kulas, Dieter Plassmann
Vorsitzender des
Aufsichtsrates:   Dr. Michael Briem

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

RE: lucene 2.9.0RC4 slower than 2.4.1?

Posted by Uwe Schindler <uw...@thetaphi.de>.

> > http://ankeschwarzer.de/tmp/lucene_29_newapi_mmap_singlereq.png
> >
> > Have to verify that the last one is not by accident more than one
> request.
> > Will
> > do the run again and then post the required info.
> 
> The last figure shows, that IndexSearcher.searchWithFilter was called
> twice
> in contrast to the first figure, where IndexSearcher.search was called
> only
> once.

I forgot, searchWithFilter it is called per segment in 2.9. If it was only
one search, you must have two segments and therefore no optimized index for
this to be correct?

Uwe


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: lucene 2.9.0RC4 slower than 2.4.1?

Posted by Mark Miller <ma...@gmail.com>.

Ah - that explains a bit. Though if you divide by 2, the new one still
appears to overcall each method
in comparison to 2.4.

- Mark

Uwe Schindler wrote:
>> http://ankeschwarzer.de/tmp/lucene_29_newapi_mmap_singlereq.png
>>
>> Have to verify that the last one is not by accident more than one request.
>> Will
>> do the run again and then post the required info.
>>     
>
> The last figure shows, that IndexSearcher.searchWithFilter was called twice
> in contrast to the first figure, where IndexSearcher.search was called only
> once.
>
> Uwe
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>   




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

RE: lucene 2.9.0RC4 slower than 2.4.1?

Posted by Uwe Schindler <uw...@thetaphi.de>.

> http://ankeschwarzer.de/tmp/lucene_29_newapi_mmap_singlereq.png
> 
> Have to verify that the last one is not by accident more than one request.
> Will
> do the run again and then post the required info.

The last figure shows, that IndexSearcher.searchWithFilter was called twice
in contrast to the first figure, where IndexSearcher.search was called only
once.

Uwe


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: lucene 2.9.0RC4 slower than 2.4.1?

Posted by Thomas Becker <th...@net-m.de>.

New Profiling sessions with invocation counts. A single lucene search request
with huge resultset (169k items).

Quite interesting results though and there's definetly something wrong with luc
2.9 and the way I'm using it. But see yourself:

http://ankeschwarzer.de/tmp/lucene_24_oldapi_singlereq.png
http://ankeschwarzer.de/tmp/lucene_29_oldapi_singlereq.png
and even worse:
http://ankeschwarzer.de/tmp/lucene_29_newapi_mmap_singlereq.png

Have to verify that the last one is not by accident more than one request. Will
do the run again and then post the required info.

Mark Miller wrote:
> bq. I'll do some profiling now again and let you know the results.
> 
> Great - it will be interesting to see the results. My guess, based on
> the 2.9 new api profiling, is that your queries may not be agreeing with
> some of the changes somehow. Along with the profiling, can you fill us
> in on the query types you are using as well? (eg qualities)
> 
> And grab invocations if its possible.
> 

-- 
Thomas Becker
Senior JEE Developer

net mobile AG
Zollhof 17
40221 Düsseldorf
GERMANY

Phone:    +49 211 97020-195
Fax:      +49 211 97020-949
Mobile:   +49 173 5146567 (private)
E-Mail:   mailto:thomas.becker@net-m.de
Internet: http://www.net-m.de

Registergericht:  Amtsgericht Düsseldorf, HRB 48022
Vorstand:         Theodor Niehues (Vorsitzender), Frank Hartmann,
                 Kai Markus Kulas, Dieter Plassmann
Vorsitzender des
Aufsichtsrates:   Dr. Michael Briem

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: lucene 2.9.0RC4 slower than 2.4.1?

Posted by Mark Miller <ma...@gmail.com>.

bq. I'll do some profiling now again and let you know the results.

Great - it will be interesting to see the results. My guess, based on
the 2.9 new api profiling, is that your queries may not be agreeing with
some of the changes somehow. Along with the profiling, can you fill us
in on the query types you are using as well? (eg qualities)

And grab invocations if its possible.

-- 
- Mark

http://www.lucidimagination.com



Thomas Becker wrote:
> Tests run on tmpfs:
> config: impl=SeparateFile serial=false nThreads=4 iterations=100 bufsize=1024
> poolsize=2 filelen=18920301
> answer=850258539, ms=8090, MB/sec=935.4907787391842
> config: impl=ChannelFile serial=false nThreads=4 iterations=100 bufsize=1024
> poolsize=2 filelen=18920301
> answer=850258903, ms=39444, MB/sec=191.8700030422878
> config: impl=ChannelPread serial=false nThreads=4 iterations=100 bufsize=1024
> poolsize=2 filelen=18920301
> answer=850258903, ms=8504, MB/sec=889.9483066792098
> config: impl=PooledPread serial=false nThreads=4 iterations=100 bufsize=1024
> poolsize=2 filelen=18920301
> answer=850258903, ms=9585, MB/sec=789.5795931142409
>
> I did run some tests now with SimpleFSDirectory and MMapDirectory. Both are
> faster than NIOFS and the response times improved. But it's still slower than 2.4.
>
> I'll do some profiling now again and let you know the results.
>
> Thanks again for all the great support to all who've answered.
>
>
> Mark Miller wrote:
>   
>> Can you run the following test on your RAMDISK?
>>
>> http://people.apache.org/~markrmiller/FileReadTest.java
>>
>> I've taken it from the following issue (in which NIOFSDirectory was
>> developed):
>> https://issues.apache.org/jira/browse/LUCENE-753
>>
>>     
>
>   




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: lucene 2.9.0RC4 slower than 2.4.1?

Posted by Thomas Becker <th...@net-m.de>.

Tests run on tmpfs:
config: impl=SeparateFile serial=false nThreads=4 iterations=100 bufsize=1024
poolsize=2 filelen=18920301
answer=850258539, ms=8090, MB/sec=935.4907787391842
config: impl=ChannelFile serial=false nThreads=4 iterations=100 bufsize=1024
poolsize=2 filelen=18920301
answer=850258903, ms=39444, MB/sec=191.8700030422878
config: impl=ChannelPread serial=false nThreads=4 iterations=100 bufsize=1024
poolsize=2 filelen=18920301
answer=850258903, ms=8504, MB/sec=889.9483066792098
config: impl=PooledPread serial=false nThreads=4 iterations=100 bufsize=1024
poolsize=2 filelen=18920301
answer=850258903, ms=9585, MB/sec=789.5795931142409

I did run some tests now with SimpleFSDirectory and MMapDirectory. Both are
faster than NIOFS and the response times improved. But it's still slower than 2.4.

I'll do some profiling now again and let you know the results.

Thanks again for all the great support to all who've answered.


Mark Miller wrote:
> Can you run the following test on your RAMDISK?
> 
> http://people.apache.org/~markrmiller/FileReadTest.java
> 
> I've taken it from the following issue (in which NIOFSDirectory was
> developed):
> https://issues.apache.org/jira/browse/LUCENE-753
> 

-- 
Thomas Becker
Senior JEE Developer

net mobile AG
Zollhof 17
40221 Düsseldorf
GERMANY

Phone:    +49 211 97020-195
Fax:      +49 211 97020-949
Mobile:   +49 173 5146567 (private)
E-Mail:   mailto:thomas.becker@net-m.de
Internet: http://www.net-m.de

Registergericht:  Amtsgericht Düsseldorf, HRB 48022
Vorstand:         Theodor Niehues (Vorsitzender), Frank Hartmann,
                 Kai Markus Kulas, Dieter Plassmann
Vorsitzender des
Aufsichtsrates:   Dr. Michael Briem

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: lucene 2.9.0RC4 slower than 2.4.1?

Posted by Thomas Becker <th...@net-m.de>.

Will do, tomorrow.

Mark Miller wrote:
> Can you run the following test on your RAMDISK?
> 
> http://people.apache.org/~markrmiller/FileReadTest.java
> 
> I've taken it from the following issue (in which NIOFSDirectory was
> developed):
> https://issues.apache.org/jira/browse/LUCENE-753
> 

-- 
Thomas Becker
Senior JEE Developer

net mobile AG
Zollhof 17
40221 Düsseldorf
GERMANY

Phone:    +49 211 97020-195
Fax:      +49 211 97020-949
Mobile:   +49 173 5146567 (private)
E-Mail:   mailto:thomas.becker@net-m.de
Internet: http://www.net-m.de

Registergericht:  Amtsgericht Düsseldorf, HRB 48022
Vorstand:         Theodor Niehues (Vorsitzender), Frank Hartmann,
                 Kai Markus Kulas, Dieter Plassmann
Vorsitzender des
Aufsichtsrates:   Dr. Michael Briem

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: lucene 2.9.0RC4 slower than 2.4.1?

Posted by Mark Miller <ma...@gmail.com>.

Can you run the following test on your RAMDISK?

http://people.apache.org/~markrmiller/FileReadTest.java

I've taken it from the following issue (in which NIOFSDirectory was
developed):
https://issues.apache.org/jira/browse/LUCENE-753

-- 
- Mark

http://www.lucidimagination.com




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: lucene 2.9.0RC4 slower than 2.4.1?

Posted by Thomas Becker <th...@net-m.de>.

Hi Uwe,

already done. See my last message.

Cheers,
Thomas

Uwe Schindler wrote:
> On 2.9. NIOFS is only used, if you use FSDirectory.open() instead of
> FSDirectory.getDirectory (Deprecated). Can you compare when you use instead
> of FSDirectory.open() the direct ctor of SimpleFSDir vs. NIOFSDir vs.
> MMapDir and compare?
> 
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
> 
>> -----Original Message-----
>> From: Mark Miller [mailto:markrmiller@gmail.com]
>> Sent: Tuesday, September 15, 2009 5:30 PM
>> To: java-user@lucene.apache.org
>> Subject: Re: lucene 2.9.0RC4 slower than 2.4.1?
>>
>> Thomas Becker wrote:
>>> Hey Mark,
>>>
>>> yes. I'm running the app on unix. You see the difference between 2.9 and
>> 2.4 here:
>>> http://ankeschwarzer.de/tmp/graph.jpg
>>>
>> Right - I know your measurements showed a difference (and will keep that
>> in mind) - but the profiling results then seem
>> oddly similar.
>>
>>> 2.4 responds much quicker thus increasing throughput severly. I'm having
>> a
>>> single segment only:
>>>
>>> -rw-r--r-- 1 asuser asgroup         20 Sep  9 16:40 segments.gen
>>> -rw-r--r-- 1 asuser asgroup        294 Sep  9 16:44 segments_1o
>>> -rw-r--r-- 1 asuser asgroup 2810603184 Sep  9 16:44 _7c.cfs
>>>
>>> The FileChannel.read hotspot is indeed strange. Especially if taking
>> into
>>> account that the index is lying on a tmpfs (in memory). And 2.4 should
>> be using
>>> NIOFSDirectory as well?! Will check that.
>>>
>> 2.4 did not use NIOFSDirectory by default - you would have had to
>> manually specified it. Now its used by default if its detects a non
>> Windows OS.
>>
>> Any particular reason your profiling output does not show invocations?
>> Its not essential most likely, but I've found it to be helpful in
>> comparisons.
>>
>> We are about to release 2.9, and its been such a long haul, I'd hate to
>> see a release with an unknown performance trap.
>>
>> --
>> - Mark
>>
>> http://www.lucidimagination.com
>>
>>
>>> Thanks a lot for your support!
>>>
>>> Cheers,
>>> Thomas
>>>
>>> Mark Miller wrote:
>>>
>>>> A few quick notes -
>>>>
>>>> Lucene 2.9 old api doesn't appear much worse than Lucene 2.4?
>>>>
>>>> You save a lot with the new Intern impl, because thats not a hotspot
>>>> anymore. But then,
>>>> RandomAccessFile seeks end up being a lot more of the pie. They look
>>>> fairly similar in speed overall?
>>>>
>>>> It looks like the major bottleneck with 2.9 new api is that its using
>>>> NIOFSDirectory (your on unix I guess, and it now
>>>> defaults to that on non Windows os's), and that appears to be a real
>>>> killer for you. Its taking half the time for its
>>>> reads.  ???
>>>>
>>>> No conclusions yet, but I'm looking it over. Some other guys will come
>>>> in with some ideas as well.
>>>>
>>>> Do confirm that those profiling results are on a single segment though.
>>>>
>>>> - Mark
>>>>
>>>>
>>>> Mark Miller wrote:
>>>>
>>>>> Thomas Becker wrote:
>>>>>
>>>>>
>>>>>> Here's the results of profiling 10 different search requests:
>>>>>>
>>>>>> http://ankeschwarzer.de/tmp/lucene_24_oldapi.png
>>>>>> http://ankeschwarzer.de/tmp/lucene_29_oldapi.png
>>>>>> http://ankeschwarzer.de/tmp/lucene_29_newapi.png
>>>>>>
>>>>>> But you already gave me a good hint. The index being used is an old
>> one build
>>>>>> with lucene 2.4. I will now try a freshly build 2.9 index and see if
>> performance
>>>>>> improves. Maybe that already solves the issue...stupid me...
>>>>>>
>>>>>>
>>>>>>
>>>>> That shouldn't be an issue unless there is some odd bug.
>>>>>
>>>>>
>>>>>
>>>>>> We're updating the index every 30 min. at the moment and it gets
>> optimized after
>>>>>> each update.
>>>>>>
>>>>>>
>>>>>>
>>>>> So this profiling is on an optimized index (eg a single segment) ?
>>>>> That would be odd indeed, and possibly point to some of the scoring
>> changes.
>>>>>
>>>>>
>>>>>> Mark Miller wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Thomas Becker wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Hey Mark,
>>>>>>>>
>>>>>>>> thanks for your reply. Will do. Results will follow in a couple of
>> minutes.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> Thanks, awesome.
>>>>>>>
>>>>>>> Also, how many segments (approx) are in your index? If there are a
>> lot,
>>>>>>> have you/can you try the same tests on an optimized index? Don't
>> want to
>>>>>>> get ahead of the profiling results, but just to continue the info
>> loop.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 

-- 
Thomas Becker
Senior JEE Developer

net mobile AG
Zollhof 17
40221 Düsseldorf
GERMANY

Phone:    +49 211 97020-195
Fax:      +49 211 97020-949
Mobile:   +49 173 5146567 (private)
E-Mail:   mailto:thomas.becker@net-m.de
Internet: http://www.net-m.de

Registergericht:  Amtsgericht Düsseldorf, HRB 48022
Vorstand:         Theodor Niehues (Vorsitzender), Frank Hartmann,
                 Kai Markus Kulas, Dieter Plassmann
Vorsitzender des
Aufsichtsrates:   Dr. Michael Briem

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

RE: lucene 2.9.0RC4 slower than 2.4.1?

Posted by Uwe Schindler <uw...@thetaphi.de>.

On 2.9. NIOFS is only used, if you use FSDirectory.open() instead of
FSDirectory.getDirectory (Deprecated). Can you compare when you use instead
of FSDirectory.open() the direct ctor of SimpleFSDir vs. NIOFSDir vs.
MMapDir and compare?

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Mark Miller [mailto:markrmiller@gmail.com]
> Sent: Tuesday, September 15, 2009 5:30 PM
> To: java-user@lucene.apache.org
> Subject: Re: lucene 2.9.0RC4 slower than 2.4.1?
> 
> Thomas Becker wrote:
> > Hey Mark,
> >
> > yes. I'm running the app on unix. You see the difference between 2.9 and
> 2.4 here:
> >
> > http://ankeschwarzer.de/tmp/graph.jpg
> >
> Right - I know your measurements showed a difference (and will keep that
> in mind) - but the profiling results then seem
> oddly similar.
> 
> > 2.4 responds much quicker thus increasing throughput severly. I'm having
> a
> > single segment only:
> >
> > -rw-r--r-- 1 asuser asgroup         20 Sep  9 16:40 segments.gen
> > -rw-r--r-- 1 asuser asgroup        294 Sep  9 16:44 segments_1o
> > -rw-r--r-- 1 asuser asgroup 2810603184 Sep  9 16:44 _7c.cfs
> >
> > The FileChannel.read hotspot is indeed strange. Especially if taking
> into
> > account that the index is lying on a tmpfs (in memory). And 2.4 should
> be using
> > NIOFSDirectory as well?! Will check that.
> >
> 2.4 did not use NIOFSDirectory by default - you would have had to
> manually specified it. Now its used by default if its detects a non
> Windows OS.
> 
> Any particular reason your profiling output does not show invocations?
> Its not essential most likely, but I've found it to be helpful in
> comparisons.
> 
> We are about to release 2.9, and its been such a long haul, I'd hate to
> see a release with an unknown performance trap.
> 
> --
> - Mark
> 
> http://www.lucidimagination.com
> 
> 
> > Thanks a lot for your support!
> >
> > Cheers,
> > Thomas
> >
> > Mark Miller wrote:
> >
> >> A few quick notes -
> >>
> >> Lucene 2.9 old api doesn't appear much worse than Lucene 2.4?
> >>
> >> You save a lot with the new Intern impl, because thats not a hotspot
> >> anymore. But then,
> >> RandomAccessFile seeks end up being a lot more of the pie. They look
> >> fairly similar in speed overall?
> >>
> >> It looks like the major bottleneck with 2.9 new api is that its using
> >> NIOFSDirectory (your on unix I guess, and it now
> >> defaults to that on non Windows os's), and that appears to be a real
> >> killer for you. Its taking half the time for its
> >> reads.  ???
> >>
> >> No conclusions yet, but I'm looking it over. Some other guys will come
> >> in with some ideas as well.
> >>
> >> Do confirm that those profiling results are on a single segment though.
> >>
> >> - Mark
> >>
> >>
> >> Mark Miller wrote:
> >>
> >>> Thomas Becker wrote:
> >>>
> >>>
> >>>> Here's the results of profiling 10 different search requests:
> >>>>
> >>>> http://ankeschwarzer.de/tmp/lucene_24_oldapi.png
> >>>> http://ankeschwarzer.de/tmp/lucene_29_oldapi.png
> >>>> http://ankeschwarzer.de/tmp/lucene_29_newapi.png
> >>>>
> >>>> But you already gave me a good hint. The index being used is an old
> one build
> >>>> with lucene 2.4. I will now try a freshly build 2.9 index and see if
> performance
> >>>> improves. Maybe that already solves the issue...stupid me...
> >>>>
> >>>>
> >>>>
> >>> That shouldn't be an issue unless there is some odd bug.
> >>>
> >>>
> >>>
> >>>> We're updating the index every 30 min. at the moment and it gets
> optimized after
> >>>> each update.
> >>>>
> >>>>
> >>>>
> >>> So this profiling is on an optimized index (eg a single segment) ?
> >>> That would be odd indeed, and possibly point to some of the scoring
> changes.
> >>>
> >>>
> >>>
> >>>> Mark Miller wrote:
> >>>>
> >>>>
> >>>>
> >>>>> Thomas Becker wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>>> Hey Mark,
> >>>>>>
> >>>>>> thanks for your reply. Will do. Results will follow in a couple of
> minutes.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>> Thanks, awesome.
> >>>>>
> >>>>> Also, how many segments (approx) are in your index? If there are a
> lot,
> >>>>> have you/can you try the same tests on an optimized index? Don't
> want to
> >>>>> get ahead of the profiling results, but just to continue the info
> loop.
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>
> >>>
> >>
> >
> >
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: lucene 2.9.0RC4 slower than 2.4.1?

Posted by Mark Miller <ma...@gmail.com>.

The results:

config: impl=SeparateFile serial=false nThreads=4 iterations=100
bufsize=1024 poolsize=2 filelen=730554368
answer=-282295611, ms=173550, MB/sec=1683.7899579371938

config: impl=ChannelFile serial=false nThreads=4 iterations=100
bufsize=1024 poolsize=2 filelen=730554368
answer=-282295361, ms=1377768, MB/sec=212.09793463050383

config: impl=ChannelPread serial=false nThreads=4 iterations=100
bufsize=1024 poolsize=2 filelen=730554368
answer=-282295361, ms=632253, MB/sec=462.19115955163517

config: impl=PooledPread serial=false nThreads=4 iterations=100
bufsize=1024 poolsize=2 filelen=730554368
answer=-282295361, ms=774664, MB/sec=377.2238637654518

ClassicFile was heading for the same fate as ChannelFile.


I'll have to check what its like on the file system - but it appears
just ridiculously slower. Even with SeparateFile, All 4 cores are bouncing
from 0-12% independently and really favoring the low end of that.
ChannelPread appears no better.

There are results from other OS's/setups in the JIRA issue.

I'm using ext4.

Uwe Schindler wrote:
> How does a conventional file system compare?
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>   
>> -----Original Message-----
>> From: Mark Miller [mailto:markrmiller@gmail.com]
>> Sent: Tuesday, September 15, 2009 7:15 PM
>> To: java-user@lucene.apache.org
>> Subject: Re: lucene 2.9.0RC4 slower than 2.4.1?
>>
>> Mark Miller wrote:
>>     
>>> Indeed - I just ran the FileReaderTest on a Linux tmpfs ramdisk - with
>>> SeparateFile all 4 of my cores are immediately pinned and remain so.
>>> With ChannelFile, all 4 cores hover 20-30%.
>>>
>>> It would appear it may not be a good idea to use NIOFSDirectory on
>>>       
>> ramdisks.
>>     
>>> Even still though - it looks like you have a further issue - your Lucene
>>> 2.9 old-api results don't use it, and are still not good.
>>>
>>>
>>>       
>> The quick results:
>>
>> ramdisk: sudo mount -t tmpfs tmpfs /tmp/space -o
>> size=1G,nr_inodes=200k,mode=01777
>>
>> config: impl=SeparateFile serial=false nThreads=4 iterations=100
>> bufsize=1024 poolsize=2 filelen=730554368
>> answer=-282295611, ms=173550, MB/sec=1683.7899579371938
>>
>> config: impl=ChannelFile serial=false nThreads=4 iterations=100
>> bufsize=1024 poolsize=2 filelen=730554368
>> answer=-282295361, ms=1377768, MB/sec=212.09793463050383
>>
>>
>> --
>> - Mark
>>
>> http://www.lucidimagination.com
>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>     
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>   


-- 
- Mark

http://www.lucidimagination.com




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

RE: lucene 2.9.0RC4 slower than 2.4.1?

Posted by Uwe Schindler <uw...@thetaphi.de>.

How does a conventional file system compare?

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Mark Miller [mailto:markrmiller@gmail.com]
> Sent: Tuesday, September 15, 2009 7:15 PM
> To: java-user@lucene.apache.org
> Subject: Re: lucene 2.9.0RC4 slower than 2.4.1?
> 
> Mark Miller wrote:
> > Indeed - I just ran the FileReaderTest on a Linux tmpfs ramdisk - with
> > SeparateFile all 4 of my cores are immediately pinned and remain so.
> > With ChannelFile, all 4 cores hover 20-30%.
> >
> > It would appear it may not be a good idea to use NIOFSDirectory on
> ramdisks.
> >
> > Even still though - it looks like you have a further issue - your Lucene
> > 2.9 old-api results don't use it, and are still not good.
> >
> >
> The quick results:
> 
> ramdisk: sudo mount -t tmpfs tmpfs /tmp/space -o
> size=1G,nr_inodes=200k,mode=01777
> 
> config: impl=SeparateFile serial=false nThreads=4 iterations=100
> bufsize=1024 poolsize=2 filelen=730554368
> answer=-282295611, ms=173550, MB/sec=1683.7899579371938
> 
> config: impl=ChannelFile serial=false nThreads=4 iterations=100
> bufsize=1024 poolsize=2 filelen=730554368
> answer=-282295361, ms=1377768, MB/sec=212.09793463050383
> 
> 
> --
> - Mark
> 
> http://www.lucidimagination.com
> 
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: lucene 2.9.0RC4 slower than 2.4.1?

Posted by Mark Miller <ma...@gmail.com>.

Mark Miller wrote:
> Indeed - I just ran the FileReaderTest on a Linux tmpfs ramdisk - with
> SeparateFile all 4 of my cores are immediately pinned and remain so.
> With ChannelFile, all 4 cores hover 20-30%.
>
> It would appear it may not be a good idea to use NIOFSDirectory on ramdisks.
>
> Even still though - it looks like you have a further issue - your Lucene
> 2.9 old-api results don't use it, and are still not good.
>
>   
The quick results:

ramdisk: sudo mount -t tmpfs tmpfs /tmp/space -o
size=1G,nr_inodes=200k,mode=01777

config: impl=SeparateFile serial=false nThreads=4 iterations=100
bufsize=1024 poolsize=2 filelen=730554368
answer=-282295611, ms=173550, MB/sec=1683.7899579371938

config: impl=ChannelFile serial=false nThreads=4 iterations=100
bufsize=1024 poolsize=2 filelen=730554368
answer=-282295361, ms=1377768, MB/sec=212.09793463050383


-- 
- Mark

http://www.lucidimagination.com




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: lucene 2.9.0RC4 slower than 2.4.1?

Posted by Mark Miller <ma...@gmail.com>.

Indeed - I just ran the FileReaderTest on a Linux tmpfs ramdisk - with
SeparateFile all 4 of my cores are immediately pinned and remain so.
With ChannelFile, all 4 cores hover 20-30%.

It would appear it may not be a good idea to use NIOFSDirectory on ramdisks.

Even still though - it looks like you have a further issue - your Lucene
2.9 old-api results don't use it, and are still not good.

-- 
- Mark

http://www.lucidimagination.com


Uwe Schindler wrote:
> Maybe Linux has some problems with NIO on tmpfs/other ramdisks. What Linux
> do you use, 64bit or 32bit JVM and kernel, ram fs type?
>
> If you have 64 bit and you stored your index in Linux tmpfs (not the old RAM
> fs), the fastest would be MMapDirectory, as the tmpfs RAM can be directly
> used when mapped into the JVM address space.
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>   
>> -----Original Message-----
>> From: Mark Miller [mailto:markrmiller@gmail.com]
>> Sent: Tuesday, September 15, 2009 5:30 PM
>> To: java-user@lucene.apache.org
>> Subject: Re: lucene 2.9.0RC4 slower than 2.4.1?
>>
>> Thomas Becker wrote:
>>     
>>> Hey Mark,
>>>
>>> yes. I'm running the app on unix. You see the difference between 2.9 and
>>>       
>> 2.4 here:
>>     
>>> http://ankeschwarzer.de/tmp/graph.jpg
>>>
>>>       
>> Right - I know your measurements showed a difference (and will keep that
>> in mind) - but the profiling results then seem
>> oddly similar.
>>
>>     
>>> 2.4 responds much quicker thus increasing throughput severly. I'm having
>>>       
>> a
>>     
>>> single segment only:
>>>
>>> -rw-r--r-- 1 asuser asgroup         20 Sep  9 16:40 segments.gen
>>> -rw-r--r-- 1 asuser asgroup        294 Sep  9 16:44 segments_1o
>>> -rw-r--r-- 1 asuser asgroup 2810603184 Sep  9 16:44 _7c.cfs
>>>
>>> The FileChannel.read hotspot is indeed strange. Especially if taking
>>>       
>> into
>>     
>>> account that the index is lying on a tmpfs (in memory). And 2.4 should
>>>       
>> be using
>>     
>>> NIOFSDirectory as well?! Will check that.
>>>
>>>       
>> 2.4 did not use NIOFSDirectory by default - you would have had to
>> manually specified it. Now its used by default if its detects a non
>> Windows OS.
>>
>> Any particular reason your profiling output does not show invocations?
>> Its not essential most likely, but I've found it to be helpful in
>> comparisons.
>>
>> We are about to release 2.9, and its been such a long haul, I'd hate to
>> see a release with an unknown performance trap.
>>
>> --
>> - Mark
>>
>> http://www.lucidimagination.com
>>
>>
>>     
>>> Thanks a lot for your support!
>>>
>>> Cheers,
>>> Thomas
>>>
>>> Mark Miller wrote:
>>>
>>>       
>>>> A few quick notes -
>>>>
>>>> Lucene 2.9 old api doesn't appear much worse than Lucene 2.4?
>>>>
>>>> You save a lot with the new Intern impl, because thats not a hotspot
>>>> anymore. But then,
>>>> RandomAccessFile seeks end up being a lot more of the pie. They look
>>>> fairly similar in speed overall?
>>>>
>>>> It looks like the major bottleneck with 2.9 new api is that its using
>>>> NIOFSDirectory (your on unix I guess, and it now
>>>> defaults to that on non Windows os's), and that appears to be a real
>>>> killer for you. Its taking half the time for its
>>>> reads.  ???
>>>>
>>>> No conclusions yet, but I'm looking it over. Some other guys will come
>>>> in with some ideas as well.
>>>>
>>>> Do confirm that those profiling results are on a single segment though.
>>>>
>>>> - Mark
>>>>
>>>>
>>>> Mark Miller wrote:
>>>>
>>>>         
>>>>> Thomas Becker wrote:
>>>>>
>>>>>
>>>>>           
>>>>>> Here's the results of profiling 10 different search requests:
>>>>>>
>>>>>> http://ankeschwarzer.de/tmp/lucene_24_oldapi.png
>>>>>> http://ankeschwarzer.de/tmp/lucene_29_oldapi.png
>>>>>> http://ankeschwarzer.de/tmp/lucene_29_newapi.png
>>>>>>
>>>>>> But you already gave me a good hint. The index being used is an old
>>>>>>             
>> one build
>>     
>>>>>> with lucene 2.4. I will now try a freshly build 2.9 index and see if
>>>>>>             
>> performance
>>     
>>>>>> improves. Maybe that already solves the issue...stupid me...
>>>>>>
>>>>>>
>>>>>>
>>>>>>             
>>>>> That shouldn't be an issue unless there is some odd bug.
>>>>>
>>>>>
>>>>>
>>>>>           
>>>>>> We're updating the index every 30 min. at the moment and it gets
>>>>>>             
>> optimized after
>>     
>>>>>> each update.
>>>>>>
>>>>>>
>>>>>>
>>>>>>             
>>>>> So this profiling is on an optimized index (eg a single segment) ?
>>>>> That would be odd indeed, and possibly point to some of the scoring
>>>>>           
>> changes.
>>     
>>>>>
>>>>>           
>>>>>> Mark Miller wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>             
>>>>>>> Thomas Becker wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>>>> Hey Mark,
>>>>>>>>
>>>>>>>> thanks for your reply. Will do. Results will follow in a couple of
>>>>>>>>                 
>> minutes.
>>     
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>> Thanks, awesome.
>>>>>>>
>>>>>>> Also, how many segments (approx) are in your index? If there are a
>>>>>>>               
>> lot,
>>     
>>>>>>> have you/can you try the same tests on an optimized index? Don't
>>>>>>>               
>> want to
>>     
>>>>>>> get ahead of the profiling results, but just to continue the info
>>>>>>>               
>> loop.
>>     
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>>
>>>>>>             
>>>>>           
>>>       
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>     
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>   





---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

RE: lucene 2.9.0RC4 slower than 2.4.1?

Posted by Uwe Schindler <uw...@thetaphi.de>.

Maybe Linux has some problems with NIO on tmpfs/other ramdisks. What Linux
do you use, 64bit or 32bit JVM and kernel, ram fs type?

If you have 64 bit and you stored your index in Linux tmpfs (not the old RAM
fs), the fastest would be MMapDirectory, as the tmpfs RAM can be directly
used when mapped into the JVM address space.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Mark Miller [mailto:markrmiller@gmail.com]
> Sent: Tuesday, September 15, 2009 5:30 PM
> To: java-user@lucene.apache.org
> Subject: Re: lucene 2.9.0RC4 slower than 2.4.1?
> 
> Thomas Becker wrote:
> > Hey Mark,
> >
> > yes. I'm running the app on unix. You see the difference between 2.9 and
> 2.4 here:
> >
> > http://ankeschwarzer.de/tmp/graph.jpg
> >
> Right - I know your measurements showed a difference (and will keep that
> in mind) - but the profiling results then seem
> oddly similar.
> 
> > 2.4 responds much quicker thus increasing throughput severly. I'm having
> a
> > single segment only:
> >
> > -rw-r--r-- 1 asuser asgroup         20 Sep  9 16:40 segments.gen
> > -rw-r--r-- 1 asuser asgroup        294 Sep  9 16:44 segments_1o
> > -rw-r--r-- 1 asuser asgroup 2810603184 Sep  9 16:44 _7c.cfs
> >
> > The FileChannel.read hotspot is indeed strange. Especially if taking
> into
> > account that the index is lying on a tmpfs (in memory). And 2.4 should
> be using
> > NIOFSDirectory as well?! Will check that.
> >
> 2.4 did not use NIOFSDirectory by default - you would have had to
> manually specified it. Now its used by default if its detects a non
> Windows OS.
> 
> Any particular reason your profiling output does not show invocations?
> Its not essential most likely, but I've found it to be helpful in
> comparisons.
> 
> We are about to release 2.9, and its been such a long haul, I'd hate to
> see a release with an unknown performance trap.
> 
> --
> - Mark
> 
> http://www.lucidimagination.com
> 
> 
> > Thanks a lot for your support!
> >
> > Cheers,
> > Thomas
> >
> > Mark Miller wrote:
> >
> >> A few quick notes -
> >>
> >> Lucene 2.9 old api doesn't appear much worse than Lucene 2.4?
> >>
> >> You save a lot with the new Intern impl, because thats not a hotspot
> >> anymore. But then,
> >> RandomAccessFile seeks end up being a lot more of the pie. They look
> >> fairly similar in speed overall?
> >>
> >> It looks like the major bottleneck with 2.9 new api is that its using
> >> NIOFSDirectory (your on unix I guess, and it now
> >> defaults to that on non Windows os's), and that appears to be a real
> >> killer for you. Its taking half the time for its
> >> reads.  ???
> >>
> >> No conclusions yet, but I'm looking it over. Some other guys will come
> >> in with some ideas as well.
> >>
> >> Do confirm that those profiling results are on a single segment though.
> >>
> >> - Mark
> >>
> >>
> >> Mark Miller wrote:
> >>
> >>> Thomas Becker wrote:
> >>>
> >>>
> >>>> Here's the results of profiling 10 different search requests:
> >>>>
> >>>> http://ankeschwarzer.de/tmp/lucene_24_oldapi.png
> >>>> http://ankeschwarzer.de/tmp/lucene_29_oldapi.png
> >>>> http://ankeschwarzer.de/tmp/lucene_29_newapi.png
> >>>>
> >>>> But you already gave me a good hint. The index being used is an old
> one build
> >>>> with lucene 2.4. I will now try a freshly build 2.9 index and see if
> performance
> >>>> improves. Maybe that already solves the issue...stupid me...
> >>>>
> >>>>
> >>>>
> >>> That shouldn't be an issue unless there is some odd bug.
> >>>
> >>>
> >>>
> >>>> We're updating the index every 30 min. at the moment and it gets
> optimized after
> >>>> each update.
> >>>>
> >>>>
> >>>>
> >>> So this profiling is on an optimized index (eg a single segment) ?
> >>> That would be odd indeed, and possibly point to some of the scoring
> changes.
> >>>
> >>>
> >>>
> >>>> Mark Miller wrote:
> >>>>
> >>>>
> >>>>
> >>>>> Thomas Becker wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>>> Hey Mark,
> >>>>>>
> >>>>>> thanks for your reply. Will do. Results will follow in a couple of
> minutes.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>> Thanks, awesome.
> >>>>>
> >>>>> Also, how many segments (approx) are in your index? If there are a
> lot,
> >>>>> have you/can you try the same tests on an optimized index? Don't
> want to
> >>>>> get ahead of the profiling results, but just to continue the info
> loop.
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>
> >>>
> >>
> >
> >
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: lucene 2.9.0RC4 slower than 2.4.1?

Posted by Mark Miller <ma...@gmail.com>.

Thomas Becker wrote:
> Hey Mark,
>
> yes. I'm running the app on unix. You see the difference between 2.9 and 2.4 here:
>
> http://ankeschwarzer.de/tmp/graph.jpg
>   
Right - I know your measurements showed a difference (and will keep that
in mind) - but the profiling results then seem
oddly similar.

> 2.4 responds much quicker thus increasing throughput severly. I'm having a
> single segment only:
>
> -rw-r--r-- 1 asuser asgroup         20 Sep  9 16:40 segments.gen
> -rw-r--r-- 1 asuser asgroup        294 Sep  9 16:44 segments_1o
> -rw-r--r-- 1 asuser asgroup 2810603184 Sep  9 16:44 _7c.cfs
>
> The FileChannel.read hotspot is indeed strange. Especially if taking into
> account that the index is lying on a tmpfs (in memory). And 2.4 should be using
> NIOFSDirectory as well?! Will check that.
>   
2.4 did not use NIOFSDirectory by default - you would have had to
manually specified it. Now its used by default if its detects a non
Windows OS.

Any particular reason your profiling output does not show invocations?
Its not essential most likely, but I've found it to be helpful in
comparisons.

We are about to release 2.9, and its been such a long haul, I'd hate to
see a release with an unknown performance trap.

-- 
- Mark

http://www.lucidimagination.com


> Thanks a lot for your support!
>
> Cheers,
> Thomas
>
> Mark Miller wrote:
>   
>> A few quick notes -
>>
>> Lucene 2.9 old api doesn't appear much worse than Lucene 2.4?
>>
>> You save a lot with the new Intern impl, because thats not a hotspot
>> anymore. But then,
>> RandomAccessFile seeks end up being a lot more of the pie. They look
>> fairly similar in speed overall?
>>
>> It looks like the major bottleneck with 2.9 new api is that its using
>> NIOFSDirectory (your on unix I guess, and it now
>> defaults to that on non Windows os's), and that appears to be a real
>> killer for you. Its taking half the time for its
>> reads.  ???
>>
>> No conclusions yet, but I'm looking it over. Some other guys will come
>> in with some ideas as well.
>>
>> Do confirm that those profiling results are on a single segment though.
>>
>> - Mark
>>
>>
>> Mark Miller wrote:
>>     
>>> Thomas Becker wrote:
>>>   
>>>       
>>>> Here's the results of profiling 10 different search requests:
>>>>
>>>> http://ankeschwarzer.de/tmp/lucene_24_oldapi.png
>>>> http://ankeschwarzer.de/tmp/lucene_29_oldapi.png
>>>> http://ankeschwarzer.de/tmp/lucene_29_newapi.png
>>>>
>>>> But you already gave me a good hint. The index being used is an old one build
>>>> with lucene 2.4. I will now try a freshly build 2.9 index and see if performance
>>>> improves. Maybe that already solves the issue...stupid me...
>>>>   
>>>>     
>>>>         
>>> That shouldn't be an issue unless there is some odd bug.
>>>
>>>   
>>>       
>>>> We're updating the index every 30 min. at the moment and it gets optimized after
>>>> each update.
>>>>   
>>>>     
>>>>         
>>> So this profiling is on an optimized index (eg a single segment) ?
>>> That would be odd indeed, and possibly point to some of the scoring changes.
>>>
>>>   
>>>       
>>>> Mark Miller wrote:
>>>>   
>>>>     
>>>>         
>>>>> Thomas Becker wrote:
>>>>>     
>>>>>       
>>>>>           
>>>>>> Hey Mark,
>>>>>>
>>>>>> thanks for your reply. Will do. Results will follow in a couple of minutes.
>>>>>>
>>>>>>
>>>>>>   
>>>>>>       
>>>>>>         
>>>>>>             
>>>>> Thanks, awesome.
>>>>>
>>>>> Also, how many segments (approx) are in your index? If there are a lot,
>>>>> have you/can you try the same tests on an optimized index? Don't want to
>>>>> get ahead of the profiling results, but just to continue the info loop.
>>>>>
>>>>>     
>>>>>       
>>>>>           
>>>>   
>>>>     
>>>>         
>>>   
>>>       
>>     
>
>   



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: lucene 2.9.0RC4 slower than 2.4.1?

Posted by Thomas Becker <th...@net-m.de>.

Hey Mark,

yes. I'm running the app on unix. You see the difference between 2.9 and 2.4 here:

http://ankeschwarzer.de/tmp/graph.jpg

2.4 responds much quicker thus increasing throughput severly. I'm having a
single segment only:

-rw-r--r-- 1 asuser asgroup         20 Sep  9 16:40 segments.gen
-rw-r--r-- 1 asuser asgroup        294 Sep  9 16:44 segments_1o
-rw-r--r-- 1 asuser asgroup 2810603184 Sep  9 16:44 _7c.cfs

The FileChannel.read hotspot is indeed strange. Especially if taking into
account that the index is lying on a tmpfs (in memory). And 2.4 should be using
NIOFSDirectory as well?! Will check that.

Thanks a lot for your support!

Cheers,
Thomas

Mark Miller wrote:
> A few quick notes -
> 
> Lucene 2.9 old api doesn't appear much worse than Lucene 2.4?
> 
> You save a lot with the new Intern impl, because thats not a hotspot
> anymore. But then,
> RandomAccessFile seeks end up being a lot more of the pie. They look
> fairly similar in speed overall?
> 
> It looks like the major bottleneck with 2.9 new api is that its using
> NIOFSDirectory (your on unix I guess, and it now
> defaults to that on non Windows os's), and that appears to be a real
> killer for you. Its taking half the time for its
> reads.  ???
> 
> No conclusions yet, but I'm looking it over. Some other guys will come
> in with some ideas as well.
> 
> Do confirm that those profiling results are on a single segment though.
> 
> - Mark
> 
> 
> Mark Miller wrote:
>> Thomas Becker wrote:
>>   
>>> Here's the results of profiling 10 different search requests:
>>>
>>> http://ankeschwarzer.de/tmp/lucene_24_oldapi.png
>>> http://ankeschwarzer.de/tmp/lucene_29_oldapi.png
>>> http://ankeschwarzer.de/tmp/lucene_29_newapi.png
>>>
>>> But you already gave me a good hint. The index being used is an old one build
>>> with lucene 2.4. I will now try a freshly build 2.9 index and see if performance
>>> improves. Maybe that already solves the issue...stupid me...
>>>   
>>>     
>> That shouldn't be an issue unless there is some odd bug.
>>
>>   
>>> We're updating the index every 30 min. at the moment and it gets optimized after
>>> each update.
>>>   
>>>     
>> So this profiling is on an optimized index (eg a single segment) ?
>> That would be odd indeed, and possibly point to some of the scoring changes.
>>
>>   
>>> Mark Miller wrote:
>>>   
>>>     
>>>> Thomas Becker wrote:
>>>>     
>>>>       
>>>>> Hey Mark,
>>>>>
>>>>> thanks for your reply. Will do. Results will follow in a couple of minutes.
>>>>>
>>>>>
>>>>>   
>>>>>       
>>>>>         
>>>> Thanks, awesome.
>>>>
>>>> Also, how many segments (approx) are in your index? If there are a lot,
>>>> have you/can you try the same tests on an optimized index? Don't want to
>>>> get ahead of the profiling results, but just to continue the info loop.
>>>>
>>>>     
>>>>       
>>>   
>>>     
>>
>>   
> 
> 

-- 
Thomas Becker
Senior JEE Developer

net mobile AG
Zollhof 17
40221 Düsseldorf
GERMANY

Phone:    +49 211 97020-195
Fax:      +49 211 97020-949
Mobile:   +49 173 5146567 (private)
E-Mail:   mailto:thomas.becker@net-m.de
Internet: http://www.net-m.de

Registergericht:  Amtsgericht Düsseldorf, HRB 48022
Vorstand:         Theodor Niehues (Vorsitzender), Frank Hartmann,
                 Kai Markus Kulas, Dieter Plassmann
Vorsitzender des
Aufsichtsrates:   Dr. Michael Briem

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: lucene 2.9.0RC4 slower than 2.4.1?

Posted by Mark Miller <ma...@gmail.com>.

A few quick notes -

Lucene 2.9 old api doesn't appear much worse than Lucene 2.4?

You save a lot with the new Intern impl, because thats not a hotspot
anymore. But then,
RandomAccessFile seeks end up being a lot more of the pie. They look
fairly similar in speed overall?

It looks like the major bottleneck with 2.9 new api is that its using
NIOFSDirectory (your on unix I guess, and it now
defaults to that on non Windows os's), and that appears to be a real
killer for you. Its taking half the time for its
reads.  ???

No conclusions yet, but I'm looking it over. Some other guys will come
in with some ideas as well.

Do confirm that those profiling results are on a single segment though.

- Mark


Mark Miller wrote:
> Thomas Becker wrote:
>   
>> Here's the results of profiling 10 different search requests:
>>
>> http://ankeschwarzer.de/tmp/lucene_24_oldapi.png
>> http://ankeschwarzer.de/tmp/lucene_29_oldapi.png
>> http://ankeschwarzer.de/tmp/lucene_29_newapi.png
>>
>> But you already gave me a good hint. The index being used is an old one build
>> with lucene 2.4. I will now try a freshly build 2.9 index and see if performance
>> improves. Maybe that already solves the issue...stupid me...
>>   
>>     
> That shouldn't be an issue unless there is some odd bug.
>
>   
>> We're updating the index every 30 min. at the moment and it gets optimized after
>> each update.
>>   
>>     
> So this profiling is on an optimized index (eg a single segment) ?
> That would be odd indeed, and possibly point to some of the scoring changes.
>
>   
>> Mark Miller wrote:
>>   
>>     
>>> Thomas Becker wrote:
>>>     
>>>       
>>>> Hey Mark,
>>>>
>>>> thanks for your reply. Will do. Results will follow in a couple of minutes.
>>>>
>>>>
>>>>   
>>>>       
>>>>         
>>> Thanks, awesome.
>>>
>>> Also, how many segments (approx) are in your index? If there are a lot,
>>> have you/can you try the same tests on an optimized index? Don't want to
>>> get ahead of the profiling results, but just to continue the info loop.
>>>
>>>     
>>>       
>>   
>>     
>
>
>   


-- 
- Mark

http://www.lucidimagination.com




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: lucene 2.9.0RC4 slower than 2.4.1?

Posted by Mark Miller <ma...@gmail.com>.

Thomas Becker wrote:
> Here's the results of profiling 10 different search requests:
>
> http://ankeschwarzer.de/tmp/lucene_24_oldapi.png
> http://ankeschwarzer.de/tmp/lucene_29_oldapi.png
> http://ankeschwarzer.de/tmp/lucene_29_newapi.png
>
> But you already gave me a good hint. The index being used is an old one build
> with lucene 2.4. I will now try a freshly build 2.9 index and see if performance
> improves. Maybe that already solves the issue...stupid me...
>   
That shouldn't be an issue unless there is some odd bug.

> We're updating the index every 30 min. at the moment and it gets optimized after
> each update.
>   
So this profiling is on an optimized index (eg a single segment) ?
That would be odd indeed, and possibly point to some of the scoring changes.

>
> Mark Miller wrote:
>   
>> Thomas Becker wrote:
>>     
>>> Hey Mark,
>>>
>>> thanks for your reply. Will do. Results will follow in a couple of minutes.
>>>
>>>
>>>   
>>>       
>> Thanks, awesome.
>>
>> Also, how many segments (approx) are in your index? If there are a lot,
>> have you/can you try the same tests on an optimized index? Don't want to
>> get ahead of the profiling results, but just to continue the info loop.
>>
>>     
>
>   


-- 
- Mark

http://www.lucidimagination.com




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: lucene 2.9.0RC4 slower than 2.4.1?

Posted by Thomas Becker <th...@net-m.de>.

Here's the results of profiling 10 different search requests:

http://ankeschwarzer.de/tmp/lucene_24_oldapi.png
http://ankeschwarzer.de/tmp/lucene_29_oldapi.png
http://ankeschwarzer.de/tmp/lucene_29_newapi.png

But you already gave me a good hint. The index being used is an old one build
with lucene 2.4. I will now try a freshly build 2.9 index and see if performance
improves. Maybe that already solves the issue...stupid me...
We're updating the index every 30 min. at the moment and it gets optimized after
each update.


Mark Miller wrote:
> Thomas Becker wrote:
>> Hey Mark,
>>
>> thanks for your reply. Will do. Results will follow in a couple of minutes.
>>
>>
>>   
> Thanks, awesome.
> 
> Also, how many segments (approx) are in your index? If there are a lot,
> have you/can you try the same tests on an optimized index? Don't want to
> get ahead of the profiling results, but just to continue the info loop.
> 

-- 
Thomas Becker
Senior JEE Developer

net mobile AG
Zollhof 17
40221 Düsseldorf
GERMANY

Phone:    +49 211 97020-195
Fax:      +49 211 97020-949
Mobile:   +49 173 5146567 (private)
E-Mail:   mailto:thomas.becker@net-m.de
Internet: http://www.net-m.de

Registergericht:  Amtsgericht Düsseldorf, HRB 48022
Vorstand:         Theodor Niehues (Vorsitzender), Frank Hartmann,
                 Kai Markus Kulas, Dieter Plassmann
Vorsitzender des
Aufsichtsrates:   Dr. Michael Briem

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: lucene 2.9.0RC4 slower than 2.4.1?

Posted by Mark Miller <ma...@gmail.com>.

Thomas Becker wrote:
> Hey Mark,
>
> thanks for your reply. Will do. Results will follow in a couple of minutes.
>
>
>   
Thanks, awesome.

Also, how many segments (approx) are in your index? If there are a lot,
have you/can you try the same tests on an optimized index? Don't want to
get ahead of the profiling results, but just to continue the info loop.

-- 
- Mark

http://www.lucidimagination.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: lucene 2.9.0RC4 slower than 2.4.1?

Posted by Thomas Becker <th...@net-m.de>.

Hey Mark,

thanks for your reply. Will do. Results will follow in a couple of minutes.

Yes the custom sorts are doing something tricky. :) I'll try to explain them in
few words and paste the code.

But even w/o them 2.9 is slower. Testcase 2 and 3 have only different lucene jars.

CustomFieldComparatorPrefix.java:

a field containing for example releaseDates for sorting. But there's different
releaseDates for a single document and different countries for example. They're
prefixed and comma separated in a single field.

Here's the code:

public final class CustomFieldComparatorPrefix extends FieldComparatorSource {

	/**
	 *
	 */
	private static final long serialVersionUID = 200907240001L;

	private final String prefix;

	public CustomFieldComparatorPrefix(String prefix) {
		this.prefix = prefix;
	}

	/*
	 * (non-Javadoc)
	 *
	 * @see
	 * org.apache.lucene.search.FieldComparatorSource#newComparator(java.lang
	 * .String, int, int, boolean)
	 */
	public FieldComparator newComparator(final String fieldname, final int numHits,
int sortPos, boolean reversed) throws IOException {
		return new FieldComparator() {

			private int[] currentReaderValues;

			private int[] values = new int[numHits];

			private int bottom;

			/*
			 * (non-Javadoc)
			 *
			 * @see org.apache.lucene.search.FieldComparator#compare(int, int)
			 */
			public int compare(int slot1, int slot2) {
				// TODO: there are sneaky non-branch ways to compute
				// -1/+1/0 sign
				// Cannot return values[slot1] - values[slot2] because that
				// may overflow
				final int v1 = values[slot1];
				final int v2 = values[slot2];
				if (v1 > v2) {
					return 1;
				} else if (v1 < v2) {
					return -1;
				} else {
					return 0;
				}
			}

			/*
			 * (non-Javadoc)
			 *
			 * @see org.apache.lucene.search.FieldComparator#compareBottom(int)
			 */
			public int compareBottom(int doc) throws IOException {
				// TODO: there are sneaky non-branch ways to compute
				// -1/+1/0 sign
				// Cannot return bottom - values[slot2] because that
				// may overflow
				final int v2 = currentReaderValues[doc];
				if (bottom > v2) {
					return 1;
				} else if (bottom < v2) {
					return -1;
				} else {
					return 0;
				}
			}

			/*
			 * (non-Javadoc)
			 *
			 * @see org.apache.lucene.search.FieldComparator#copy(int, int)
			 */
			public void copy(int slot, int doc) throws IOException {
				values[slot] = currentReaderValues[doc];
			}

			/*
			 * (non-Javadoc)
			 *
			 * @see org.apache.lucene.search.FieldComparator#setBottom(int)
			 */
			public void setBottom(int slot) {
				this.bottom = values[slot];
			}

			/*
			 * (non-Javadoc)
			 *
			 * @see org.apache.lucene.search.FieldComparator#sortType()
			 */
			public int sortType() {
				return SortField.CUSTOM;
			}

			/*
			 * (non-Javadoc)
			 *
			 * @see org.apache.lucene.search.FieldComparator#value(int)
			 */
			public Comparable<Integer> value(int slot) {
				return new Integer(values[slot]);
			}

			@Override
			public void setNextReader(IndexReader reader, int docBase) throws IOException {
				currentReaderValues = FieldCache.DEFAULT.getInts(reader, fieldname, new
PrefixedIntParser(prefix));
			}

		};
	}
}

CustomFieldComparatorPosition.java:

works similar to the one above. But in the field are different positions for
different contentgroups. Example "10_1,11_5,14_1". Whereas the prefix are
defining the contentgroup id and the digit after the underscore is the actual
position to define the documents place in the sort order.
This one could in theory use the same comparator as above, but the app will run
oom due to too many different contentgroups. Since only few (~280.000) documents
have positions set I wrote my own lucene cache implementation storing only not
default values.

Comparator Source:

public final class CustomFieldComparatorPosition extends FieldComparatorSource {
	private final Logger log = LoggerFactory.getLogger(this.getClass());

	/**
	 *
	 */
	private static final long serialVersionUID = 200907240001L;

	private final String prefix;

	private static PositionCache cache;

	private StopWatch sw = new StopWatch();

	public CustomFieldComparatorPosition(String prefix) {
		if (cache == null) {
			log.debug("CustomFieldComparatorPosition:initializing PositionCache");
			cache = new PositionCache();
		}
		this.prefix = prefix;
	}

	/*
	 * (non-Javadoc)
	 *
	 * @see
	 * org.apache.lucene.search.FieldComparatorSource#newComparator(java.lang
	 * .String, int, int, boolean)
	 */
	public FieldComparator newComparator(final String fieldname, final int numHits,
int sortPos, boolean reversed) throws IOException {
		return new FieldComparator() {

			private Map<Integer, Integer> currentReaderValues;

			private int[] values = new int[numHits];

			private int bottom;

			/*
			 * (non-Javadoc)
			 *
			 * @see org.apache.lucene.search.FieldComparator#compare(int, int)
			 */
			public int compare(int slot1, int slot2) {
				// TODO: there are sneaky non-branch ways to compute
				// -1/+1/0 sign
				// Cannot return values[slot1] - values[slot2] because that
				// may overflow
				final int v1 = values[slot1];
				final int v2 = values[slot2];
				if (v1 > v2) {
					return 1;
				} else if (v1 < v2) {
					return -1;
				} else {
					return 0;
				}
			}

			/*
			 * (non-Javadoc)
			 *
			 * @see org.apache.lucene.search.FieldComparator#compareBottom(int)
			 */
			public int compareBottom(int doc) throws IOException {
				int i;
				try {
					i = currentReaderValues.get(doc);
				} catch (NullPointerException e) {
					i = 999999;
				}
				// TODO: there are sneaky non-branch ways to compute
				// -1/+1/0 sign
				// Cannot return bottom - values[slot2] because that
				// may overflow
				final int v2 = i;
				if (bottom > v2) {
					return 1;
				} else if (bottom < v2) {
					return -1;
				} else {
					return 0;
				}
			}

			/*
			 * (non-Javadoc)
			 *
			 * @see org.apache.lucene.search.FieldComparator#copy(int, int)
			 */
			public void copy(int slot, int doc) throws IOException {
				// This will be executed n times where n is the amount of
				// documents in the index.
				int value;
				try{
					value = currentReaderValues.get(doc);
				}catch(NullPointerException e){
					value = 999999;
				}
				values[slot] = value;
			}

			/*
			 * (non-Javadoc)
			 *
			 * @see org.apache.lucene.search.FieldComparator#setBottom(int)
			 */
			public void setBottom(int slot) {
				this.bottom = values[slot];
			}

			/*
			 * (non-Javadoc)
			 *
			 * @see org.apache.lucene.search.FieldComparator#sortType()
			 */
			public int sortType() {
				return SortField.CUSTOM;
			}

			/*
			 * (non-Javadoc)
			 *
			 * @see org.apache.lucene.search.FieldComparator#value(int)
			 */
			public Comparable<Integer> value(int slot) {
				return new Integer(values[slot]);
			}

			@Override
			public void setNextReader(IndexReader reader, int docBase) throws IOException {
				sw.start();
				try {
					currentReaderValues = cache.getInts(reader, fieldname, new
PrefixedIntParser(prefix));
				} catch (InterruptedException e) {
					throw new IllegalStateException(e.getCause());
				}
				sw.stop();
				if (sw.getTime() > 3000) {
					log.info("setNextReader: Slow: Time to get currentReaderValues from cache:
{}ms. Items in cache: {}", sw.getTime(), cache.getItemCount());
				}
			}

		};
	}
}

PositionCache.java:
/**
 * This cache implementation caches the position values of items stored as
 * documents in lucene. This cache has a WeakHashMap with an IndexReader
 * reference as a key thus if the indexReader reference gets deleted, the cache
 * is marked to be gced. The innerCache is a Map containing field + parser
 * (contracttocontentgroup prefix) as the key and as a value yet another map.
 * The latter map finally contains the docIds as key and positionvalue for this
 * prefix as value.
 *
 * @author Thomas Becker (thomas.becker@net-m.de)
 *
 */
public class PositionCache {
	final Map<Object, Map<String, Future<Map<Integer, Integer>>>> readerCache = new
WeakHashMap<Object, Map<String, Future<Map<Integer, Integer>>>>();

	AtomicInteger itemCount = new AtomicInteger(0); // only for debugging...only
increasing

	public Map<Integer, Integer> getInts(final IndexReader reader, final String
field, final IntParser parser) throws InterruptedException {

		String key = parser.toString();
		// Future<Map<Integer,Integer> contains docId as key and position as
		// value
		HashMap<String, Future<Map<Integer, Integer>>> innerCache;
		final Object readerKey = reader.getFieldCacheKey();
		synchronized (readerCache) {
			innerCache = (HashMap<String, Future<Map<Integer, Integer>>>)
readerCache.get(readerKey);
			if (innerCache == null) {
				innerCache = new HashMap<String, Future<Map<Integer, Integer>>>();
				readerCache.put(readerKey, innerCache);
			}
		}
		Future<Map<Integer, Integer>> f = innerCache.get(key);
		if (f == null) {
			Callable<Map<Integer, Integer>> eval = new Callable<Map<Integer, Integer>>() {
				public Map<Integer, Integer> call() throws InterruptedException, IOException {
					HashMap<Integer, Integer> docPositions = new HashMap<Integer, Integer>();
					TermDocs termDocs = reader.termDocs();
					TermEnum termEnum = reader.terms(new Term(field));
					try {
						do {
							Term term = termEnum.term();
							if (term == null || term.field() != field)
								break;
							int termval = parser.parseInt(term.text());
							termDocs.seek(termEnum);
							while (termDocs.next()) {
								// do not store defaults to save memory
								if (termval < 999999) {
									itemCount.incrementAndGet();
									docPositions.put(termDocs.doc(), termval);
								}
							}
						} while (termEnum.next());
					} finally {
						termDocs.close();
						termEnum.close();
					}
					return docPositions;
				}
			};
			FutureTask<Map<Integer, Integer>> ft = new FutureTask<Map<Integer,
Integer>>(eval);
			f = innerCache.put(key, ft);
			if (f == null) {
				f = ft;
				ft.run();
			}
		}
		try {
			return f.get();
		} catch (CancellationException e) {
			innerCache.remove(key);
		} catch (ExecutionException e) {
			throw new IllegalStateException(e.getCause());
		}
		return null;
	}


Cheers,
Thomas


Mark Miller wrote:
> Hey Thomas - any chance you can do some quick profiling and grab the
> hotspots from the 3 configurations?
> 
> Are your custom sorts doing anything tricky?
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: lucene 2.9.0RC4 slower than 2.4.1?

Posted by Mark Miller <ma...@gmail.com>.

Hey Thomas - any chance you can do some quick profiling and grab the
hotspots from the 3 configurations?

Are your custom sorts doing anything tricky?

-- 
- Mark

http://www.lucidimagination.com


Thomas Becker wrote:
> Urm and uploaded here:
> http://ankeschwarzer.de/tmp/graph.jpg
>
> Sorry.
>
> Thomas Becker wrote:
>   
>> Missed the attachment, sorry.
>>
>> Thomas Becker wrote:
>>     
>>> Hi all,
>>>
>>> I'm experiencing a performance degradation after migrating to 2.9 and running
>>> some tests. I'm getting out of ideas and any help to identify the reasons why
>>> 2.9 is slower than 2.4 are highly appreciated.
>>>
>>> We've had some issues with custom sorting in lucene 2.4.1. We worked around them
>>> by sorting the resultsets manually and caching the results after sorting (memory
>>> consuming but fast).
>>>
>>> I now migrated to lucene 2.9.0RC4. Build some new FieldComparatorSource
>>> implementation for sorting and refactored all deprecated api calls to the new
>>> lucene 2.9 api.
>>>
>>> Everything works fine from a functional perspective. But performance severly is
>>> (negatively) affected by lucene 2.9.
>>>
>>> I profiled the application for a couple of hours, build a jmeter load test and
>>> compared the following scenarios:
>>>
>>> 1. lucene 2.9 - new api
>>> 2. lucene 2.9 - old api and custom sorting after lucene
>>> 3. lucene 2.4.1 - old api and custom sorting after lucene (what we had up2now)
>>>
>>> Please find attached an rrd graph showing the results. The lighter the color the
>>> faster the request has been served. y=# requests, x=time.
>>>
>>> Most interestingly simply switching the lucene jars between 2.4 and 2.9 degraded
>>> response times and therefore throughput (see results of testcase 2 and 3).
>>> Adapting to the new api decreased performance again. The difference between
>>> testcase 1 and 2 is most probably due to precached custom sorted results.
>>>
>>> The application under test is a dedicated lucene search engine doing nothing
>>> else, but serving search requests. We're running a cluster of them in prd and
>>> it's incredibly fast. With the old implementation and prd traffic we've above
>>> 98% of the requests served in 200ms.
>>> The index under test contains about 3 million documents (with lots of fields),
>>> consumes about 2,5gig disk space and is stored on a tmpfs RAMDISK provided by
>>> the linux kernel.
>>>
>>> Most interesting methods used for searching are:
>>>
>>> getHitsCount (is there a way to speed this up?):
>>>
>>> 	public int getHitsCount(String query, Filter filter) throws
>>> LuceneServiceException {
>>> 		log.debug("getHitsCount('{}, {}')", query, filter);
>>> 		if (StringUtils.isBlank(query)) {
>>> 			log.warn("getHitsCount: empty lucene query");
>>> 			return 0;
>>> 		}
>>> 		long startTimeMillis = System.currentTimeMillis();
>>> 		int count = 0;
>>>
>>> 		if (indexSearcher == null) {
>>> 			return 0;
>>> 		}
>>>
>>> 		BooleanQuery.setMaxClauseCount(MAXCLAUSECOUNT);
>>> 		Query q = null;
>>> 		try {
>>> 			q = createQuery(query);
>>> 			TopScoreDocCollector tsdc = TopScoreDocCollector.create(1, true);
>>> 			indexSearcher.search(q, filter, tsdc);
>>> 			count = tsdc.getTotalHits();
>>> 			log.info("getHitsCount: count = {}",count);
>>> 		} catch (ParseException ex) {
>>> 			throw new LuceneServiceException("invalid lucene query:" + query, ex);
>>> 		} catch (IOException e) {
>>> 			throw new LuceneServiceException(" indexSearcher could be corrupted", e);
>>> 		} finally {
>>> 			long durationMillis = System.currentTimeMillis() - startTimeMillis;
>>> 			if (durationMillis > slowQueryLimit) {
>>> 				log.warn("getHitsCount: Slow query: {} ms, query={}", durationMillis, query);
>>> 			}
>>> 			log.debug("getHitsCount: query took {} ms", durationMillis);
>>> 		}
>>> 		return count;
>>> 	}
>>>
>>> search:
>>> 	public List<Document> search(String query, Filter filter, Sort sort, int from,
>>> int size) throws LuceneServiceException {
>>> 		log.debug("{} search('{}', {}, {}, {}, {})", new Object[] { indexAlias, query,
>>> filter, sort, from, size });
>>> 		long startTimeMillis = System.currentTimeMillis();
>>>
>>> 		List<Document> docs = new ArrayList<Document>();
>>> 		if (indexSearcher == null) {
>>> 			return docs;
>>> 		}
>>> 		Query q = null;
>>> 		try {
>>> 			if (query == null) {
>>> 				log.warn("search: lucene query is null...");
>>> 				return docs;
>>> 			}
>>> 			q = createQuery(query);
>>> 			BooleanQuery.setMaxClauseCount(MAXCLAUSECOUNT);
>>> 			if (size < 0 || size > maxNumHits) {
>>> 				// set hard limit for numHits
>>> 				size = maxNumHits;
>>> 				if (log.isDebugEnabled())
>>> 					log.debug("search: Size set to hardlimit: {} for query: {} with filter:
>>> {}", new Object[] { size, query, filter });
>>> 			}
>>> 			TopFieldCollector collector = TopFieldCollector.create(sort, size + from,
>>> true, false, false, true);
>>> 			indexSearcher.search(q, filter, collector);
>>> 			if(size > collector.getTotalHits())
>>> 				size = collector.getTotalHits();
>>> 			if (size > 100000)
>>> 				log.info("search: size: {} bigger than 100.000 for query: {} with filter:
>>> {}", new Object[] { size, query, filter });
>>> 			TopDocs td = collector.topDocs(from, size);
>>> 			ScoreDoc[] scoreDocs = td.scoreDocs;
>>> 			for (ScoreDoc scoreDoc : scoreDocs) {
>>> 				docs.add(indexSearcher.doc(scoreDoc.doc));
>>> 			}
>>> 		} catch (ParseException e) {
>>> 			log.warn("search: ParseException: {}", e.getMessage());
>>> 			if (log.isDebugEnabled())
>>> 				log.warn("search: ParseException: ", e);
>>> 			return Collections.emptyList();
>>> 		} catch (IOException e) {
>>> 			log.warn("search: IOException: ", e);
>>> 			return Collections.emptyList();
>>> 		} finally {
>>> 			long durationMillis = System.currentTimeMillis() - startTimeMillis;
>>> 			if (durationMillis > slowQueryLimit) {
>>> 				log.warn("search: Slow query: {} ms, query={}, indexUsed={}",
>>> 						new Object[] { durationMillis, query,
>>> indexSearcher.getIndexReader().directory() });
>>> 			}
>>> 			log.debug("search: query took {} ms", durationMillis);
>>> 		}
>>> 		return docs;
>>> 	}
>>>
>>> I'm wondering why others are experiencing better performance with 2.9 and why
>>> our implementations performance is going bad. Maybe our way of using the 2.9 api
>>> is not the best and sorting is definetly expensive.
>>>
>>> Any ideas are appreciated. I'm torning out my hair since hours and days to find
>>> the root cause. Also hints how I could find the bottlenecks myself are appreciated.
>>>
>>> Cheers,
>>> Thomas
>>>
>>>       
>> ------------------------------------------------------------------------
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>     
>
>   




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: lucene 2.9.0RC4 slower than 2.4.1?

Posted by Thomas Becker <th...@net-m.de>.

Urm and uploaded here:
http://ankeschwarzer.de/tmp/graph.jpg

Sorry.

Thomas Becker wrote:
> Missed the attachment, sorry.
> 
> Thomas Becker wrote:
>> Hi all,
>>
>> I'm experiencing a performance degradation after migrating to 2.9 and running
>> some tests. I'm getting out of ideas and any help to identify the reasons why
>> 2.9 is slower than 2.4 are highly appreciated.
>>
>> We've had some issues with custom sorting in lucene 2.4.1. We worked around them
>> by sorting the resultsets manually and caching the results after sorting (memory
>> consuming but fast).
>>
>> I now migrated to lucene 2.9.0RC4. Build some new FieldComparatorSource
>> implementation for sorting and refactored all deprecated api calls to the new
>> lucene 2.9 api.
>>
>> Everything works fine from a functional perspective. But performance severly is
>> (negatively) affected by lucene 2.9.
>>
>> I profiled the application for a couple of hours, build a jmeter load test and
>> compared the following scenarios:
>>
>> 1. lucene 2.9 - new api
>> 2. lucene 2.9 - old api and custom sorting after lucene
>> 3. lucene 2.4.1 - old api and custom sorting after lucene (what we had up2now)
>>
>> Please find attached an rrd graph showing the results. The lighter the color the
>> faster the request has been served. y=# requests, x=time.
>>
>> Most interestingly simply switching the lucene jars between 2.4 and 2.9 degraded
>> response times and therefore throughput (see results of testcase 2 and 3).
>> Adapting to the new api decreased performance again. The difference between
>> testcase 1 and 2 is most probably due to precached custom sorted results.
>>
>> The application under test is a dedicated lucene search engine doing nothing
>> else, but serving search requests. We're running a cluster of them in prd and
>> it's incredibly fast. With the old implementation and prd traffic we've above
>> 98% of the requests served in 200ms.
>> The index under test contains about 3 million documents (with lots of fields),
>> consumes about 2,5gig disk space and is stored on a tmpfs RAMDISK provided by
>> the linux kernel.
>>
>> Most interesting methods used for searching are:
>>
>> getHitsCount (is there a way to speed this up?):
>>
>> 	public int getHitsCount(String query, Filter filter) throws
>> LuceneServiceException {
>> 		log.debug("getHitsCount('{}, {}')", query, filter);
>> 		if (StringUtils.isBlank(query)) {
>> 			log.warn("getHitsCount: empty lucene query");
>> 			return 0;
>> 		}
>> 		long startTimeMillis = System.currentTimeMillis();
>> 		int count = 0;
>>
>> 		if (indexSearcher == null) {
>> 			return 0;
>> 		}
>>
>> 		BooleanQuery.setMaxClauseCount(MAXCLAUSECOUNT);
>> 		Query q = null;
>> 		try {
>> 			q = createQuery(query);
>> 			TopScoreDocCollector tsdc = TopScoreDocCollector.create(1, true);
>> 			indexSearcher.search(q, filter, tsdc);
>> 			count = tsdc.getTotalHits();
>> 			log.info("getHitsCount: count = {}",count);
>> 		} catch (ParseException ex) {
>> 			throw new LuceneServiceException("invalid lucene query:" + query, ex);
>> 		} catch (IOException e) {
>> 			throw new LuceneServiceException(" indexSearcher could be corrupted", e);
>> 		} finally {
>> 			long durationMillis = System.currentTimeMillis() - startTimeMillis;
>> 			if (durationMillis > slowQueryLimit) {
>> 				log.warn("getHitsCount: Slow query: {} ms, query={}", durationMillis, query);
>> 			}
>> 			log.debug("getHitsCount: query took {} ms", durationMillis);
>> 		}
>> 		return count;
>> 	}
>>
>> search:
>> 	public List<Document> search(String query, Filter filter, Sort sort, int from,
>> int size) throws LuceneServiceException {
>> 		log.debug("{} search('{}', {}, {}, {}, {})", new Object[] { indexAlias, query,
>> filter, sort, from, size });
>> 		long startTimeMillis = System.currentTimeMillis();
>>
>> 		List<Document> docs = new ArrayList<Document>();
>> 		if (indexSearcher == null) {
>> 			return docs;
>> 		}
>> 		Query q = null;
>> 		try {
>> 			if (query == null) {
>> 				log.warn("search: lucene query is null...");
>> 				return docs;
>> 			}
>> 			q = createQuery(query);
>> 			BooleanQuery.setMaxClauseCount(MAXCLAUSECOUNT);
>> 			if (size < 0 || size > maxNumHits) {
>> 				// set hard limit for numHits
>> 				size = maxNumHits;
>> 				if (log.isDebugEnabled())
>> 					log.debug("search: Size set to hardlimit: {} for query: {} with filter:
>> {}", new Object[] { size, query, filter });
>> 			}
>> 			TopFieldCollector collector = TopFieldCollector.create(sort, size + from,
>> true, false, false, true);
>> 			indexSearcher.search(q, filter, collector);
>> 			if(size > collector.getTotalHits())
>> 				size = collector.getTotalHits();
>> 			if (size > 100000)
>> 				log.info("search: size: {} bigger than 100.000 for query: {} with filter:
>> {}", new Object[] { size, query, filter });
>> 			TopDocs td = collector.topDocs(from, size);
>> 			ScoreDoc[] scoreDocs = td.scoreDocs;
>> 			for (ScoreDoc scoreDoc : scoreDocs) {
>> 				docs.add(indexSearcher.doc(scoreDoc.doc));
>> 			}
>> 		} catch (ParseException e) {
>> 			log.warn("search: ParseException: {}", e.getMessage());
>> 			if (log.isDebugEnabled())
>> 				log.warn("search: ParseException: ", e);
>> 			return Collections.emptyList();
>> 		} catch (IOException e) {
>> 			log.warn("search: IOException: ", e);
>> 			return Collections.emptyList();
>> 		} finally {
>> 			long durationMillis = System.currentTimeMillis() - startTimeMillis;
>> 			if (durationMillis > slowQueryLimit) {
>> 				log.warn("search: Slow query: {} ms, query={}, indexUsed={}",
>> 						new Object[] { durationMillis, query,
>> indexSearcher.getIndexReader().directory() });
>> 			}
>> 			log.debug("search: query took {} ms", durationMillis);
>> 		}
>> 		return docs;
>> 	}
>>
>> I'm wondering why others are experiencing better performance with 2.9 and why
>> our implementations performance is going bad. Maybe our way of using the 2.9 api
>> is not the best and sorting is definetly expensive.
>>
>> Any ideas are appreciated. I'm torning out my hair since hours and days to find
>> the root cause. Also hints how I could find the bottlenecks myself are appreciated.
>>
>> Cheers,
>> Thomas
>>
> 
> 
> ------------------------------------------------------------------------
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org

-- 
Thomas Becker
Senior JEE Developer

net mobile AG
Zollhof 17
40221 Düsseldorf
GERMANY

Phone:    +49 211 97020-195
Fax:      +49 211 97020-949
Mobile:   +49 173 5146567 (private)
E-Mail:   mailto:thomas.becker@net-m.de
Internet: http://www.net-m.de

Registergericht:  Amtsgericht Düsseldorf, HRB 48022
Vorstand:         Theodor Niehues (Vorsitzender), Frank Hartmann,
                 Kai Markus Kulas, Dieter Plassmann
Vorsitzender des
Aufsichtsrates:   Dr. Michael Briem

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: lucene 2.9.0RC4 slower than 2.4.1?

Posted by Thomas Becker <th...@net-m.de>.

Missed the attachment, sorry.

Thomas Becker wrote:
> Hi all,
> 
> I'm experiencing a performance degradation after migrating to 2.9 and running
> some tests. I'm getting out of ideas and any help to identify the reasons why
> 2.9 is slower than 2.4 are highly appreciated.
> 
> We've had some issues with custom sorting in lucene 2.4.1. We worked around them
> by sorting the resultsets manually and caching the results after sorting (memory
> consuming but fast).
> 
> I now migrated to lucene 2.9.0RC4. Build some new FieldComparatorSource
> implementation for sorting and refactored all deprecated api calls to the new
> lucene 2.9 api.
> 
> Everything works fine from a functional perspective. But performance severly is
> (negatively) affected by lucene 2.9.
> 
> I profiled the application for a couple of hours, build a jmeter load test and
> compared the following scenarios:
> 
> 1. lucene 2.9 - new api
> 2. lucene 2.9 - old api and custom sorting after lucene
> 3. lucene 2.4.1 - old api and custom sorting after lucene (what we had up2now)
> 
> Please find attached an rrd graph showing the results. The lighter the color the
> faster the request has been served. y=# requests, x=time.
> 
> Most interestingly simply switching the lucene jars between 2.4 and 2.9 degraded
> response times and therefore throughput (see results of testcase 2 and 3).
> Adapting to the new api decreased performance again. The difference between
> testcase 1 and 2 is most probably due to precached custom sorted results.
> 
> The application under test is a dedicated lucene search engine doing nothing
> else, but serving search requests. We're running a cluster of them in prd and
> it's incredibly fast. With the old implementation and prd traffic we've above
> 98% of the requests served in 200ms.
> The index under test contains about 3 million documents (with lots of fields),
> consumes about 2,5gig disk space and is stored on a tmpfs RAMDISK provided by
> the linux kernel.
> 
> Most interesting methods used for searching are:
> 
> getHitsCount (is there a way to speed this up?):
> 
> 	public int getHitsCount(String query, Filter filter) throws
> LuceneServiceException {
> 		log.debug("getHitsCount('{}, {}')", query, filter);
> 		if (StringUtils.isBlank(query)) {
> 			log.warn("getHitsCount: empty lucene query");
> 			return 0;
> 		}
> 		long startTimeMillis = System.currentTimeMillis();
> 		int count = 0;
> 
> 		if (indexSearcher == null) {
> 			return 0;
> 		}
> 
> 		BooleanQuery.setMaxClauseCount(MAXCLAUSECOUNT);
> 		Query q = null;
> 		try {
> 			q = createQuery(query);
> 			TopScoreDocCollector tsdc = TopScoreDocCollector.create(1, true);
> 			indexSearcher.search(q, filter, tsdc);
> 			count = tsdc.getTotalHits();
> 			log.info("getHitsCount: count = {}",count);
> 		} catch (ParseException ex) {
> 			throw new LuceneServiceException("invalid lucene query:" + query, ex);
> 		} catch (IOException e) {
> 			throw new LuceneServiceException(" indexSearcher could be corrupted", e);
> 		} finally {
> 			long durationMillis = System.currentTimeMillis() - startTimeMillis;
> 			if (durationMillis > slowQueryLimit) {
> 				log.warn("getHitsCount: Slow query: {} ms, query={}", durationMillis, query);
> 			}
> 			log.debug("getHitsCount: query took {} ms", durationMillis);
> 		}
> 		return count;
> 	}
> 
> search:
> 	public List<Document> search(String query, Filter filter, Sort sort, int from,
> int size) throws LuceneServiceException {
> 		log.debug("{} search('{}', {}, {}, {}, {})", new Object[] { indexAlias, query,
> filter, sort, from, size });
> 		long startTimeMillis = System.currentTimeMillis();
> 
> 		List<Document> docs = new ArrayList<Document>();
> 		if (indexSearcher == null) {
> 			return docs;
> 		}
> 		Query q = null;
> 		try {
> 			if (query == null) {
> 				log.warn("search: lucene query is null...");
> 				return docs;
> 			}
> 			q = createQuery(query);
> 			BooleanQuery.setMaxClauseCount(MAXCLAUSECOUNT);
> 			if (size < 0 || size > maxNumHits) {
> 				// set hard limit for numHits
> 				size = maxNumHits;
> 				if (log.isDebugEnabled())
> 					log.debug("search: Size set to hardlimit: {} for query: {} with filter:
> {}", new Object[] { size, query, filter });
> 			}
> 			TopFieldCollector collector = TopFieldCollector.create(sort, size + from,
> true, false, false, true);
> 			indexSearcher.search(q, filter, collector);
> 			if(size > collector.getTotalHits())
> 				size = collector.getTotalHits();
> 			if (size > 100000)
> 				log.info("search: size: {} bigger than 100.000 for query: {} with filter:
> {}", new Object[] { size, query, filter });
> 			TopDocs td = collector.topDocs(from, size);
> 			ScoreDoc[] scoreDocs = td.scoreDocs;
> 			for (ScoreDoc scoreDoc : scoreDocs) {
> 				docs.add(indexSearcher.doc(scoreDoc.doc));
> 			}
> 		} catch (ParseException e) {
> 			log.warn("search: ParseException: {}", e.getMessage());
> 			if (log.isDebugEnabled())
> 				log.warn("search: ParseException: ", e);
> 			return Collections.emptyList();
> 		} catch (IOException e) {
> 			log.warn("search: IOException: ", e);
> 			return Collections.emptyList();
> 		} finally {
> 			long durationMillis = System.currentTimeMillis() - startTimeMillis;
> 			if (durationMillis > slowQueryLimit) {
> 				log.warn("search: Slow query: {} ms, query={}, indexUsed={}",
> 						new Object[] { durationMillis, query,
> indexSearcher.getIndexReader().directory() });
> 			}
> 			log.debug("search: query took {} ms", durationMillis);
> 		}
> 		return docs;
> 	}
> 
> I'm wondering why others are experiencing better performance with 2.9 and why
> our implementations performance is going bad. Maybe our way of using the 2.9 api
> is not the best and sorting is definetly expensive.
> 
> Any ideas are appreciated. I'm torning out my hair since hours and days to find
> the root cause. Also hints how I could find the bottlenecks myself are appreciated.
> 
> Cheers,
> Thomas
> 

-- 
Thomas Becker
Senior JEE Developer

net mobile AG
Zollhof 17
40221 D�sseldorf
GERMANY

Phone:    +49 211 97020-195
Fax:      +49 211 97020-949
Mobile:   +49 173 5146567 (private)
E-Mail:   mailto:thomas.becker@net-m.de
Internet: http://www.net-m.de

Registergericht:  Amtsgericht D�sseldorf, HRB 48022
Vorstand:         Theodor Niehues (Vorsitzender), Frank Hartmann,
                 Kai Markus Kulas, Dieter Plassmann
Vorsitzender des
Aufsichtsrates:   Dr. Michael Briem