Posted to dev@lucene.apache.org by Karl Wettin <ka...@gmail.com> on 2009/10/08 10:04:44 UTC

Output from a small Snowball benchmark

There have been a few small comments in the Jira about the reflection
in Snowball's Among class. There is very little to do about this
unless one wants to redesign the stemmers so they include an inner
class that handles the method callbacks. That's quite a bit of work and
I don't even know how much CPU one would save by doing this.

So I was thinking maybe it would save some resources if one reused
the stemmers instead of reinstantiating them, which I presume
everybody does.

I thought it would make most sense to simulate query time stemming, so
my benchmark contained 4 words, 2 of them plural. Each test
ran 1 000 000 times. The amount of CPU time used is barely noticeable
relative to what other things cost: 0.0109ms/iteration when
reinstantiating, 0.0067ms/iteration when reusing.

The heap consumption was however rather different. By the end of the
run, reinstantiating had consumed about 10x more heap than reusing:
~20MB vs. ~2MB.
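
For anyone who wants to try something similar, here is a minimal sketch
of the timing part of such a test. It is not the code I ran: ToyStemmer
is a made-up stand-in for a generated Snowball stemmer (it only strips a
trailing "s"), the four query words are invented, and heap usage is not
measured at all.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for a Snowball stemmer: it only strips a trailing "s". */
final class ToyStemmer {
    // Per-instance working buffer, so creating a stemmer allocates something.
    private final StringBuilder buffer = new StringBuilder();

    String stem(String word) {
        buffer.setLength(0);
        buffer.append(word);
        int n = buffer.length();
        if (n > 1 && buffer.charAt(n - 1) == 's') {
            buffer.setLength(n - 1);
        }
        return buffer.toString();
    }
}

final class StemmerBenchmark {
    // Four query words, two of them plural (invented for this sketch).
    static final String[] QUERY = {"apache", "lucene", "stemmers", "benchmarks"};

    // Written once per run so the JIT cannot discard the stemming work.
    static volatile long sink;

    private StemmerBenchmark() {}

    /** Stems the query the given number of times; returns average ms per iteration. */
    static double msPerIteration(Supplier<ToyStemmer> stemmers, int iterations) {
        long total = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            ToyStemmer stemmer = stemmers.get(); // a new or a shared instance
            for (String word : QUERY) {
                total += stemmer.stem(word).length();
            }
        }
        long elapsedNanos = System.nanoTime() - start;
        sink = total;
        return elapsedNanos / 1e6 / iterations;
    }
}
```

Calling msPerIteration(ToyStemmer::new, 1000000) gives the
reinstantiating case; creating one ToyStemmer up front and passing a
supplier that always returns it gives the reusing case. The numbers from
a loop like this are rough at best, since JIT warm-up and GC pauses are
not controlled for.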


I realize people don't usually run 1 000 000 queries in so short a time,
but at least this is an indication that one could save some GC time
here. Many a mickle makes a muckle...

So I was thinking that perhaps it would make sense to have something like
a singleton concurrent queue in the SnowballFilter and a new
constructor that takes the snowball program implementation class as an
argument.
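
Roughly what I have in mind, as a sketch only: StemmerPool, borrow and
release are made-up names, nothing here exists in Lucene, and the code
uses present-day Java. It keeps one shared queue of idle instances per
implementation class and assumes the class has an accessible no-arg
constructor.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical pool (not a Lucene class): one queue of idle stemmers per class. */
final class StemmerPool {
    private static final ConcurrentMap<Class<?>, Queue<Object>> IDLE =
            new ConcurrentHashMap<>();

    private StemmerPool() {}

    private static Queue<Object> queueFor(Class<?> type) {
        return IDLE.computeIfAbsent(type, k -> new ConcurrentLinkedQueue<>());
    }

    /** Returns an idle instance if one is queued, otherwise creates a new one. */
    static <T> T borrow(Class<T> type) {
        Object idle = queueFor(type).poll();
        if (idle != null) {
            return type.cast(idle);
        }
        try {
            // Assumes an accessible no-arg constructor on the implementation class.
            return type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + type.getName(), e);
        }
    }

    /** Hands an instance back so a later borrow() of the same class can reuse it. */
    static void release(Object stemmer) {
        queueFor(stemmer.getClass()).offer(stemmer);
    }
}
```

A filter would call borrow() with the implementation class it was
constructed with and release() when it is done with the stemmer. The
queue is unbounded in this sketch, and an instance that is never
released is simply garbage collected as it is today.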

But this might also be way premature optimization.


          karl
