Posted to java-user@lucene.apache.org by lutan <ws...@live.cn> on 2008/06/10 15:11:12 UTC

The performance of lucene searching (web environment) test

I have recently run some tests on Lucene, and I do not know whether the results are normal.

Hardware environment: Intel(R) Xeon(R) CPU 5110 @ 1.60GHz, 4GB RAM
Software environment: CentOS 4.6 + Sun JDK 1.5 + JBoss + Lucene 2.3.2 + je-analysis (a Chinese analyzer)

There are 10 million+ documents, which total about 3GB.

Test steps:
1. Run a single searcher.jsp in JBoss (tuned, using 1GB RAM).
2. Use LoadRunner to test:
   simulating 10 concurrent users, the TPS (transactions per second) is about 10
   simulating 50 concurrent users, the TPS is about 8
   simulating 100 concurrent users, the TPS is about 2

The JSP is very simple, and the index is on the local file system:

-------------------------------------------------------------------------------------------------
<body>
  <center>
  <form action="lucene.jsp" method="post" name="form1">
    <input type="text" value="" name="keyword2"/>
    <input type="submit" value="searcher" onclick="SUB()"/>
    <input type="reset" value="exit"/>
  </form>
  </center>
  <hr>
<%
  if (request.getParameter("keyword2") != null && !"".equals(request.getParameter("keyword2"))) {
    String dir = "/usr/local/index";
    String key = "name";
    String word = new String(request.getParameter("keyword2"), "utf-8");
    Searcher searcher = null;
    searcher = new IndexSearcher(FSDirectory.getDirectory(dir, false));
    Analyzer myAnalyzer = new jeasy.analysis.MMAnalyzer();
    QueryParser queryParser = new QueryParser(key, myAnalyzer);
    Query query = queryParser.parse(word);
    Hits hits = null;
    long startTime = System.nanoTime();
    hits = searcher.search(query);
    long estimatedTime = System.nanoTime() - startTime;
    BigDecimal bb = new BigDecimal(estimatedTime);
    BigDecimal ee = new BigDecimal(1000000000);
    System.out.println("Key word: " + word + " Hits:" + hits.length() + "  Cost time: " + bb.divide(ee) + "/s");
    searcher.close();
  }
  out.print("ABC");
%>
</body>
-------------------------------------search.jsp--------------------------------------------------

I also tried to use a singleton IndexSearcher, but it does not seem to help:

-------------------------------------------------------------------------------------------------
public IndexSearcher getIndexSearcher() throws IOException {
  if (this.indexSearcher == null) {
    return new IndexSearcher(FSDirectory.getDirectory(folder, false));
  } else {
    IndexReader ir = indexSearcher.getIndexReader();
    if (!ir.isCurrent()) {
      this.indexSearcher.close();
      this.indexSearcher = new IndexSearcher(FSDirectory.getDirectory(folder, false));
      ir = indexSearcher.getIndexReader();
      if (ir.hasDeletions()) {
        if (this.indexWriter != null) {
          this.indexWriter.optimize();
        }
      }
    }
    return this.indexSearcher;
  }
}
------------------------------GetsingletonIndexsearcher.java-------------------------------------

Using the same code in a standalone application, one search takes about 0.5s on average.

So how do I improve the searching performance in a concurrent environment? Can this hardware environment (Intel(R) Xeon(R) CPU 5110 @ 1.60GHz, 4GB RAM) give me 50+ TPS?

RE: The performance of lucene searching (web environment) test

Posted by lutan <ws...@live.cn>.
I am very grateful to Toke Eskildsen for paying attention to my questions.

> Date: Fri, 13 Jun 2008 08:59:27 +0200
> From: te@statsbiblioteket.dk
> Subject: RE: The performance of lucene searching (web environment) test
> To: java-user@lucene.apache.org
>
> On Wed, 2008-06-11 at 18:56 +0800, lutan wrote:
> > Yes, I have tested again in the same environment but using a singleton
> > IndexSearcher. The performance has increased: with 100 concurrent users
> > requesting different keywords, I get 60 TPS (2 TPS before).
> > Now the bottleneck seems to be the CPU, with CPU usage approaching
> > 100%. Both RAM (using 70MB on average) and HD usage are normal.
>
> It sounds like you have found the solution to your immediate problem.
> Great.
 
 
The performance increase is thanks to your suggestion.
Today I ran another test, using RemoteSearchable (with code like
the example supplied in <Lucene in Action>).
Steps the application runs through:
1. A customer sends a keyword request to the web server (JBoss: 192.168.0.1).
2. JBoss calls the RMI server (192.168.0.2), which holds the index files.
The rest of the testing environment is the same as before.

The result:
LoadRunner: 300 concurrent users (I find that each user has one TCP/IP
connection from the web server to the RMI server),
the TPS reached 180+, and the web response time is
about 2 seconds on average. Both the web server and the RMI server
show normal usage of
CPU (50%) and RAM (not full).

The performance has almost tripled!
It's amazing to me :)
I expected the RMI approach to have low performance
(because of the expensive network usage),
but the result really puzzles me :(
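
One rough way to sanity-check the figures above is Little's law, which relates the number of requests in flight to throughput multiplied by mean response time. The sketch below is only that, a sketch: it assumes the LoadRunner users send requests back to back with no think time, which the message does not state, and the class and method names are made up for illustration.

```java
/** Rough consistency check of load-test figures using Little's law. */
final class LoadTestCheck {
    /** Concurrency implied by a throughput (per second) and a mean response time (seconds). */
    static double impliedConcurrency(double tps, double responseSeconds) {
        return tps * responseSeconds;
    }

    /** Throughput implied by a number of users that wait for each response and have no think time. */
    static double impliedTps(int users, double responseSeconds) {
        return users / responseSeconds;
    }

    public static void main(String[] args) {
        // Figures reported in the message: 300 users, 180+ TPS, about 2 s response time.
        System.out.println(impliedConcurrency(180.0, 2.0)); // prints 360.0
        System.out.println(impliedTps(300, 2.0));           // prints 150.0
    }
}
```

Under that assumption, 360 implied requests in flight against 300 simulated users is in the same ballpark, so the reported TPS and response time are roughly consistent with each other if the mean response time is a little under 2 seconds.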
 
 
 
> > Can I assume that as long as I have more RAM, I will get good
> > performance?
>
> Depends on your index-size (in bytes). When your index grows, less and
> less of it can fit in the disk-cache and more time will be required for
> proper warm-up. But the change will happen gradually, so you'll only be
> surprised if you suddenly increase your index-size to double or more
> size.
>
> > I don't understand the meaning of "for disk-cache" very clearly. Could
> > you please explain it again? Thanks a lot! (Doesn't it cache in RAM?)
> > Does warm-up == cache?
>
> There are (at least) two important memory mechanisms to consider.
> My apologies if some of this is basic knowledge to you:
>
> 1) Disk-cache.
> In general, the free RAM on your Linux-system is used for disk-cache.
> With an index-size of 3GB and (just a guess) 1 GB free RAM, the
> operating system is able to cache 1/3 or less of your index. If you open
> the same index several times in a row, the disk-cache will be warmed to
> the relevant parts of your index, so that you're not even hitting the
> disk after a while. At least not for opening. This is the effect you
> observed with your non-singleton based test, where the speed increased
> slowly up to a not-so-high level.
>
> 2) Lucene internal structures.
> I don't know much about this, so I hope somebody will correct me if I
> make mistakes: Lucene has some internal structures that are initialized
> when searches are performed. Depending on setup, this initialization can
> be quite heavy (custom search for example). Performing warm-up, such as
> searching with previously logged queries, will initialize these
> structures before the real queries are received. This is the effect you
> observed with your singleton searcher.
>
> 1 & 2 can be seen in combination, as the initialization of the internal
> structures in Lucene requires a fair amount of seeks in the index data.
> If there's nothing in the disk-cache and a conventional platter-based
> harddisk is used, it takes some time. If the disk-cache is warmed from
> previous use or a solid state drive setup is used, it is much faster.
 
 
I understand it now from your reply, thanks a lot.
 
> > How many docs does Lucene cache by default? And can I control the
> > cache size?
>
> I don't know. Maybe someone else will chime in?

RE: The performance of lucene searching (web environment) test

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Wed, 2008-06-11 at 18:56 +0800, lutan wrote:
> Yes, I have tested again in the same environment but using a singleton
> IndexSearcher. The performance has increased: with 100 concurrent users
> requesting different keywords, I get 60 TPS (2 TPS before).
> Now the bottleneck seems to be the CPU, with CPU usage approaching
> 100%. Both RAM (using 70MB on average) and HD usage are normal.

It sounds like you have found the solution to your immediate problem.
Great.

> Can I assume that as long as I have more RAM, I will get good
> performance?

Depends on your index-size (in bytes). When your index grows, less and
less of it can fit in the disk-cache and more time will be required for
proper warm-up. But the change will happen gradually, so you'll only be
surprised if you suddenly increase your index-size to double or more
size.

> I don't understand the meaning of "for disk-cache" very clearly. Could
> you please explain it again? Thanks a lot! (Doesn't it cache in RAM?)
> Does warm-up == cache?

There are (at least) two important memory mechanisms to consider.
My apologies if some of this is basic knowledge to you:

1) Disk-cache.
In general, the free RAM on your Linux-system is used for disk-cache.
With an index-size of 3GB and (just a guess) 1 GB free RAM, the
operating system is able to cache 1/3 or less of your index. If you open
the same index several times in a row, the disk-cache will be warmed to
the relevant parts of your index, so that you're not even hitting the
disk after a while. At least not for opening. This is the effect you
observed with your non-singleton based test, where the speed increased
slowly up to a not-so-high level.
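
The 1/3 estimate above can be written out as a small calculation. This is a minimal sketch using only the JDK; `DiskCacheEstimate` and its methods are hypothetical names, and the free-RAM figure has to be supplied by hand (for example from the operating system's own tools), since the 1 GB value in the message is itself a guess.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

/** Estimates how much of an index the operating system's disk cache could hold. */
final class DiskCacheEstimate {
    /** Sums the sizes of all regular files below the given directory. */
    static long directorySize(Path dir) throws IOException {
        try (Stream<Path> files = Files.walk(dir)) {
            return files.filter(Files::isRegularFile)
                        .mapToLong(p -> p.toFile().length())
                        .sum();
        }
    }

    /** Upper bound on the cacheable fraction of the index, capped at 1.0. */
    static double cacheableFraction(long indexBytes, long freeRamBytes) {
        if (indexBytes <= 0) {
            return 1.0; // nothing to cache
        }
        return Math.min(1.0, (double) freeRamBytes / indexBytes);
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024L * 1024L;
        // Figures from the message: 3 GB index, 1 GB free RAM (the latter a guess).
        System.out.println(cacheableFraction(3 * gb, 1 * gb));
    }
}
```

`directorySize` could be pointed at the index directory to obtain the index size in bytes. The result is only an upper bound, because other files compete for the same cache.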

2) Lucene internal structures.
I don't know much about this, so I hope somebody will correct me if I
make mistakes: Lucene has some internal structures that are initialized
when searches are performed. Depending on setup, this initialization can
be quite heavy (custom search for example). Performing warm-up, such as
searching with previously logged queries, will initialize these
structures before the real queries are received. This is the effect you
observed with your singleton searcher.

1 & 2 can be seen in combination, as the initialization of the internal
structures in Lucene requires a fair amount of seeks in the index data.
If there's nothing in the disk-cache and a conventional platter-based
harddisk is used, it takes some time. If the disk-cache is warmed from
previous use or a solid state drive setup is used, it is much faster.
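
The warm-up idea described above (replaying previously logged queries before real traffic arrives) might be sketched as follows. This uses only the JDK and is not Lucene API: `SearchWarmUp` is a hypothetical name, and the `search` function is a stand-in for a call on the shared searcher that returns a hit count.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/** Replays logged queries against a search function before real traffic arrives. */
final class SearchWarmUp {
    /**
     * Runs each logged query once, in order, and returns the elapsed time per
     * query in nanoseconds. The search function is a hypothetical stand-in for
     * a call on the shared searcher.
     */
    static List<Long> warmUp(Function<String, Integer> search, List<String> loggedQueries) {
        List<Long> elapsedNanos = new ArrayList<>();
        for (String query : loggedQueries) {
            long start = System.nanoTime();
            search.apply(query); // result is discarded; only the side effects matter
            elapsedNanos.add(System.nanoTime() - start);
        }
        return elapsedNanos;
    }

    public static void main(String[] args) {
        // Placeholder search that returns the query length as a fake hit count.
        Function<String, Integer> fakeSearch = String::length;
        List<Long> times = warmUp(fakeSearch, Arrays.asList("lucene", "performance", "cache"));
        System.out.println("warmed with " + times.size() + " queries");
    }
}
```

With a real searcher plugged in, the returned timings would be expected to drop as the disk-cache and internal structures warm up, which gives a simple signal for when to start accepting requests.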

> How many docs does Lucene cache by default? And can I control the
> cache size?

I don't know. Maybe someone else will chime in?


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


RE: The performance of lucene searching (web environment) test

Posted by lutan <ws...@live.cn>.
Thanks for your reply!

> Date: Wed, 11 Jun 2008 09:19:46 +0200
> From: te@statsbiblioteket.dk
> Subject: RE: The performance of lucene searching (web environment) test
> To: java-user@lucene.apache.org
>
> On Wed, 2008-06-11 at 00:17 +0800, lutan wrote:
> > In my test case, I ran LoadRunner for just 5 minutes, and the response
> > rate grew slowly. The TPS (transactions per second) seemed to stop at
> > 10 in the end.
>
> That's without reusing the searcher, right? In that case the increased
> rate must be attributed to the disk cache being warmed. Please try and
> test again with the searcher being reused.
 
 
Yes, I have tested again in the same environment but using a singleton IndexSearcher.
The performance has increased: with 100 concurrent users requesting different keywords,
I get 60 TPS (2 TPS before). Now the bottleneck seems to be the CPU, with CPU usage
approaching 100%. Both RAM (using 70MB on average) and HD usage are normal.
 
> > In addition, does Lucene have a bottleneck related to the number of
> > documents or the index size?
>
> Not to my knowledge. Search time and RAM consumption go up, of course,
> but I'm not aware of any special point where things start to go bad at
> an increased rate.
 
Can I assume that as long as I have more RAM, I will get good performance?
 
 
> > Can this hardware environment (Intel(R) Xeon(R) CPU 5110 @ 1.60GHz,
> > 4GB RAM) give me 50+ TPS?
>
> With an index of 10M/3GB? It doesn't sound unrealistic after warm-up.
> How much RAM is available for disk-cache, when the machine is running?
 
 
I don't understand the meaning of "for disk-cache" very clearly. Could you please
explain it again? Thanks a lot! (Doesn't it cache in RAM?)
Does warm-up == cache?
How many docs does Lucene cache by default? And can I control the cache size?

I am new to Lucene, so my questions may not look professional.
Please forgive me.

RE: The performance of lucene searching (web environment) test

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Wed, 2008-06-11 at 00:17 +0800, lutan wrote:
> In my test case, I ran LoadRunner for just 5 minutes, and the response
> rate grew slowly. The TPS (transactions per second) seemed to stop at
> 10 in the end.

That's without reusing the searcher, right? In that case the increased
rate must be attributed to the disk cache being warmed. Please try and
test again with the searcher being reused.

> In addition, does Lucene have a bottleneck related to the number of
> documents or the index size?

Not to my knowledge. Search time and RAM consumption go up, of course,
but I'm not aware of any special point where things start to go bad at
an increased rate.

> Can this hardware environment (Intel(R) Xeon(R) CPU 5110 @ 1.60GHz,
> 4GB RAM) give me 50+ TPS?

With an index of 10M/3GB? It doesn't sound unrealistic after warm-up.
How much RAM is available for disk-cache, when the machine is running?



RE: The performance of lucene searching (web environment) test

Posted by lutan <ws...@live.cn>.
Thanks  for the reply! 
 
In my test case, I ran LoadRunner for just 5 minutes, and the response rate grew slowly. The TPS (transactions per second) seemed to stop at 10 in the end.
I will run the test again for a longer time.
In addition, does Lucene have a bottleneck related to the number of documents or the index size?
 
> Date: Tue, 10 Jun 2008 16:34:17 +0200
> From: te@statsbiblioteket.dk
> Subject: Re: The performance of lucene searching (web environment) test
> To: java-user@lucene.apache.org
>
> On Tue, 2008-06-10 at 21:11 +0800, lutan wrote:
> > [A lot of text with code and no newlines, making it very hard to read]
>
> In your test you're reusing the searcher. For each search your program
> performs, you will see faster response times, until the searcher is
> fully warmed.
>
> In your production-system, you re-open your searcher every time and do
> not have the benefit of a warmed searcher.
>
> So yes, Singleton searcher helps, as opposed to opening a searcher for
> every search. Try making a test where the only thing you do is open a
> searcher 100 times and you will see that it takes a non-trivial amount
> of time.

Re: The performance of lucene searching (web environment) test

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Tue, 2008-06-10 at 21:11 +0800, lutan wrote:
> [A lot of text with code and no newlines, making it very hard to read]

In your test you're reusing the searcher. For each search your program
performs, you will see faster response times, until the searcher is
fully warmed.

In your production-system, you re-open your searcher every time and do
not have the benefit of a warmed searcher.

So yes, Singleton searcher helps, as opposed to opening a searcher for
every search. Try making a test where the only thing you do is open a
searcher 100 times and you will see that it takes a non-trivial amount
of time.
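
The "open once, reuse for every search" advice might be sketched as below. This is a JDK-only illustration (Java 8 or later), not Lucene API: `SharedResource` and `opener` are hypothetical names, with the opener standing in for whatever opens the searcher. Note that in the getter posted in the original message, the `indexSearcher == null` branch returns a new searcher without storing it in the field, so the field stays null and each call opens a fresh searcher; the sketch stores the instance on first use.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Holds one lazily opened, shared instance of an expensive resource. */
final class SharedResource<T> {
    private final Supplier<T> opener; // hypothetical: e.g. opens the searcher
    private T instance;               // guarded by "this"

    SharedResource(Supplier<T> opener) {
        this.opener = opener;
    }

    /** Opens the resource on first use and returns the same instance afterwards. */
    synchronized T get() {
        if (instance == null) {
            instance = opener.get(); // store it, so later calls reuse it
        }
        return instance;
    }
}

final class SharedResourceDemo {
    /** Calls get() the given number of times and reports how often the opener ran. */
    static int opensFor(int requests) {
        AtomicInteger opens = new AtomicInteger();
        SharedResource<Object> shared = new SharedResource<>(() -> {
            opens.incrementAndGet();
            return new Object(); // stand-in for an opened searcher
        });
        for (int i = 0; i < requests; i++) {
            shared.get();
        }
        return opens.get();
    }

    public static void main(String[] args) {
        System.out.println("opens for 100 requests: " + opensFor(100));
    }
}
```

The `synchronized` getter keeps the sketch simple and safe under concurrent requests. A real holder would also need a policy for re-opening when the index changes, which is left out here.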


