Posted to solr-user@lucene.apache.org by Haishan Chen <ha...@msn.com> on 2007/11/01 01:01:29 UTC

RE: Phrase Query Performance Question


> From: mike.klaas@gmail.com
> Subject: Re: Phrase Query Performance Question
> Date: Wed, 31 Oct 2007 15:25:42 -0700
> To: solr-user@lucene.apache.org
>
> On 31-Oct-07, at 2:40 PM, Haishan Chen wrote:
> >
> > http://mail-archives.apache.org/mod_mbox/lucene-java-user/200512.mbox/%3c4397F720.9070007@getopt.org%3e
> >
> > It mentioned that http://websearch.archive.org/katrina/ (in nutch)
> > had 10M documents and a search of "hurricane katrina" was able to
> > return in 1.35 seconds with 600,867 hits. Although the computer
> > it was using might be more powerful than mine. I feel 937ms for a
> > phrase query on a single field is kind of slower. Nutch actually
> > expand a search to more complex queries. My index and the number of
> > hits on my query ("auto repair") is about one fifth of
> > websearch.archive.org and its testing query. So I feel a reasonable
> > performance for my query should be less than 300 ms. I am not sure
> > if I am right on that logic.
>
> I'm not sure that it is reasonable, but I'm not sure that it isn't.
> However, have you tried other queries? 937ms seems a little high,
> even for phrase queries.
>
> > Anyway I will collect the statistic on linux first and try out
> > other options.
>
> Have you tried using the performance enhancements present in solr-trunk?
>
> -Mike
 
Here are some query statistics. The phrase queries look slow to me.
These are queries that have more than 100,000 hits. For queries that return only a couple thousand hits, the response time is quite fast.
But these are queries on one field only.
 
Query                             Hits       Time
("auto repair")                   100384     946 ms
(auto repair)                     100384     31 ms
("car repair"~1000000)            112183     766 ms
(car repair)                      112183     63 ms
("business service"~1000000)      1209751    1500 ms
(business service)                1209751    234 ms
("shopping center"~1000000)       119481     359 ms
(shopping center~1000000)         119481     63 ms
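
To make the gap in these numbers easier to see, here is a small Python sketch that computes how many times slower each phrase query is than its unquoted counterpart. The hit counts and times are the ones reported in this message; the variable and function names are made up for illustration.

```python
# Timings as reported in this message:
# (phrase query, phrase ms, unquoted query, unquoted ms, hits)
pairs = [
    ('"auto repair"', 946, "auto repair", 31, 100384),
    ('"car repair"~1000000', 766, "car repair", 63, 112183),
    ('"business service"~1000000', 1500, "business service", 234, 1209751),
    ('"shopping center"~1000000', 359, "shopping center~1000000", 63, 119481),
]


def slowdown(phrase_ms, unquoted_ms):
    """Ratio of phrase-query time to unquoted-query time."""
    return phrase_ms / unquoted_ms


# Map each phrase query to its slowdown factor, rounded to one decimal.
ratios = {
    phrase: round(slowdown(phrase_ms, unquoted_ms), 1)
    for phrase, phrase_ms, _unquoted, unquoted_ms, _hits in pairs
}

for phrase, ratio in ratios.items():
    print(f"{phrase:30s} {ratio:5.1f}x slower than the unquoted form")
```

These are single measurements, so the ratios are only a rough indication and could shift with caching or repeated runs.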
 
I don't know what solr-trunk is yet, but I will find out.
 
Thank you
Haishan
 
 
 

RE: Phrase Query Performance Question

Posted by Haishan Chen <ha...@msn.com>.



> Date: Wed, 31 Oct 2007 19:19:07 -0700
> From: hossman_lucene@fucit.org
> To: solr-user@lucene.apache.org
> Subject: RE: Phrase Query Performance Question
>
> : ("auto repair") 100384 hits 946 ms
> : (auto repair) 100384 hits 31 ms
> : ("car repair"~1000000) 112183 hits 766 ms
> : (car repair) 112183 hits 63 ms
> : ("business service"~1000000) 1209751 hits 1500 ms
> : (business service) 1209751 hits 234 ms
> : ("shopping center"~1000000) 119481 hits 359 ms
> : (shopping center~1000000) 119481 hits 63 ms
>
> if i'm reading those numbers right, every document in your corpus
> containing the words "auto" or "repair" also contains the exact phrase
> "auto repair" with no slop ... this seems HIGHLY unlikely. can you show
> us *exactly* what the query URLs you are using look like, and show us what
> the request handler section of your solrconfig.xml looks like.
 
 
Yes, that's exactly what the documents are like. The documents are categorized, and I indexed the category together with the content
of the documents using the text field type. The URL I used is select?q=content:("auto repair"~1000000)&fl=title. No other options, such as faceting or highlighting, are used.
 
> also: where are you getting these times from? are these from the logging
> output solr produces, or from the client you have hitting solr?
>
> : I don't know what is solr-trunk yet but I will find out
>
> he's referring to the unreleased development code, which you can check out
> from the "trunk" of the Solr subversion repository...
>
> http://lucene.apache.org/solr/version_control.html
>
> -Hoss
 
I am getting the times from the client browser.
 
 
Thanks
-Haishan
 
 
 
 
 
 
 

RE: Phrase Query Performance Question

Posted by Chris Hostetter <ho...@fucit.org>.
: ("auto repair")  100384 hits 946 ms
: (auto repair)  100384 hits 31 ms
: ("car repair"~1000000)  112183 hits 766 ms
: (car repair)  112183 hits 63 ms
: ("business service"~1000000) 1209751 hits 1500 ms
: (business service)  1209751 hits 234 ms
: ("shopping center"~1000000) 119481 hits 359 ms
: (shopping center~1000000) 119481 hits 63 ms

if i'm reading those numbers right, every document in your corpus 
containing the words "auto" or "repair" also contains the exact phrase 
"auto repair" with no slop ... this seems HIGHLY unlikely.  can you show 
us *exactly* what the query URLs you are using look like, and show us what 
the request handler section of your solrconfig.xml looks like.

also: where are you getting these times from?  are these from the logging 
output solr produces, or from the client you have hitting solr?
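
One way to tell the two apart is to compare the QTime that solr reports in its response header with the wall-clock time seen by the client. The following is a minimal Python sketch, not a definitive tool: the base URL (http://localhost:8983/solr), the helper names, and the use of wt=json are assumptions for illustration; the q and fl values are placeholders to be replaced with the real request.

```python
import json
import time
from urllib.parse import urlencode
from urllib.request import urlopen


def build_select_url(base_url, query, fields="title"):
    """Build a /select URL; urlencode handles quotes, parens and '~'."""
    params = {"q": query, "fl": fields, "wt": "json"}
    return base_url.rstrip("/") + "/select?" + urlencode(params)


def extract_qtime(response_text):
    """Return the server-side query time (ms) from a JSON response body."""
    return json.loads(response_text)["responseHeader"]["QTime"]


def timed_query(url):
    """Return (QTime in ms, client wall-clock ms) for a single request."""
    start = time.perf_counter()
    with urlopen(url) as resp:
        body = resp.read().decode("utf-8")
    wall_ms = (time.perf_counter() - start) * 1000.0
    return extract_qtime(body), wall_ms


# Assumed base URL; adjust to wherever the solr instance actually runs.
url = build_select_url("http://localhost:8983/solr",
                       'content:("auto repair"~1000000)')
# qtime_ms, wall_ms = timed_query(url)  # uncomment with a running solr
```

If the wall-clock figure is much larger than QTime, the difference is being spent outside the search itself (response writing, network transfer, or rendering in the client).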

: I don't know what is solr-trunk yet but I will find out

he's referring to the unreleased development code, which you can check out 
from the "trunk" of the Solr subversion repository...

http://lucene.apache.org/solr/version_control.html


-Hoss