Posted to solr-user@lucene.apache.org by "michael.boom" <my...@yahoo.com> on 2013/10/29 12:20:04 UTC

Phrase query combined with term query for maximum accuracy

For maximum search accuracy on my SolrCloud system, I was thinking of
combining phrase search with term search in the following way:
search term: john doe
search fields: title, description (a match in the title is more relevant
than one in the description)

What I want to achieve is the following document ranking:
1. a hard match for "john doe" in the title
2. a hard match for "john doe" in the description
3. a match of "john" OR "doe" in the title
4. a match of "john" OR "doe" in the description

What I've got so far,
using the edismax parser:
q.op=or&q=title:"john doe"^100 OR description:"john doe"^50 OR title:john
doe^30 OR description:john doe^10

Would the above query give me what I want, or is there a better way to do
it?
Thanks!




-----
Thanks,
Michael
--
View this message in context: http://lucene.472066.n3.nabble.com/Phrase-query-combined-with-term-query-for-maximum-accuracy-tp4098215.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Phrase query combined with term query for maximum accuracy

Posted by "michael.boom" <my...@yahoo.com>.
Thanks Jack!
I tried it and I get some really odd behaviour. I have two collections
with the same solrconfig.xml and the same schema definition, except for
the type of some fields, which in collection_DE are customized for the
German language and in collection_US for English:

  <fieldType name="text_de" class="solr.TextField"
positionIncrementGap="100">
    <analyzer> 
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true"
words="lang/stopwords_de.txt" format="snowball" />
      <filter class="solr.GermanNormalizationFilterFactory"/>
      <filter class="solr.GermanLightStemFilterFactory"/>
    </analyzer>
  </fieldType>

  <fieldType name="text_en" class="solr.TextField"
positionIncrementGap="100">
    <analyzer> 
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true"
words="lang/stopwords_en.txt" format="snowball" />
    </analyzer>
  </fieldType>

The fields "title" and "text" have the corresponding type (text_de in
collection_DE and text_en in collection_US).

Now, when I run this query:
/solr/collection_US/select/?q=title:"blue hat"^100 OR text:"blue hat"^50 OR
title:(blue hat)^30 OR text:(blue
hat)^10&fq=active:true&start=0&rows=40&sort=score+desc&fl=*,score&country=US

I get this error:

No live SolrServers available to handle this
request:[http://xxx:8983/solr/collection_US_shard2_replica1,
http://xxx:8983/solr/collection_US_shard2_replica2]","trace":"org.apache.solr.common.SolrException:
org.apache.solr.client.solrj.SolrServerException: No live SolrServers
available to handle this
request:[http://xx:8983/solr/collection_US_shard2_replica1,
http://xx:8983/solr/collection_US_shard2_replica2]\n\tat
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:302)\n\tat
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)\n\tat
org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)\n\tat
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1489)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:517)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:138)\n\tat
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:540)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:213)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1097)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:446)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:175)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1031)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:136)\n\tat
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:200)\n\tat
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:109)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)\n\tat
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:317)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)\n\tat
org.eclipse.jetty.server.Server.handle(Server.java:445)\n\tat
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:269)\n\tat
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:229)\n\tat
org.eclipse.jetty.io.AbstractConnection$ReadCallback.run(AbstractConnection.java:358)\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:601)\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:532)\n\tat
java.lang.Thread.run(Thread.java:724)\nCaused by:
org.apache.solr.client.solrj.SolrServerException: No live SolrServers
available to handle this
request:[http://xxx:8983/solr/collection_US_shard2_replica1,
http://xxx:8983/solr/collection_US_shard2_replica2]\n\tat
org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:333)\n\tat
org.apache.solr.handler.component.HttpShardHandlerFactory.makeLoadBalancedRequest(HttpShardHandlerFactory.java:214)\n\tat
org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:158)\n\tat
org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:119)\n\tat
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)\n\tat
java.util.concurrent.FutureTask.run(FutureTask.java:166)\n\tat
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)\n\tat
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)\n\tat
java.util.concurrent.FutureTask.run(FutureTask.java:166)\n\tat
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)\n\tat
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)\n\t...
1 more\nCaused by:
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Server
at http://xxx:8983/solr/collection_US_shard2_replica2 returned non ok
status:500, message:Server Error\n\tat
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:385)\n\tat
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)\n\tat
org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:264)\n\t...
11 more\n","code":500}}


But if I run this query (the previous one with lowercased operators),
everything works fine:
/solr/collection_US/select/?q=title:"blue hat"^100 or text:"blue hat"^50 or
title:(blue hat)^30 or text:(blue
hat)^10&fq=active:true&start=0&rows=40&sort=score+desc&fl=*,score&country=US


For collection_DE it works fine with both lowercase and uppercase operators.
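
A note on reading the error above: the trace nests several "Caused by:" sections, and the innermost one reports that collection_US_shard2_replica2 returned HTTP 500. The "No live SolrServers" message is the wrapper raised after the shard requests failed, so the underlying exception is presumably in that replica's own log. A minimal Python sketch for pulling the innermost cause out of such a trace follows; it assumes the "trace" value has already been JSON-decoded (real newlines and tabs), and the function name is hypothetical.

```python
def innermost_cause(trace: str) -> str:
    """Return the message of the last 'Caused by:' section of a Java trace.

    Assumes `trace` is the decoded "trace" string from the Solr error
    response, with real newline and tab characters.
    """
    # The last "Caused by:" section is the innermost exception.
    last_section = trace.split("Caused by:")[-1].strip()
    # The message runs up to the first stack frame line ("\n\tat ...").
    message = last_section.split("\n\tat", 1)[0]
    # Collapse line wrapping and repeated whitespace into single spaces.
    return " ".join(message.split())
```

If the trace contains no "Caused by:" section, the function falls back to the message of the top-level exception.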



-----
Thanks,
Michael

Re: Phrase query combined with term query for maximum accuracy

Posted by "michael.boom" <my...@yahoo.com>.
One more thing I just noticed:
if, for collection_US, I search for
title:"blue hat"^100 OR text:"blue hat"^50    -> I get the same error
but if I search for:
title:"blue hat"^100 OR text:"bluehat"^50     -> it works fine
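
The two searches above differ only in the multi-word phrase on the text field, which suggests testing each clause on its own. A small Python sketch that builds one URL-encoded request per clause is given here; the base URL is a placeholder, the function name is hypothetical, and rows=0 is used only to keep the responses small.

```python
from urllib.parse import urlencode


def clause_urls(base_url, clauses, extra=None):
    """Map each query clause to a /select URL that runs only that clause.

    `base_url` is a placeholder for the collection URL, for example
    "http://localhost:8983/solr/collection_US" (an assumed host).
    """
    urls = {}
    for clause in clauses:
        params = {"q": clause, "rows": 0}
        params.update(extra or {})
        # urlencode escapes quotes, colons, carets and spaces in the clause.
        urls[clause] = f"{base_url}/select?{urlencode(params)}"
    return urls


if __name__ == "__main__":
    clauses = ['title:"blue hat"', 'text:"blue hat"', 'text:"bluehat"']
    for url in clause_urls("http://localhost:8983/solr/collection_US", clauses).values():
        print(url)
```

Requesting each URL in turn shows which single clause reproduces the 500 response.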



-----
Thanks,
Michael

Re: Phrase query combined with term query for maximum accuracy

Posted by Jack Krupansky <ja...@basetechnology.com>.
You need some parentheses:

title:john doe^30 OR description:john doe^10

should be:

title:(john doe)^30 OR description:(john doe)^10

-- Jack Krupansky
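
For context on the correction above: without parentheses, title:john doe^30 applies the field name only to "john", and "doe" is searched in the default field(s) instead. The Python sketch below builds the grouped query from a list of words and URL-encodes it. It also shows an alternative that was not proposed in this thread: letting edismax build the clauses from its qf (per-term fields) and pf (phrase boost fields) parameters. Function names are hypothetical, the boost values are taken from the original question, and pf adds a boost on top of qf matches rather than guaranteeing strict ranking tiers.

```python
from urllib.parse import urlencode


def grouped_query(words, phrase_boosts, term_boosts):
    """Build explicit phrase clauses and parenthesized term clauses."""
    text = " ".join(words)
    clauses = [f'{field}:"{text}"^{boost}' for field, boost in phrase_boosts.items()]
    # Parentheses keep every word attached to the field and to the boost.
    clauses += [f"{field}:({text})^{boost}" for field, boost in term_boosts.items()]
    return " OR ".join(clauses)


def edismax_params(words, term_boosts, phrase_boosts):
    """Alternative: pass plain words and let edismax use qf and pf."""
    return {
        "defType": "edismax",
        "q": " ".join(words),
        "q.op": "OR",
        "qf": " ".join(f"{f}^{b}" for f, b in term_boosts.items()),
        "pf": " ".join(f"{f}^{b}" for f, b in phrase_boosts.items()),
    }


if __name__ == "__main__":
    phrase = {"title": 100, "description": 50}
    term = {"title": 30, "description": 10}
    q = grouped_query(["john", "doe"], phrase, term)
    print(q)
    print(urlencode({"defType": "edismax", "q.op": "OR", "q": q}))
    print(urlencode(edismax_params(["john", "doe"], term, phrase)))
```

URL-encoding the parameters also avoids sending raw spaces and quotes in the request line.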
