Posted to solr-user@lucene.apache.org by roySolr <ro...@gmail.com> on 2011/07/14 11:29:29 UTC

- character in search query

Hello,

I have a problem with characters in the search term. I have some queries
like this:

Arsenal - london
Ajax - amsterdam
Arsenal - moskou
Arsenal - China

When I send arsenal - london to Solr I get 2 results, China and moskou. I
looked at the debugQuery output and it looks like Solr is searching for
Arsenal documents that do not contain london. How can I make Solr handle
the - as normal text?

I tried something like this, but it's not working:
<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="-" replacement=""/>

Quoting works ("Arsenal - london"), but then I can no longer search for
london arsenal.


--
View this message in context: http://lucene.472066.n3.nabble.com/character-in-search-query-tp3168604p3168604.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: - character in search query

Posted by Erick Erickson <er...@gmail.com>.
dismax is a fairly narrow-use parser. By that I mean it was created
to solve a specific issue. It has some pronounced warts as you've
discovered.

edismax is the preferred parser if you have access to it. I'd just
ignore dismax if you have access to edismax. There's been some
talk of deprecating dismax in favor of edismax in fact.

But if you really want to know, see:
https://issues.apache.org/jira/browse/SOLR-1553
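
For anyone following along, switching parsers per request might look
like the sketch below. It is only an illustration in Python: the host,
port and path in SOLR_SELECT_URL and the helper name build_edismax_url
are assumptions, not something from this thread. defType, qf and
debugQuery are the request parameters already discussed here.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; adjust host, port and path to your setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_edismax_url(user_query, fields=("name", "city")):
    """Build a select URL that asks Solr to parse the query with edismax."""
    params = {
        "q": user_query,            # the (already escaped) user query
        "defType": "edismax",       # use edismax instead of dismax
        "qf": " ".join(fields),     # fields to search, as in the qf default
        "debugQuery": "true",       # include the parsed query in the response
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)


url = build_edismax_url(r"arsenal \- london")
print(url)
```

Setting defType in the request handler defaults should have the same
effect as passing it on every request.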

Best
Erick

On Wed, Jul 20, 2011 at 5:10 AM, roySolr <ro...@gmail.com> wrote:
> When I use the edismax handler the escaping works great (before, I used
> the dismax handler). The debugQuery output shows me this:
>
> +((DisjunctionMaxQuery((name:arsenal)~1.0)
> DisjunctionMaxQuery((name:london)~1.0))~2
>
> The "\" is not in the parsedquery, so I get the results I wanted. I don't
> know why the dismax handler works this way.
>
> Can someone tell me the difference between the dismax and edismax handlers?

Re: - character in search query

Posted by roySolr <ro...@gmail.com>.
When I use the edismax handler the escaping works great (before, I used
the dismax handler). The debugQuery output shows me this:

+((DisjunctionMaxQuery((name:arsenal)~1.0)
DisjunctionMaxQuery((name:london)~1.0))~2

The "\" is not in the parsedquery, so I get the results I wanted. I don't
know why the dismax handler works this way.

Can someone tell me the difference between the dismax and edismax handlers?




Re: - character in search query

Posted by roySolr <ro...@gmail.com>.
Here is my complete fieldtype:

<fieldType name="name" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="\s|,"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="-" replacement=""/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>

On the Field Analysis page I see that the - is removed by the
PatternReplaceFilter. When I escape the term ($q =
SolrUtils::escapeQueryChars($q);) I see something like this in my
debugQuery output (term = arsenal - london):

+((DisjunctionMaxQuery((name:arsenal)~1.0) DisjunctionMaxQuery((name:"\
london"~1.0))~2) ()

When I don't escape the query I get something like this:

+((DisjunctionMaxQuery((name:arsenal)~1.0)
-DisjunctionMaxQuery((name:london)~1.0))~1) ()

The "-" in my term is turned into the -DisjunctionMaxQuery clause, so
london is excluded. How can I fix this problem? What is the easiest way?
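
One possible workaround (not something proposed in this thread, just a
sketch) is to drop free-standing hyphens on the client before the query
reaches dismax, so the parser never sees a "-" it could read as an
exclusion operator. The helper name drop_standalone_hyphens is
hypothetical, and the example is in Python rather than PHP.

```python
import re

# A hyphen with whitespace (or the start/end of the string) on both sides.
STANDALONE_HYPHEN = re.compile(r"(?<!\S)-(?!\S)")


def drop_standalone_hyphens(user_query: str) -> str:
    """Remove hyphens that stand alone between words, then tidy whitespace."""
    without_hyphens = STANDALONE_HYPHEN.sub(" ", user_query)
    # Collapse the runs of spaces left behind by the removal.
    return " ".join(without_hyphens.split())


print(drop_standalone_hyphens("arsenal - london"))  # arsenal london
print(drop_standalone_hyphens("ajax-amsterdam"))    # ajax-amsterdam
```

Hyphens attached to a word (for example -london) are left alone, so a
deliberate exclusion would still work.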




Re: - character in search query

Posted by Erick Erickson <er...@gmail.com>.
Let's see the complete <fieldType> definition. Have you looked at
your index with, say, Luke and seen what's actually in your
index? And do you re-index after each schema change?

What does your admin/analysis page look like? Have you considered
PatternReplaceCharFilterFactory rather than the tokenizer?

Best
Erick

On Tue, Jul 19, 2011 at 7:48 AM, roySolr <ro...@gmail.com> wrote:
> Anybody?

Re: - character in search query

Posted by roySolr <ro...@gmail.com>.
Anybody?


Re: - character in search query

Posted by roySolr <ro...@gmail.com>.
Yes, I had a tokenizer like this:

<tokenizer class="solr.PatternTokenizerFactory" pattern="\s|-|," />

Now I have removed the - from this tokenizer and the debugQuery output looks like this:

(name:arsenal | city:arsenal)~1.0 (name:\- | city:\-)~1.0 (name:london |
city:london)~1.0 

I still get no results.
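
To make the difference between the two tokenizer patterns concrete, here
is a rough Python approximation of the splitting. It is only a sketch:
pattern_tokenize is a hypothetical helper, it assumes empty tokens are
dropped, and the admin/analysis page remains the authoritative view of
what Solr actually does.

```python
import re


def pattern_tokenize(text, pattern):
    """Split text on a regex and drop empty tokens (assumed behaviour)."""
    return [token for token in re.split(pattern, text) if token]


query = "arsenal - london"

# Old pattern: the hyphen is a delimiter, so it never becomes a token.
print(pattern_tokenize(query, r"\s|-|,"))  # ['arsenal', 'london']

# New pattern: the hyphen survives as a token of its own.
print(pattern_tokenize(query, r"\s|,"))    # ['arsenal', '-', 'london']
```

With the second pattern the lone "-" becomes a term of its own, which
matches the extra (name:\- | city:\-) clause in the debugQuery output.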


Re: - character in search query

Posted by François Schiettecatte <fs...@gmail.com>.
Easy, the hyphen is out on its own (with spaces on either side) and is probably getting removed from the search by the tokenizer. Check your analysis.

François

On Jul 14, 2011, at 6:05 AM, roySolr wrote:

> It looks like it's still not working.
> 
> I send this to Solr: q=arsenal \- london
> 
> I get no results. When I look at the debugQuery output I see this:
> 
> (name: arsenal | city:arsenal)~1.0 (name: \ | city:\)~1.0 (name: london |
> city: london)~1.0
> 
> 
> My request handler:
> 
> <requestHandler name="dismax" class="solr.SearchHandler" default="true">
>   <lst name="defaults">
>     <str name="defType">dismax</str>
>     <str name="qf">
>       name city
>     </str>
>   </lst>
> </requestHandler>
> 
> What is going wrong?


Re: - character in search query

Posted by roySolr <ro...@gmail.com>.
It looks like it's still not working.

I send this to Solr: q=arsenal \- london

I get no results. When I look at the debugQuery output I see this:

(name: arsenal | city:arsenal)~1.0 (name: \ | city:\)~1.0 (name: london |
city: london)~1.0


My request handler:

<requestHandler name="dismax" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">
      name city
    </str>
  </lst>
</requestHandler>

What is going wrong?


Re: - character in search query

Posted by roySolr <ro...@gmail.com>.
Thanks!

I use the escape function of the Solr PECL package to escape special
characters:

http://docs.php.net/manual/kr/solrutils.escapequerychars.php


Re: - character in search query

Posted by James Bond Fang <mi...@qq.com>.
Use '\' to escape it.
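
As a minimal sketch of what that escaping could look like in client code
(Python here, purely for illustration): the helper name
escape_query_chars and the exact character list are assumptions based on
the query parser syntax, so check them against your Solr version.

```python
# Characters assumed to have special meaning in the Lucene/Solr query syntax.
SPECIAL_CHARS = set('+-&|!(){}[]^"~*?:\\/')


def escape_query_chars(text: str) -> str:
    """Prefix every special character with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)


print(escape_query_chars("arsenal - london"))  # arsenal \- london
```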
