
Posted to solr-user@lucene.apache.org by Kevin Xiao <ke...@nextbio.com> on 2008/02/26 10:13:54 UTC

solr to handle special character

Hi there,

I am new to Solr. I used the analyzer below, trying both WhitespaceTokenizerFactory and StandardTokenizerFactory, but when I search for "xyz - abc", it doesn't return anything (although searching for "xyz abc" does return the document "xyz - abc"). I use the same tokenizer and filters at both index and query time. Is that a Solr bug? How do I make it work?

Thanks,
- Kevin

      <fieldType name="prefix_token_1" class="solr.TextField" positionIncrementGap="1">
            <analyzer type="index">
                  <!--
                  <tokenizer class="solr.WhitespaceTokenizerFactory" />
                  -->
                  <tokenizer class="solr.StandardTokenizerFactory" />
                  <filter class="solr.LowerCaseFilterFactory" />
                  <!--
                  <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="100" />
                  -->
            </analyzer>
            <analyzer type="query">
                  <!--
                  <tokenizer class="solr.WhitespaceTokenizerFactory" />
                  -->
                  <tokenizer class="solr.StandardTokenizerFactory" />
                  <filter class="solr.LowerCaseFilterFactory" />
            </analyzer>
      </fieldType>

RE: solr to handle special character

Posted by Chris Hostetter <ho...@fucit.org>.

: By the way, I used DisMaxRequestHandler in solrconfig.xml. I googled a 
: little about DisMaxRequestHandler, and the documentation says that '+' 
: and '-' characters prefixing nonwhitespace characters are treated as 
: "mandatory" and "prohibited" modifiers for the subsequent terms, but it 
: doesn't say anything about a '+' or '-' character standing on its own.

Hmmm... well, if you add debugQuery=true to your requests and look at the 
parsed query string, you can see that the "-" is getting applied to the 
DisjunctionMax query being built for the second clause, which means either 
the documentation in the wiki is wrong, or there is a bug.
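
As a rough sketch of that debugging step, a request URL with debugQuery=true could be built like this. The host, port, /solr/select path, and the handler name "dismax" passed via qt are assumptions about a typical local setup, not details from this thread, and build_debug_url is a hypothetical helper.

```python
from urllib.parse import urlencode


def build_debug_url(user_query, base_url="http://localhost:8983/solr/select"):
    """Build a Solr select URL that asks for debug output.

    base_url and the "dismax" handler name are assumptions; adjust them
    to match the handler registered in your solrconfig.xml.
    """
    params = {
        "q": user_query,       # the raw user query, e.g. "xyz - abc"
        "qt": "dismax",        # assumed name of the dismax request handler
        "debugQuery": "true",  # ask Solr to include the parsed query in the response
    }
    # urlencode escapes spaces and other reserved characters for us.
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_debug_url("xyz - abc"))
```

The parsed query then shows up in the debug section of the response, which is where the stray "-" can be seen attached to the second clause.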

I apparently wrote that documentation, and I wrote the original dismax 
code ... I thought someone else had at some point added in some special 
escaping for "-" or "+" followed by whitespace (which would explain why I 
wrote that in the documentation), but I can't see any evidence of it now.

So I'm going to change the wiki, and open a bug to add a feature like that 
... but in the meantime...

: Does anyone know a workaround other than stripping off '+'/'-' myself?

...I would just do that.
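
A minimal sketch of that client-side workaround, assuming the query string is cleaned up before it is sent to Solr. The function name strip_standalone_operators is hypothetical, and the rule used here (drop a '+' or '-' only when it has whitespace or a string boundary on both sides) is one reading of "followed by whitespace", not something dismax itself does.

```python
import re

# A '+' or '-' that stands alone as a token: no non-whitespace character
# directly before or after it.
_STANDALONE_OP = re.compile(r"(?<!\S)[+-](?!\S)")


def strip_standalone_operators(query):
    """Remove lone '+'/'-' tokens so dismax cannot read them as operators.

    Operators attached to a term (e.g. "-abc" or "+xyz") and hyphens inside
    a term (e.g. "a-b") are left untouched.
    """
    cleaned = _STANDALONE_OP.sub(" ", query)
    # Collapse the whitespace left behind by the removed tokens.
    return " ".join(cleaned.split())


if __name__ == "__main__":
    print(strip_standalone_operators("xyz - abc"))
```

With this in place, a search for "xyz - abc" is sent as "xyz abc", which the original post reports does match the document.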


-Hoss

RE: solr to handle special character

Posted by Kevin Xiao <ke...@nextbio.com>.

By the way, I used DisMaxRequestHandler in solrconfig.xml. I googled a little about DisMaxRequestHandler, and the documentation says that '+' and '-' characters prefixing nonwhitespace characters are treated as "mandatory" and "prohibited" modifiers for the subsequent terms, but it doesn't say anything about a '+' or '-' character standing on its own.

Does anyone know a workaround other than stripping off '+'/'-' myself?

Thanks,
- Kevin
