Posted to solr-user@lucene.apache.org by Kevin Xiao <ke...@nextbio.com> on 2008/02/26 10:13:54 UTC
solr to handle special character
Hi there,
I am new to Solr. I used the following analyzer. I tried both WhitespaceTokenizerFactory and StandardTokenizerFactory, but when I search for "xyz - abc", nothing is returned (searching for "xyz abc" does return the document "xyz - abc"). I used the same tokenizer/filter chain at both index and query time. Is this a Solr bug? How do I make it work?
Thanks,
- Kevin
<fieldType name="prefix_token_1" class="solr.TextField" positionIncrementGap="1">
  <analyzer type="index">
    <!--
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    -->
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <!--
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="100" />
    -->
  </analyzer>
  <analyzer type="query">
    <!--
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    -->
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>
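To see what each tokenizer choice does to the text itself, here is a rough Python approximation of the two analysis chains above (tokenizer, then lowercasing). This is an assumption for illustration only: `rough_standard_tokenize` keeps runs of word characters and is far simpler than Lucene's real StandardTokenizer grammar. Under this approximation, "xyz - abc" and "xyz abc" analyze to the same terms with StandardTokenizer, which hints that the "-" is being consumed somewhere before analysis runs.

```python
import re


def whitespace_tokenize(text):
    # Like WhitespaceTokenizer: split on whitespace only, punctuation is kept.
    return text.split()


def rough_standard_tokenize(text):
    # Rough stand-in for StandardTokenizer: keep runs of word characters and
    # drop standalone punctuation such as "-". Not Lucene's real grammar.
    return re.findall(r"\w+", text)


def analyze(text, tokenizer):
    # Tokenizer followed by the LowerCaseFilter step.
    return [tok.lower() for tok in tokenizer(text)]


for text in ("xyz - abc", "xyz abc"):
    print(text, analyze(text, whitespace_tokenize), analyze(text, rough_standard_tokenize))
```

With WhitespaceTokenizer the lone "-" survives as its own term; with the StandardTokenizer-like split it disappears.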
RE: solr to handle special character
Posted by Chris Hostetter <ho...@fucit.org>.
: By the way, I used DisMaxRequestHandler in solrconfig.xml. I googled a
: little about DisMaxRequestHandler; the documentation says that '+' and '-'
: characters prefixing non-whitespace characters are treated as "mandatory"
: and "prohibited" modifiers for the subsequent terms, but it doesn't say
: anything about standalone '+' or '-' characters.
Hmmm... well, if you add debugQuery=true to your request and look at the
parsed query string, you can see that the "-" is being applied to the
DisjunctionMax query built for the second clause, which means either
the documentation in the wiki is wrong or there is a bug.
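Why that produces no hits can be illustrated with a toy model. It is a sketch only, not Solr's actual DisMax code, and the function name and the `ignore_bare_operators` flag are hypothetical. It splits the query on whitespace and, matching the parse reported by debugQuery, carries a standalone "-" over to the next clause. For "xyz - abc" that yields an optional `xyz` clause plus a prohibited `abc` clause, so the very document containing "xyz - abc" is excluded.

```python
# Toy model only: NOT Solr's DisMax parser. It mimics the behavior seen with
# debugQuery=true, where a standalone "-" is applied to the following clause.
OCCUR = {"+": "MUST", "-": "MUST_NOT", None: "SHOULD"}


def toy_dismax_clauses(q, ignore_bare_operators=False):
    """Return (occur, term) pairs for a whitespace-split query string.

    If ignore_bare_operators is True, a "+" or "-" standing alone is dropped
    instead of being carried to the next clause (a hypothetical fix).
    """
    clauses = []
    pending = None  # a bare operator waiting for the next term
    for tok in q.split():
        if tok in ("+", "-"):
            if not ignore_bare_operators:
                pending = tok
            continue
        op, pending = pending, None
        if len(tok) > 1 and tok[0] in "+-":
            op, tok = tok[0], tok[1:]  # "+abc" / "-abc" prefix form
        clauses.append((OCCUR[op], tok))
    return clauses


print(toy_dismax_clauses("xyz - abc"))
print(toy_dismax_clauses("xyz abc"))
print(toy_dismax_clauses("xyz - abc", ignore_bare_operators=True))
```

In this model the first call gives `[('SHOULD', 'xyz'), ('MUST_NOT', 'abc')]`, while the other two both give two optional clauses, which is why "xyz abc" finds the document and "xyz - abc" does not.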
I apparently wrote that documentation, and I wrote the original dismax
code... I thought someone else had at some point added special
escaping for "-" or "+" followed by whitespace (which would explain why I
wrote that in the documentation), but I can't see any evidence of it now.
So I'm going to change the wiki and open a bug to add a feature like that,
... but in the meantime...
: Does anyone know a workaround before I strip off '+'/'-' myself?
...I would just do that.
-Hoss
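The workaround Hoss suggests, stripping standalone operators on the client before sending the query, might look like the minimal sketch below. The regular expression, the `strip_bare_operators` and `select_url` helpers, and the `http://localhost:8983/solr/select` endpoint with a handler named `dismax` are all assumptions for illustration. Operators attached to a word (such as "-abc") are left alone so they keep their prohibited/mandatory meaning, and `urlencode` escapes a literal "+" as %2B, which would otherwise be decoded as a space.

```python
import re
from urllib.parse import urlencode

# A run of "+"/"-" that stands alone: preceded by start-of-string or
# whitespace, and followed by whitespace or end-of-string.
BARE_OPERATOR = re.compile(r"(?:(?<=\s)|^)[+-]+(?=\s|$)")

# Hypothetical endpoint and handler name; adjust to your own installation.
SOLR_SELECT = "http://localhost:8983/solr/select"


def strip_bare_operators(q):
    """Remove standalone +/- and collapse the leftover whitespace."""
    return " ".join(BARE_OPERATOR.sub(" ", q).split())


def select_url(q, handler="dismax", debug=False):
    params = {"q": strip_bare_operators(q), "qt": handler}
    if debug:
        params["debugQuery"] = "true"  # include the parsed query in the response
    return SOLR_SELECT + "?" + urlencode(params)


print(strip_bare_operators("xyz - abc"))  # xyz abc
print(strip_bare_operators("xyz -abc"))   # xyz -abc
print(select_url("xyz - abc +def"))
```

Passing `debug=True` adds `debugQuery=true`, which is a quick way to confirm that the cleaned query now parses into the clauses you expect.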
RE: solr to handle special character
Posted by Kevin Xiao <ke...@nextbio.com>.
By the way, I used DisMaxRequestHandler in solrconfig.xml. I googled a little about DisMaxRequestHandler; the documentation says that '+' and '-' characters prefixing non-whitespace characters are treated as "mandatory" and "prohibited" modifiers for the subsequent terms, but it doesn't say anything about standalone '+' or '-' characters.
Does anyone know a workaround before I strip off '+'/'-' myself?
Thanks,
- Kevin
-----Original Message-----
From: Kevin Xiao [mailto:kevin@nextbio.com]
Sent: Tuesday, February 26, 2008 1:14 AM
To: solr-user@lucene.apache.org
Subject: solr to handle special character