Posted to solr-user@lucene.apache.org by deepak agrawal <dk...@gmail.com> on 2010/01/15 11:45:40 UTC
Problem with text field in Solr
Hi,
I am using Solr, and my BODY field is of type text.
I have a problem when searching BODY for a word like *aviation*:
when I search *BODY:avia** (aviation is found)
when I search *BODY:aviat** (aviation is found)
when I search *BODY:aviati** (aviation is not found)
when I search *BODY:aviatio** (aviation is not found)
when I search *BODY:aviation** (aviation is not found)
Please help: how can I search for words like this with *aviati**, *aviatio** and *aviation**?
Below are the details of how the BODY field is defined with the text type.
*<field name="BODY" type="text" indexed="true" stored="true"
multiValued="true" termVectors="true"/>*
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<!-- in this example, we will only use synonyms at query time
<filter class="solr.SynonymFilterFactory"
synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
-->
<!-- Case insensitive stop word removal.
enablePositionIncrements=true ensures that a 'gap' is left to
allow for accurate phrase queries.
-->
<filter class="solr.StopFilterFactory"
ignoreCase="true"
words="stopwords.txt"
enablePositionIncrements="true"
/>
<filter class="solr.WordDelimiterFilterFactory"
generateWordParts="1" generateNumberParts="1" catenateWords="1"
catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EnglishPorterFilterFactory"
protected="protwords.txt"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
ignoreCase="true" expand="true"/>
<filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt"/>
<filter class="solr.WordDelimiterFilterFactory"
generateWordParts="1" generateNumberParts="1" catenateWords="0"
catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EnglishPorterFilterFactory"
protected="protwords.txt"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
</fieldType>
--
DEEPAK AGRAWAL
+91-9379433455
GOOD LUCK.....
Re: Problem with text field in Solr
Posted by MitchK <mi...@web.de>.
What does analysis.jsp show you when you query these words?
The stemming of the input could be the cause of the problem.
What happens if you search for "aviation" without wildcards?
--
View this message in context: http://old.nabble.com/Problem-with-text-field-in-Solr-tp27175346p27175827.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Problem with text field in Solr
Posted by Sven Maurmann <sv...@kippdata.de>.
Hi,
At first glance, your configuration suggests that you are running into the
following:
You use a wildcard query to query a term that is stemmed in the index
(aviation becomes aviat). If you provide a wildcard query with a trailing
asterisk as the only wildcard, it is rewritten as a prefix query, and the
prefix is not (!) stemmed.
Therefore everything seems to be fine for your first two examples (as avia
and aviat are both prefixes of the stem aviat of aviation), but the remaining
three queries try to match the prefixes aviati, aviatio and aviation against
the stem aviat of aviation, and fail.
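The behaviour described above can be illustrated with a small sketch. It is not Solr or Lucene code: it only assumes, as stated in this reply, that the index holds the stemmed term aviat for aviation, and it models a prefix query as a plain "starts with" test on that term. The names INDEXED_TERM and prefix_query_matches are hypothetical and exist only for this illustration.

```python
# Assumption taken from the reply: after stemming, "aviation" is stored
# in the index as the term "aviat".
INDEXED_TERM = "aviat"


def prefix_query_matches(prefix: str, indexed_term: str = INDEXED_TERM) -> bool:
    """Model a prefix query: the prefix is used exactly as typed (no stemming)
    and matches only if the indexed term starts with it."""
    return indexed_term.startswith(prefix)


# The five prefixes from the original question (BODY:<prefix>*).
QUERIES = ["avia", "aviat", "aviati", "aviatio", "aviation"]
RESULTS = {q: prefix_query_matches(q) for q in QUERIES}

if __name__ == "__main__":
    for query, hit in RESULTS.items():
        print(f"BODY:{query}* -> {'match' if hit else 'no match'}")
```

Only avia and aviat are prefixes of the stored term, which is consistent with the first two queries returning the document and the last three returning nothing.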
You may want to consult either the Lucene documentation (on the QueryParser,
for example) or the appropriate chapters in the excellent book Lucene in
Action (LIA) by Hatcher and Gospodnetic.
Hope that helps.
Sven
--
kippdata informationstechnologie GmbH
Sven Maurmann Tel: 0228 98549 -12
Bornheimer Str. 33a Fax: 0228 98549 -50
D-53111 Bonn sven.maurmann@kippdata.de
HRB 8018 Amtsgericht Bonn / USt.-IdNr. DE 196 457 417
Geschäftsführer: Dr. Thomas Höfer, Rainer Jung, Sven Maurmann