Posted to solr-user@lucene.apache.org by deepak agrawal <dk...@gmail.com> on 2010/01/15 11:45:40 UTC

Problem with text field in Solr

Hi,

I am using Solr with a BODY field of type text. A document's BODY contains
the word "aviation", but wildcard searches for it behave inconsistently:

BODY:avia*     (aviation is returned)
BODY:aviat*    (aviation is returned)
BODY:aviati*   (aviation is not returned)
BODY:aviatio*  (aviation is not returned)
BODY:aviation* (aviation is not returned)

Please help: how can I search for this kind of word with aviati*,
aviatio* and aviation*?

Below is how the BODY field and the text field type are defined.

<field name="BODY" type="text" indexed="true" stored="true"
       multiValued="true" termVectors="true"/>

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory"
synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <!-- Case insensitive stop word removal.
             enablePositionIncrements=true ensures that a 'gap' is left to
             allow for accurate phrase queries.
        -->
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="1"
                catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory"
                protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
                ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="0"
                catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory"
                protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

-- 
DEEPAK AGRAWAL
+91-9379433455
GOOD LUCK.....

Re: Problem with text field in Solr

Posted by MitchK <mi...@web.de>.

What does analysis.jsp show when you run these query terms through it?
The stemming applied to the input could be the cause of the problem.

What happens if you search for "aviation" without wildcards?

Re: Problem with text field in Solr

Posted by Sven Maurmann <sv...@kippdata.de>.

Hi,

At first glance, your configuration appears to run into the following:

You use a wildcard query against a term that is stemmed in the index
(aviation becomes aviat). If the trailing asterisk is the only wildcard
in the query, the wildcard query is rewritten as a prefix query, which
is not (!) stemmed.

Therefore everything seems to be fine for your first two examples (as
avia and aviat are both prefixes of the stemmed form of aviation), but
the remaining three queries try to match the prefixes aviati, aviatio
and aviation against the stem aviat of aviation, and fail.
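
The mismatch can be illustrated with a small Python sketch. It is only a
toy model, not Solr or Lucene code: the indexed term is hard-coded as the
stem aviat (as described above), and the hypothetical helper
prefix_query_matches stands in for a prefix query, which compares the
prefix exactly as typed against the indexed terms.

```python
from typing import Iterable

# Toy model of prefix matching against a stemmed index (not Solr/Lucene code).
# Assumption: the index-time analyzer stored "aviation" as the stem "aviat".
INDEXED_TERMS = {"aviat"}


def prefix_query_matches(query: str, indexed_terms: Iterable[str]) -> bool:
    """Return True if a trailing-asterisk query matches any indexed term.

    The prefix is used exactly as typed: no stemming is applied to it.
    """
    prefix = query.rstrip("*")
    return any(term.startswith(prefix) for term in indexed_terms)


QUERIES = ["avia*", "aviat*", "aviati*", "aviatio*", "aviation*"]
RESULTS = {q: prefix_query_matches(q, INDEXED_TERMS) for q in QUERIES}

for query, matched in RESULTS.items():
    print(f"BODY:{query:<10} -> {'match' if matched else 'no match'}")
```

In this model the last three queries fail only because the prefix is
longer than the stored stem. If the index held the unstemmed term
aviation instead, all five prefixes would match.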

You may want to consult either the Lucene documentation (on the
QueryParser, for example) or the appropriate chapters in the excellent
book Lucene in Action (LIA) by Hatcher and Gospodnetic.

Hope that helps.

Sven





-- 
kippdata informationstechnologie GmbH
Sven Maurmann               Tel: 0228 98549 -12
Bornheimer Str. 33a         Fax: 0228 98549 -50
D-53111 Bonn                sven.maurmann@kippdata.de

HRB 8018 Amtsgericht Bonn / USt.-IdNr. DE 196 457 417
Geschäftsführer: Dr. Thomas Höfer, Rainer Jung, Sven Maurmann