Posted to solr-user@lucene.apache.org by Vannia Rajan <kv...@gmail.com> on 2009/08/20 17:29:08 UTC

Solr Quoted search confusions

Hi,

   I need some help understanding how Solr indexes documents. I have 6
documents with various forms of the word "ilike" (a single word, not
"i like"): one has "ilike" as such, and the others have a special character
between "i" and "like".

   What I expected from Solr is that a quoted search for "ilike" would
return only the document that has "ilike" exactly. Instead, Solr also
includes the other forms of the word "ilike" in the results. Is there an
option or configuration setting in Solr so that I get only the result with
the exact word "ilike"?

  The result I obtained from Solr is shown below:

http://localhost:8080/solr/select/?q=%22ilike%22&fl=description,score
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">20</int>
    <lst name="params">
      <str name="fl">description,score</str>
      <str name="q">"ilike"</str>
    </lst>
  </lst>
  <result name="response" numFound="5" start="0" maxScore="0.5">
    <doc>
      <float name="score">0.5</float>
      <str name="description">Ilike company is doing great!</str>
    </doc>
    <doc>
      <float name="score">0.375</float>
      <str name="description">I:like company is doing great!</str>
    </doc>
    <doc>
      <float name="score">0.3125</float>
      <str name="description">I-like it very much. Really, this can come up!.</str>
    </doc>
    <doc>
      <float name="score">0.3125</float>
      <str name="description">I;like it very much. Really, i say.</str>
    </doc>
    <doc>
      <float name="score">0.25</float>
      <str name="description">i.like it very much. full stop can come? i don't know.</str>
    </doc>
  </result>
</response>

-- 
Thanks,
Vanniarajan

Re: Solr Quoted search confusions

Posted by Vannia Rajan <kv...@gmail.com>.

Thank you for your response, it just worked!

-- 
Thanks,
Vanniarajan

Re: Solr Quoted search confusions

Posted by Chris Male <ge...@gmail.com>.

Hi,

I think the cause of the problem is the WordDelimiterFilterFactory. With
your current configuration, indexing i-like results in 3 terms being
indexed: i, like and ilike. When you then query for ilike, you match the
3rd term. The term ilike is created by the WordDelimiterFilter because of
the catenateWords="1" setting in your index analyzer. When I change this
to 0, only i and like are created, so ilike no longer matches i-like.

Hope that fixes your problem.

Thanks,
Chris

Re: Solr Quoted search confusions

Posted by Vannia Rajan <kv...@gmail.com>.

Hi,

On Thu, Aug 20, 2009 at 9:13 PM, Chris Male <ge...@gmail.com> wrote:

> Hi,
>
> What analyzers/filters have you configured for the field that you are
> searching? One could be causing the various versions of "ilike" to be
> indexed the same way.
>

  I'm using the "text" field type, with the following analyzers/filters, for
the field "description" (which contains the various forms of the word "ilike"):

        <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
            <analyzer type="index">
                <tokenizer class="solr.WhitespaceTokenizerFactory"/>
                <filter class="solr.StopFilterFactory"
                        ignoreCase="true"
                        words="stopwords.txt"
                        enablePositionIncrements="true"
                        />
                <filter class="solr.WordDelimiterFilterFactory"
                        generateWordParts="1" generateNumberParts="1" catenateWords="1"
                        catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
                <filter class="solr.LowerCaseFilterFactory"/>
                <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
                <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
            </analyzer>
            <analyzer type="query">
                <tokenizer class="solr.WhitespaceTokenizerFactory"/>
                <filter class="solr.SynonymFilterFactory"
                        synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
                <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
                <filter class="solr.WordDelimiterFilterFactory"
                        generateWordParts="1" generateNumberParts="1" catenateWords="0"
                        catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
                <filter class="solr.LowerCaseFilterFactory"/>
                <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
                <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
            </analyzer>
        </fieldType>


Is there anything that I could tune here to get the intended results?
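
For reference, the index-time WordDelimiterFilterFactory line from the config above is repeated below with comments on what each option does. The comments are a hedged summary of the filter's documented options, and the "500-42", "wi-fi-4000" and "PowerShot" inputs are illustrative examples that do not come from this thread:

```xml
<!-- Index-time WordDelimiterFilterFactory from the "text" fieldType above.
     It splits each whitespace token on non-alphanumeric characters such as
     '-', ':', ';' and '.', and then:
       generateWordParts="1"   emits the alphabetic parts: "I-like" gives "I", "like"
       generateNumberParts="1" emits the numeric parts: "500-42" gives "500", "42"
       catenateWords="1"       also emits runs of word parts joined: "I-like" gives "Ilike"
       catenateNumbers="1"     also emits runs of number parts joined: "500-42" gives "50042"
       catenateAll="0"         emits no all-parts join (with 1, "wi-fi-4000" would also give "wifi4000")
       splitOnCaseChange="1"   splits at lower-to-upper case changes: "PowerShot" gives "Power", "Shot"
     LowerCaseFilterFactory runs afterwards, so the joined "Ilike" is lowercased
     before the remaining filters see it. -->
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1" catenateWords="1"
        catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
```

The query analyzer in the same fieldType uses catenateWords="0", so a query for "ilike" stays a single term; it is the index side that produces a matching "ilike" term from inputs like "I-like".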


-- 
Thanks,
Vanniarajan

Re: Solr Quoted search confusions

Posted by Chris Male <ge...@gmail.com>.

Hi,

What analyzers/filters have you configured for the field that you are
searching? One could be causing the various versions of "ilike" to be
indexed the same way.

Thanks
Chris
