Posted to solr-user@lucene.apache.org by Isan Fulia <is...@germinait.com> on 2011/09/26 12:09:38 UTC

Solr stopword problem in Query

Hi all,

I have a text field named *textForQuery*.
The following content has been indexed into Solr in the field textForQuery:
*Coke Studio at MTV*

When I ran the query
*textForQuery:("coke studio at mtv")* the results showed 0 documents.

After running the same query in debug mode, I got the following results:

<result name="response" numFound="0" start="0"/>
<lst name="debug">
<str name="rawquerystring">textForQuery:("coke studio at mtv")</str>
<str name="querystring">textForQuery:("coke studio at mtv")</str>
<str name="parsedquery">PhraseQuery(textForQuery:"coke studio ? mtv")</str>
<str name="parsedquery_toString">textForQuery:"coke studio ? mtv"</str>

Why did the query not match any document, even though there is a document
whose textForQuery value is *Coke Studio at MTV*?
Is this because the stopword *at* is present in the stopword list?



-- 
Thanks & Regards,
Isan Fulia.

Re: Solr stopword problem in Query

Posted by Bill Bell <bi...@gmail.com>.
This is a pretty serious issue.

Bill Bell
Sent from mobile



Re: Solr stopword problem in Query

Posted by Isan Fulia <is...@germinait.com>.
Thanks Erick.

On 29 September 2011 18:31, Erick Erickson <er...@gmail.com> wrote:

> I think your problem is that you've set
>
> omitTermFreqAndPositions="true"
>
> It's not really clear from the Wiki page, but
> note the tricky little phrase:
>
> "Queries that rely on position that are issued
> on a field with this option will silently fail to
> find documents."
>
> Phrase queries rely on position information, so they
> find nothing on this field.
>
> Best
> Erick



-- 
Thanks & Regards,
Isan Fulia.

Re: Solr stopword problem in Query

Posted by Erick Erickson <er...@gmail.com>.
I think your problem is that you've set

omitTermFreqAndPositions="true"

It's not really clear from the Wiki page, but
note the tricky little phrase:

"Queries that rely on position that are issued
on a field with this option will silently fail to
find documents."

Phrase queries rely on position information, so they
find nothing on this field.
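
To make the point above concrete, here is a minimal sketch in Python of a toy inverted index. It is not Solr or Lucene code; the names build_index and phrase_search, and the omit_positions flag, are hypothetical stand-ins chosen for illustration. It only shows why a phrase query can come back empty, without any error, when positions were never stored.

```python
from collections import defaultdict


def build_index(docs, omit_positions):
    """Build a toy inverted index: term -> {doc_id: [positions]}.

    docs maps a doc id to a list of (term, position) pairs.
    When omit_positions is True, the doc id is still recorded for each
    term, but its position list stays empty (an analogue, by assumption,
    of omitTermFreqAndPositions="true").
    """
    index = defaultdict(dict)
    for doc_id, tokens in docs.items():
        for term, pos in tokens:
            postings = index[term].setdefault(doc_id, [])
            if not omit_positions:
                postings.append(pos)
    return index


def phrase_search(index, phrase):
    """Return ids of docs where every phrase term sits at its expected position.

    phrase is a list of (term, position) pairs, gaps included.
    """
    if not phrase:
        return []
    first_term, first_pos = phrase[0]
    hits = []
    for doc_id, starts in index.get(first_term, {}).items():
        for start in starts:
            offset = start - first_pos
            if all(pos + offset in index.get(term, {}).get(doc_id, [])
                   for term, pos in phrase[1:]):
                hits.append(doc_id)
                break
    return hits


if __name__ == "__main__":
    # "Coke Studio at MTV" after analysis: "at" removed, its position kept as a gap.
    docs = {1: [("coke", 0), ("studio", 1), ("mtv", 3)]}
    phrase = [("coke", 0), ("studio", 1), ("mtv", 3)]

    print(phrase_search(build_index(docs, omit_positions=False), phrase))
    print(phrase_search(build_index(docs, omit_positions=True), phrase))
```

With positions stored, the phrase matches document 1. With positions omitted, the term is still in the index, but there is nothing to compare positions against, so the search quietly returns no documents. If the diagnosis above is right, the likely remedy is to drop omitTermFreqAndPositions="true" from the textForQuery field and reindex, since positions are only written at index time.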

Best
Erick

On Tue, Sep 27, 2011 at 11:00 AM, Rahul Warawdekar
<ra...@gmail.com> wrote:
> Hi Isan,
>
> The schema.xml seems OK to me.
>
> Is "textForQuery" the only field you are searching in?
> Are you also searching on any other non-text-based fields? If yes, please
> provide the schema description for those fields as well.
> Also, provide your solrconfig.xml file.

Re: Solr stopword problem in Query

Posted by Rahul Warawdekar <ra...@gmail.com>.
Hi Isan,

The schema.xml seems OK to me.

Is "textForQuery" the only field you are searching in?
Are you also searching on any other non-text-based fields? If yes, please
provide the schema description for those fields as well.
Also, provide your solrconfig.xml file.


On Tue, Sep 27, 2011 at 1:12 AM, Isan Fulia <is...@germinait.com> wrote:

> Hi Rahul,
>
> I also tried searching "Coke Studio MTV" but no documents were returned.



-- 
Thanks and Regards
Rahul A. Warawdekar

Re: Solr stopword problem in Query

Posted by Isan Fulia <is...@germinait.com>.
Hi Rahul,

I also tried searching "Coke Studio MTV" but no documents were returned.

Here is the snippet of my schema file.

<fieldType name="text" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="true">

  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords_en.txt"
            enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>

  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords_en.txt"
            enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>

</fieldType>


<field name="content" type="text" indexed="false" stored="true"
       multiValued="false"/>
<field name="title" type="text" indexed="false" stored="true"
       multiValued="false"/>

<field name="textForQuery" type="text" indexed="true" stored="false"
       multiValued="true" omitTermFreqAndPositions="true"/>

<copyField source="content" dest="textForQuery"/>
<copyField source="title" dest="textForQuery"/>
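
As a rough illustration of where the "?" in the parsed query comes from, the following Python sketch mimics only three steps of the chain above: whitespace tokenizing, lowercasing, and stopword removal with position increments kept. The STOPWORDS set is a hypothetical stand-in for stopwords_en.txt (its real contents are not shown in this thread), the function names are made up, and the word delimiter, keyword marker, synonym, and stemming filters are left out on purpose.

```python
# Hypothetical stand-in for stopwords_en.txt; the real file is not shown in the thread.
STOPWORDS = {"a", "an", "at", "of", "the"}


def analyze(text, stopwords=STOPWORDS):
    """Return (term, position) pairs, leaving a gap where a stopword was dropped."""
    tokens = []
    position = -1
    for raw in text.split():      # whitespace tokenizing
        position += 1             # every token, kept or dropped, consumes a position
        term = raw.lower()        # lowercasing (also makes the stopword check case-insensitive)
        if term in stopwords:
            continue              # stopword removed, but its position stays reserved
        tokens.append((term, position))
    return tokens


def phrase_to_string(tokens):
    """Render the tokens as a phrase, writing '?' for each skipped position."""
    parts = []
    expected = 0
    for term, pos in tokens:
        parts.extend("?" * (pos - expected))  # one '?' per position gap
        parts.append(term)
        expected = pos + 1
    return " ".join(parts)


if __name__ == "__main__":
    tokens = analyze("Coke Studio at MTV")
    print(tokens)
    print(phrase_to_string(tokens))
```

Under these assumptions the stopword "at" is removed but still occupies position 2, so the phrase is rendered as coke studio ? mtv, the same shape as the parsedquery in the debug output. The gap is the same at index time and at query time, so the stopword on its own would not be expected to prevent a match.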


Thanks,
Isan Fulia.


On 26 September 2011 21:19, Rahul Warawdekar <ra...@gmail.com> wrote:

> Hi Isan,
>
> Does your search return any documents when you remove the 'at' keyword and
> just search for "Coke studio MTV"?
> Also, can you please provide the snippet of the schema.xml file where this
> field and its "type" are defined?



-- 
Thanks & Regards,
Isan Fulia.

Re: Solr stopword problem in Query

Posted by Rahul Warawdekar <ra...@gmail.com>.
Hi Isan,

Does your search return any documents when you remove the 'at' keyword and
just search for "Coke studio MTV"?
Also, can you please provide the snippet of the schema.xml file where this
field and its "type" are defined?




-- 
Thanks and Regards
Rahul A. Warawdekar