Posted to solr-user@lucene.apache.org by mike anderson <sa...@gmail.com> on 2009/09/14 06:13:46 UTC

stopfilterFactory isn't removing field name

I'm kind of stumped by this one. Is it something obvious?
I'm running the latest trunk. In some cases StopFilterFactory doesn't seem
to remove the whole clause: querying a field for a stopword leaves the bare
field name in the parsed query.

Thanks in advance,

-mike

From debugQuery output (both words are in the stopwords file):

http://localhost:8983/solr/select?q=citations:for&debugQuery=true

<str name="rawquerystring">citations:for</str>
<str name="querystring">citations:for</str>
<str name="parsedquery">citations:</str>
<str name="parsedquery_toString">citations:</str>


http://localhost:8983/solr/select?q=citations:the&debugQuery=true

<str name="rawquerystring">citations:the</str>
<str name="querystring">citations:the</str>
<str name="parsedquery"></str>
<str name="parsedquery_toString"></str>
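
One way to read the two parsedquery values, offered as an interpretation rather than something this thread confirms. It assumes the Lucene query parser drops a field:text clause when the text analyzes to no tokens, and builds a single term query, printed as field:text, when the text analyzes to exactly one token. Under that assumption, a lone token with empty text would print as just the field name:

```xml
<!-- q=citations:the : assumed to analyze to zero tokens, so the parser
     drops the clause and the parsed query is empty -->
<str name="parsedquery"></str>

<!-- q=citations:for : assumed to analyze to one token whose text is empty,
     so the parser builds a term query on field "citations" with text "",
     which prints as the field name followed by a colon -->
<str name="parsedquery">citations:</str>
```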




schema analyzer for this field:
<!-- Citation text -->
<fieldType name="citationtext" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="substitutions.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="false" words="citationstopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <!-- <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/> -->
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="substitutions.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="false" words="citationstopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <!-- <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/> -->
  </analyzer>
</fieldType>

Re: stopfilterFactory isn't removing field name

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Tue, Sep 15, 2009 at 1:14 PM, mike anderson <sa...@gmail.com> wrote:
> Could this be related to SOLR-1423?

Nope, and I haven't been able to reproduce the bug you saw either.

-Yonik

Re: stopfilterFactory isn't removing field name

Posted by mike anderson <sa...@gmail.com>.
Could this be related to SOLR-1423?

On Mon, Sep 14, 2009 at 8:51 AM, Yonik Seeley <yo...@lucidimagination.com> wrote:

> Thanks, I'll see if I can reproduce...
>
> -Yonik
> http://www.lucidimagination.com

Re: stopfilterFactory isn't removing field name

Posted by Yonik Seeley <yo...@lucidimagination.com>.
Thanks, I'll see if I can reproduce...

-Yonik
http://www.lucidimagination.com

On Mon, Sep 14, 2009 at 2:10 AM, mike anderson <sa...@gmail.com> wrote:
> Yeah, that was weird. Removing the line "forever,for ever" from my synonyms
> file fixed the problem. In fact, I was having the same problem for every
> multi-word synonym entry like that. I decided I didn't really need the
> synonym filter for that field, so I just took it out, but I'd really like
> to know what the problem is.
> -mike

Re: stopfilterFactory isn't removing field name

Posted by mike anderson <sa...@gmail.com>.
Yeah, that was weird. Removing the line "forever,for ever" from my synonyms
file fixed the problem. In fact, I was having the same problem for every
multi-word synonym entry like that. I decided I didn't really need the
synonym filter for that field, so I just took it out, but I'd really like
to know what the problem is.
-mike
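
For reference, a sketch of the field type with the synonym filter taken out, assuming it was removed from both the index and query analyzers and that everything else stays as in the original post. This mirrors the workaround described above; it sidesteps the multi-word synonym entries rather than explaining the underlying behavior.

```xml
<!-- Citation text: same chain as the original post, minus SynonymFilterFactory
     (assumption: removed from both analyzers) -->
<fieldType name="citationtext" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="false" words="citationstopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="false" words="citationstopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
  </analyzer>
</fieldType>
```

Because the index-time chain changes too, documents already in the index would need to be reindexed for index and query analysis to agree again.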

On Mon, Sep 14, 2009 at 1:10 AM, Yonik Seeley <yo...@lucidimagination.com> wrote:

> That's pretty strange... perhaps something to do with your synonyms
> file mapping "for" to a zero-length token?
>
> -Yonik
> http://www.lucidimagination.com

Re: stopfilterFactory isn't removing field name

Posted by Yonik Seeley <yo...@lucidimagination.com>.
That's pretty strange... perhaps something to do with your synonyms
file mapping "for" to a zero-length token?

-Yonik
http://www.lucidimagination.com
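
One way to check the zero-length-token idea is to inspect the token stream the citationtext analyzers produce for "for". A hedged sketch, assuming the trunk build in use includes solr.FieldAnalysisRequestHandler and that the handler path "/analysis/field" is not already taken in solrconfig.xml; the request URL below is illustrative, reusing the host and field type from the original post.

```xml
<!-- solrconfig.xml: register the field analysis handler
     (assumption: available in this build) -->
<requestHandler name="/analysis/field" class="solr.FieldAnalysisRequestHandler"/>

<!-- Illustrative request: show index and query analysis of "for"
     for the citationtext field type.
     http://localhost:8983/solr/analysis/field?analysis.fieldtype=citationtext&analysis.fieldvalue=for&analysis.query=for
     A token with empty text coming out of SynonymFilterFactory would
     support the zero-length-token hypothesis. -->
```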

On Mon, Sep 14, 2009 at 12:13 AM, mike anderson <sa...@gmail.com> wrote:
> I'm kind of stumped by this one. Is it something obvious?
> I'm running the latest trunk. In some cases StopFilterFactory doesn't seem
> to remove the whole clause: querying a field for a stopword leaves the bare
> field name in the parsed query.
>
> Thanks in advance,
>
> -mike