Posted to solr-user@lucene.apache.org by Daniel Löfquist <da...@it.cdon.com> on 2009/10/21 16:04:52 UTC

No search hits for items starting with one-letter words

Hello all,

I have an odd problem. I have a Solr index containing songs by various artists. When I
search for something that starts with a one-letter word, I receive no hits. If
I remove the one-letter word, I do get hits.

So, for example, if I search for "a hard days night" or "i want you back" I get 0 hits,
but if I search for "hard days night" or "want you back" there are hits.

This behaviour doesn't affect items starting with a number. If a song title starts
with a number, I get hits for it without any problem.

The field type I'm using for the text field containing the song title is defined in my
schema.xml like this:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true"
            expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Can anyone tell me what may be the source of my problem and how to fix it?

I'm on a deadline so quick answers are greatly appreciated ;-)

Thanks for listening,

//Daniel

Re: No search hits for items starting with one-letter words

Posted by Yonik Seeley <yo...@lucidimagination.com>.
I just tried with Solr 1.4 trunk and it seems to work fine.
"a" is a stopword... but I'm not sure how stopwords could be messing you up.
For matching song titles, you may want to use a field type with no
stopwords though (there are a lot of common words in song titles I
think).
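
As a rough sketch of what such a field type could look like: this is the
"text" type from the original message with the two StopFilterFactory lines
taken out and nothing else changed. The names "text_nostop" and "title" are
made up for illustration and are not from the original schema.

```xml
<!-- Hypothetical field type: same chain as "text", minus StopFilterFactory -->
<fieldType name="text_nostop" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- no stop filter here, so "a" and "i" are kept in the index -->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true"
            expand="true"/>
    <!-- no stop filter here either, so query terms line up with indexed terms -->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field using the type above; the field name is an assumption -->
<field name="title" type="text_nostop" indexed="true" stored="true"/>
```

Since this changes the index-time analyzer, existing documents would need to
be reindexed before the change has any effect on search results.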

If you've changed your synonym file, a synonym rule could be matching
something in the query and rewriting it.
Try the analysis page in the admin interface and see what comes out.

-Yonik
http://www.lucidimagination.com


