Posted to solr-user@lucene.apache.org by Jan Simon Winkelmann <wi...@newsfactory.de> on 2010/02/24 13:00:02 UTC

Strange search behavior

Hi,

I'm having some problems understanding why certain search queries don't return any results.
I have a field of type "text", which is defined like this:

        <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
            <analyzer type="index">
                <charFilter class="solr.HTMLStripCharFilterFactory"/> 
                <tokenizer class="solr.WhitespaceTokenizerFactory"/>
                <filter class="solr.ISOLatin1AccentFilterFactory"/> 
                <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
                <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/> 
                <filter class="solr.LowerCaseFilterFactory"/>
                <filter class="solr.SnowballPorterFilterFactory" language="German" />
                <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
            </analyzer>
            <analyzer type="query">
                <charFilter class="solr.HTMLStripCharFilterFactory"/> 
                <tokenizer class="solr.WhitespaceTokenizerFactory"/>
                <filter class="solr.ISOLatin1AccentFilterFactory"/>
                <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
                <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
                <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/> 
                <filter class="solr.LowerCaseFilterFactory"/>
                <filter class="solr.SnowballPorterFilterFactory" language="German" />
                <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
            </analyzer>
        </fieldType>

I have about 3.2 million documents indexed in total, a few hundred of which follow the pattern "Tagesergebnisse der Oddset-Spiele vom 18.02.2010".

My problem is that if I search for "oddset-spiele", I get no results, but when I search for "oddsetspiele" or "oddset*spiele", I get lots of results. As far as I understand, the WordDelimiterFilter converts the phrase into "name:oddset (spiel oddsetspiel)"; at least that's what the analyzer shows. What I don't understand is why searching for "oddset-spiele" returns no results at all.
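To make the term positions concrete, here is a rough Python model of the query-side chain described above. It is a simplified sketch, not Solr's actual code: it treats only hyphens as delimiters, stacks the catenated word on the last part to match the "oddset (spiel oddsetspiel)" output reported from the analyzer, and `toy_stem` is a hypothetical stand-in for the German Snowball stemmer that merely strips a trailing "e".

```python
# Simplified model of the query-time analysis for the "text" field type.
# Assumptions: only "-" acts as a delimiter; toy_stem is a hypothetical
# placeholder for SnowballPorterFilterFactory language="German".

def whitespace_tokenize(text):
    # Like WhitespaceTokenizer: split on runs of whitespace only.
    return text.split()

def word_delimiter(token):
    """Return (term, position_increment) pairs, roughly as
    generateWordParts=1 and catenateWords=1 would for a hyphenated word."""
    parts = [p for p in token.split("-") if p]
    if len(parts) < 2:
        return [(token, 1)]
    out = [(p, 1) for p in parts]
    # Catenated form stacked on the last part (increment 0), mirroring the
    # "oddset (spiel oddsetspiel)" output from the Analysis page.
    out.append(("".join(parts), 0))
    return out

def toy_stem(term):
    # Hypothetical stemmer: strip one trailing "e" (NOT real Snowball German).
    return term[:-1] if term.endswith("e") else term

def analyze(text):
    # Each inner list holds the terms sharing one token position.
    positions = []
    for tok in whitespace_tokenize(text):
        for term, inc in word_delimiter(tok):
            term = toy_stem(term.lower())
            if inc == 0 and positions:
                positions[-1].append(term)
            else:
                positions.append([term])
    return positions

print(analyze("oddset-spiele"))  # → [['oddset'], ['spiel', 'oddsetspiel']]
print(analyze("oddsetspiele"))   # → [['oddsetspiel']]
```

The hyphenated query yields terms at two positions, while "oddsetspiele" yields a single term, which is the difference between the two searches described above.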

I would appreciate any help or insight anyone could provide.

Best
Jan

Re: Strange search behavior

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Jan,

If you go to Solr Admin Analysis page and enter your problematic query, what do you see?

 Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/


