Posted to solr-user@lucene.apache.org by Jim Adams <ja...@gmail.com> on 2009/02/17 23:30:22 UTC

embedded wildcard search not working?

This is a straightforward question, but I haven't been able to figure out
what is up with my application.

I seem to be able to search on trailing wildcards just fine.  For example,
fieldName:a* will return documents with apple, aardvark, etc. in them.  But
if I search a field containing 'apple' with 'a*e', I get nothing back.
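For reference, a Lucene wildcard query is evaluated against the terms stored in the index, not against the original field text: '*' matches zero or more characters and '?' matches exactly one. The minimal Python sketch below models only that matching step; the function names and the sample term set are illustrative assumptions, not part of Solr.

```python
import re


def wildcard_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a Lucene-style wildcard pattern into a compiled regex.

    '*' matches zero or more characters, '?' matches exactly one,
    and every other character is matched literally.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def matching_terms(pattern: str, terms: set[str]) -> list[str]:
    """Return the indexed terms that the whole pattern matches, sorted."""
    rx = wildcard_to_regex(pattern)
    return sorted(t for t in terms if rx.fullmatch(t))


# Hypothetical term dictionary holding the words exactly as written.
terms = {"apple", "aardvark", "banana"}
print(matching_terms("a*", terms))   # ['aardvark', 'apple']
print(matching_terms("a*e", terms))  # ['apple']
```

If 'apple' were stored verbatim, 'a*e' would find it, so an empty result suggests the analyzer writes some other term to the index.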

My gut is telling me that I should be using a different data type or a
different filter option.  Here is how my text type is defined:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Thanks for your help.
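A rough way to see what the index-time analyzer above stores is to model its filters as a pipeline. In this Python sketch the stop-word list is a small hypothetical stand-in for stopwords.txt; WordDelimiterFilterFactory and RemoveDuplicatesTokenFilterFactory are left out because they leave a plain word such as "Apple" unchanged; and a lookup table stands in for EnglishPorterFilterFactory, assuming it reduces "apple" to "appl" (the Porter algorithm's final-e rule does this).

```python
# Hypothetical stand-in for stopwords.txt.
STOP_WORDS = {"a", "an", "and", "the"}

# Hypothetical stand-in for EnglishPorterFilterFactory: only the word
# needed for this example, mapped to its assumed Porter stem.
STEMS = {"apple": "appl"}


def whitespace_tokenize(text: str) -> list[str]:
    return text.split()


def stop_filter(tokens: list[str], ignore_case: bool = True) -> list[str]:
    # ignore_case mirrors the ignoreCase="true" attribute in the schema.
    if ignore_case:
        return [t for t in tokens if t.lower() not in STOP_WORDS]
    return [t for t in tokens if t not in STOP_WORDS]


def lowercase_filter(tokens: list[str]) -> list[str]:
    return [t.lower() for t in tokens]


def stem_filter(tokens: list[str]) -> list[str]:
    # Words missing from the table pass through unchanged.
    return [STEMS.get(t, t) for t in tokens]


def index_analyzer(text: str) -> list[str]:
    """Apply the modelled index-time filters in the schema's order."""
    tokens = whitespace_tokenize(text)
    tokens = stop_filter(tokens, ignore_case=True)
    tokens = lowercase_filter(tokens)
    return stem_filter(tokens)


print(index_analyzer("An Apple"))  # ['appl']
```

Under that assumption the index holds "appl", not "apple", and that stored term is what a wildcard query is compared against.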

Re: embedded wildcard search not working?

Posted by Jim Adams <ja...@gmail.com>.

Some of the wildcards work, but not all of them.  Unsurprisingly, the ones
that seem to work are the ones that fall within the stemmed 'base' of the
word.
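Assuming the stored term for "apple" is "appl" (Otis's guess), the patterns discussed in this thread can be checked against it directly. This sketch uses Python's fnmatch.fnmatchcase, whose '*' and '?' behave like Lucene's for these patterns (it also treats '[...]' specially, which Lucene does not); INDEXED_TERM is an assumed value, not output from Solr.

```python
from fnmatch import fnmatchcase

# Assumed indexed term for "apple" after stemming.
INDEXED_TERM = "appl"

# Patterns from this thread, checked against the stemmed term.
results = {p: fnmatchcase(INDEXED_TERM, p) for p in ["a*", "app*l", "a*p*", "a*e"]}
print(results)  # {'a*': True, 'app*l': True, 'a*p*': True, 'a*e': False}
```

Only the pattern that relies on the trailing "e", which stemming removed, fails.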

Thanks for the tip about lowercasing before removing stop words.

On Wed, Feb 18, 2009 at 12:35 AM, Otis Gospodnetic <
otis_gospodnetic@yahoo.com> wrote:

> Jim,
>
> Does app*l or even a*p* work?  Perhaps "apple" gets stemmed to something
> that doesn't end in "e", such as "appl"?
> Regarding your config, you probably want to lowercase before removing stop
> words, so you'll want to change the order of those filters a bit.  That's
> not related to your wildcard question.
>
> Otis --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

Re: embedded wildcard search not working?

Posted by Otis Gospodnetic <ot...@yahoo.com>.

Jim,

Does app*l or even a*p* work?  Perhaps "apple" gets stemmed to something that doesn't end in "e", such as "appl"?
Regarding your config, you probably want to lowercase before removing stop words, so you'll want to change the order of those filters a bit.  That's not related to your wildcard question.

Otis --
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch 
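To illustrate why the order of the lowercase and stop filters can matter, here is a Python sketch that runs the same two steps in both orders with a case-sensitive stop filter (the equivalent of ignoreCase="false"). The stop list is a hypothetical stand-in for stopwords.txt. With ignoreCase="true", as in the posted schema, the stop filter already compares case-insensitively, so this mainly shows what changes when it does not.

```python
# Hypothetical stand-in for stopwords.txt.
STOP_WORDS = {"a", "an", "the"}


def remove_stop_words(tokens: list[str]) -> list[str]:
    # Case-sensitive comparison, like StopFilterFactory with ignoreCase="false".
    return [t for t in tokens if t not in STOP_WORDS]


def lowercase(tokens: list[str]) -> list[str]:
    return [t.lower() for t in tokens]


tokens = "The Apple".split()

stop_then_lower = lowercase(remove_stop_words(tokens))
lower_then_stop = remove_stop_words(lowercase(tokens))

print(stop_then_lower)  # ['the', 'apple']
print(lower_then_stop)  # ['apple']
```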