Posted to solr-user@lucene.apache.org by SolrLover <bb...@gmail.com> on 2013/08/02 19:27:18 UTC

SOLR matching keywords with / without whitespace

I am trying to match keywords with and without whitespace, but one of the
cases always fails.

For example:

I am indexing 4 documents:

name: wal mart
name: walmart
name: WalMart
name: Walmart

Searching on name using any of
wal mart
walmart
Walmart
WalMart

should return all 4 documents above, but searching with the keyword 'wal
mart' returns only the first document and not the remaining 3.

I am using ShingleFilterFactory to create combinations of the words during
indexing. My configuration is below. Can someone tell me where I went
wrong?

    <fieldType name="shingleString" class="solr.TextField" omitNorms="true">
      <analyzer type="index">
        <charFilter class="solr.PatternReplaceCharFilterFactory"
                    pattern="'+" replacement=""/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.ASCIIFoldingFilterFactory"/>
        <filter class="solr.ShingleFilterFactory" minShingleSize="2"
                maxShingleSize="3" outputUnigrams="true"/>
        <filter class="solr.PatternReplaceFilterFactory" pattern="\W+"
                replacement=""/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.ShingleFilterFactory" minShingleSize="2"
                maxShingleSize="99" outputUnigrams="true"/>
        <filter class="solr.PatternReplaceFilterFactory" pattern="\W+"
                replacement=""/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
  </types>


--
View this message in context: http://lucene.472066.n3.nabble.com/SOLR-matching-keywords-with-without-whitespace-tp4082244.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: SOLR matching keywords with / without whitespace

Posted by Erick Erickson <er...@gmail.com>.
No good way comes immediately to mind. How would Solr know
that 'wal mart' should be concatenated but 'many people' should
not?

You can do this somewhat with synonyms, but it depends on
knowing ahead of time what all the possibilities are.
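
As a rough illustration of the synonym approach, here is a minimal sketch.
The field type name (nameWithSynonyms) and the file name
(name_synonyms.txt) are made up for this example and do not come from the
original configuration. The synonym filter sits in the index analyzer only,
on the assumption that the query parser in Solr 4.x splits the query on
whitespace before analysis, so a multi-word entry such as 'wal mart' would
not reliably be seen as one unit at query time.

```xml
<!-- Sketch only: the field type name and synonyms file are hypothetical. -->
<fieldType name="nameWithSynonyms" class="solr.TextField" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- name_synonyms.txt (hypothetical) would hold one line per known
         variant group, for example:
           wal mart, walmart
         With expand="true", every variant in the group is indexed. -->
    <filter class="solr.SynonymFilterFactory" synonyms="name_synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With this in place, a document containing 'walmart' should also be indexed
under 'wal' and 'mart', and a document containing 'wal mart' under
'walmart', so either form of the query can find both. As noted, it only
covers variants that are listed in the file, and documents need to be
reindexed after the file changes.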

Best
Erick

RE: SOLR matching keywords with / without whitespace

Posted by Markus Jelsma <ma...@openindex.io>.
Perhaps it's not the correct tool here, but decompounding with a simple dictionary decompounder token filter will fix this problem.
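
A minimal sketch of what such a setup might look like, using Solr's DictionaryCompoundWordTokenFilterFactory. The field type name (nameDecompounded) and the dictionary file (compound_parts.txt) are hypothetical, and the size limits shown are illustrative values rather than tuned settings.

```xml
<!-- Sketch only: the field type name and dictionary file are hypothetical. -->
<fieldType name="nameDecompounded" class="solr.TextField" omitNorms="true">
  <!-- A single analyzer with no type attribute is used for both
       indexing and querying. -->
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- compound_parts.txt (hypothetical) lists one subword per line,
         for example "wal" and "mart". The original token is kept and
         the matching subwords are added alongside it. -->
    <filter class="solr.DictionaryCompoundWordTokenFilterFactory"
            dictionary="compound_parts.txt"
            minWordSize="5" minSubwordSize="2" maxSubwordSize="15"
            onlyLongestMatch="false"/>
  </analyzer>
</fieldType>
```

Under these assumptions, 'walmart' would be analyzed as 'walmart', 'wal' and 'mart', so it shares terms with 'wal mart' in both directions. The trade-off is precision: a document that contains only 'mart' could then match a search for 'walmart', and the dictionary still has to list the subwords ahead of time.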

 
 
-----Original message-----
> From:Erick Erickson <er...@gmail.com>
> Sent: Saturday 3rd August 2013 13:33
> To: solr-user@lucene.apache.org
> Subject: Re: SOLR matching keywords with / without whitespace
> 
> No good way comes immediately to mind. How would Solr know
> that 'wal mart' should be concatenated but 'many people' should
> not?
> 
> You can do this somewhat with synonyms, but it depends on
> knowing ahead of time what all the possibilities are.
> 
> Best
> Erick