Posted to solr-user@lucene.apache.org by vibhoreng04 <vi...@gmail.com> on 2013/05/21 23:14:26 UTC

ShingleFilterFactory

Hi All,

I have a use case where I need to search like this:

"Apple Corporation Limited" should produce the pairs Apple Corporation,
Corporation Apple, Corporation Limited, and Limited Corporation.
Below is the field type I am using:

    <fieldType name="text_shingle" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.PositionFilterFactory"/>
        <filter class="solr.ShingleFilterFactory" minShingleSize="2"
                maxShingleSize="2" outputUnigrams="false"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ShingleFilterFactory" outputUnigrams="false"
                minShingleSize="2" maxShingleSize="2" outputUnigramIfNoNgram="true"/>
      </analyzer>
    </fieldType>

Below are the issues I am facing:
1) This only allows me to search with a phrase query. That is, it matches
"Apple Corporation" (quoted) but not Apple Corporation (unquoted).
2) I also need to know how I can search for -Corporation Apple.
3) I want to prevent single-word searches such as Apple. That is, apple on
its own should not return any results.

Please suggest.
Regards,
Vibhor Jaiswal




--
View this message in context: http://lucene.472066.n3.nabble.com/ShingleFilterFactory-tp4065068.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: ShingleFilterFactory

Posted by Erick Erickson <er...@gmail.com>.
Seems to me like shingles will work for you. To your questions:
1> Not really. Phrases are just how you get the single token through
the parser. Escaping the space would also work, as in term1\ term2
2> This is just a standard negation, i.e. q=-field:term1\ term2
3> This works if you specify minShingleSize="2"
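
To make points 1 and 2 concrete, here is a rough Python sketch of building such a query string. The field name company_shingle and the helper shingle_term_query are made up for illustration; only the backslash-escaped space and the leading "-" for negation come from the answers above.

```python
from urllib.parse import urlencode


def shingle_term_query(field, words, negate=False):
    """Build a Solr q value that targets one shingle as a single term.

    The space between the words is backslash-escaped so the query parser
    does not split the input on whitespace before analysis.
    'field' is a hypothetical field name, not one from the original post.
    """
    term = "\\ ".join(words)          # e.g. apple\ corporation
    prefix = "-" if negate else ""    # leading "-" is standard negation
    return f"{prefix}{field}:{term}"


# Unquoted search for the two-word shingle (point 1).
q = shingle_term_query("company_shingle", ["apple", "corporation"])
print(q)

# Negated form (point 2).
print(shingle_term_query("company_shingle", ["apple", "corporation"], negate=True))

# URL-encoded parameter, ready to append to a /select request.
print(urlencode({"q": q}))
```

The backslash has to survive URL encoding, which is why the value is passed through urlencode rather than pasted into the URL by hand.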

The problem, however, is that shingles don't rearrange your input, so
you'll probably have to write a custom filter for that. By that I mean
that the input "apple corporation" would get shingled as exactly
that token, not as "corporation apple".
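
As an illustration of the difference, the following plain-Python sketch imitates two-word shingling and then shows what a reordering filter would need to emit in addition. It is only a simulation under simplifying assumptions (whitespace splitting, lowercasing, no position handling), not Lucene's ShingleFilter, and both function names are invented for this example.

```python
def shingles(text, size=2):
    """Return adjacent word groups of the given size, in input order.

    Roughly mimics minShingleSize == maxShingleSize == size with
    outputUnigrams="false": input shorter than 'size' yields nothing.
    """
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


def shingles_both_orders(text):
    """Emit each adjacent pair followed by its reversed form.

    This is the extra behaviour a custom filter would have to add;
    standard shingling never produces the reversed pair.
    """
    tokens = text.lower().split()
    out = []
    for first, second in zip(tokens, tokens[1:]):
        out.append(f"{first} {second}")
        out.append(f"{second} {first}")
    return out


print(shingles("Apple Corporation Limited"))
print(shingles("Apple"))
print(shingles_both_orders("Apple Corporation Limited"))
```

With the sample input, the first function gives only the in-order pairs, while the second gives the four pairs asked for in the original question. A single word produces no output at all, which matches the wish to return nothing for a search on apple alone.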

Best
Erick
