Posted to solr-user@lucene.apache.org by vibhoreng04 <vi...@gmail.com> on 2013/05/21 23:14:26 UTC
ShingleFilterFactory
Hi All,
I have a use case where I need to search like this:
"Apple Corporation Limited" should produce the pairs Apple Corporation,
Corporation Apple, Corporation Limited, Limited Corporation.
Below is the field type I am using:
<fieldType name="text_shingle" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PositionFilterFactory" />
    <filter class="solr.ShingleFilterFactory" minShingleSize="2"
            maxShingleSize="2" outputUnigrams="false" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" outputUnigrams="false"
            minShingleSize="2" maxShingleSize="2" outputUnigramIfNoNgram="true" />
  </analyzer>
</fieldType>
Below are the issues I am facing:
1) This only lets me search with a phrase query. I mean that it
allows "Apple Corporation" but does not allow Apple Corporation.
2) I also need to know how I can search for -Corporation Apple.
3) I want to prevent single-word searches such as Apple. I mean that apple
alone should not return any results.
Please suggest.
Regards,
Vibhor Jaiswal
--
View this message in context: http://lucene.472066.n3.nabble.com/ShingleFilterFactory-tp4065068.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: ShingleFilterFactory
Posted by Erick Erickson <er...@gmail.com>.
Seems to me like shingles will work for you. To your questions:
1> Not really; the quotes are just how you get the single token through
the query parser. Escaping the space would also work, as in term1\ term2
2> This is just a standard negation, i.e. q=-field:term1\ term2
3> This works if you specify minShingleSize="2"
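To illustrate the escaping in point 1, here is a minimal client-side sketch in Python. The field name name_shingle and both helper functions are made up for the example; they are not part of Solr or any client library, and the exact escaping rules depend on the query parser in use.

```python
def escape_spaces(text: str) -> str:
    """Backslash-escape each space so the words reach the analyzer as one chunk."""
    return text.replace(" ", "\\ ")


def shingle_query(field: str, text: str) -> str:
    """Build a field:term query string (hypothetical helper, not a Solr API)."""
    # Lowercasing is left to the query analyzer (LowerCaseFilterFactory).
    return f"{field}:{escape_spaces(text)}"


# name_shingle is an assumed field name of type text_shingle.
print(shingle_query("name_shingle", "Apple Corporation"))
```

The resulting string would go into the q parameter, URL-encoded as usual.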
The problem, however, is that shingles don't rearrange your input;
you'll probably have to write a custom filter for that. By that I mean
that the input "apple corporation" is shingled as exactly
that token, never as "corporation apple".
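A rough way to see the difference is the Python sketch below. It only imitates the token output and is not Lucene code: shingles() mimics a shingle filter with minShingleSize = maxShingleSize = 2 and no unigrams, and shingles_both_orders() is a hypothetical stand-in for what a custom filter would have to emit.

```python
def shingles(tokens, size=2):
    """Adjacent word n-grams in their original order only."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


def shingles_both_orders(tokens):
    """Hypothetical custom step: emit each adjacent pair plus its reversal."""
    out = []
    for a, b in zip(tokens, tokens[1:]):
        out.append(f"{a} {b}")
        out.append(f"{b} {a}")
    return out


# Whitespace tokenization plus lowercasing, as in the posted analyzer chain.
tokens = "Apple Corporation Limited".lower().split()
print(shingles(tokens))
print(shingles_both_orders(tokens))
```

With a single input token both functions return an empty list, which matches the wish that a lone "apple" should not match anything at index time.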
Best
Erick