Posted to solr-user@lucene.apache.org by Lukas Kahwe Smith <ml...@pooteeweet.org> on 2010/11/11 22:02:39 UTC
facet+shingle in autosuggest
Hi,
I am using a facet.prefix search with shingles in my autosuggest:
<fieldType name="shingle" class="solr.TextField" positionIncrementGap="100" stored="false" multiValued="true">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
    <filter class="solr.ShingleFilterFactory"
            maxShingleSize="3" outputUnigrams="true" outputUnigramIfNoNgram="false" />
  </analyzer>
</fieldType>
Now I would like to prevent stop words from appearing in the suggestions:
<lst name="autosuggest_shingle">
<int name="member states">52</int>
<int name="member states experiencing">6</int>
<int name="member states in">6</int>
<int name="member states the">5</int>
<int name="member states to">25</int>
<int name="member states with">7</int>
</lst>
Here I would like to filter out the last four suggestions. Is there a sensible way to bring in a stop word filter here? In theory the stop words could appear as the first or second word as well.
So I guess that when producing shingles I want to keep stop words out of every shingle.
regards,
Lukas Kahwe Smith
mls@pooteeweet.org
Re: facet+shingle in autosuggest
Posted by Lukas Kahwe Smith <ml...@pooteeweet.org>.
On 11.11.2010, at 17:42, Erick Erickson wrote:
> I don't know all the implications here, but can't you just
> insert the StopFilterFactory before the ShingleFilterFactory
> and turn it loose?
I haven't tried this, but I suspect I would then run into trouble with phrases like "united states of america". It would generate the shingle "united states america", which in turn wouldn't produce a proper phrase search string.
One option, of course, would be to restrict the shingles to 2 words; then the stop word filter should work as expected.
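A rough sketch of that two-word variant, showing only the analyzer chain. The stop filter attributes (ignoreCase and the stopwords.txt file name) are assumptions for illustration; the remaining filters and attributes are copied from the original field type.

```xml
<analyzer>
  <tokenizer class="solr.StandardTokenizerFactory" />
  <filter class="solr.LowerCaseFilterFactory" />
  <!-- Assumption: stop words are listed in a file named stopwords.txt -->
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
  <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
  <!-- Only change to the shingle filter: at most two words per shingle -->
  <filter class="solr.ShingleFilterFactory"
          maxShingleSize="2" outputUnigrams="true" outputUnigramIfNoNgram="false" />
</analyzer>
```

Two-word shingles that bridge a removed stop word (for example "states america") may still show up, depending on how position increments are handled, so the output is worth checking on the analysis page.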
regards,
Lukas Kahwe Smith
mls@pooteeweet.org
Re: facet+shingle in autosuggest
Posted by Erick Erickson <er...@gmail.com>.
I don't know all the implications here, but can't you just
insert the StopFilterFactory before the ShingleFilterFactory
and turn it loose?
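For reference, a minimal sketch of what that could look like, based on the field type from the original post. The stop word file name (stopwords.txt) and ignoreCase="true" are assumptions, not taken from the poster's setup; everything else is copied from the original configuration.

```xml
<fieldType name="shingle" class="solr.TextField" positionIncrementGap="100" stored="false" multiValued="true">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <!-- Assumption: stop words live in stopwords.txt next to the schema.
         The filter runs before shingling, so stop words never reach the shingle filter. -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
    <filter class="solr.ShingleFilterFactory"
            maxShingleSize="3" outputUnigrams="true" outputUnigramIfNoNgram="false" />
  </analyzer>
</fieldType>
```

How the shingle filter treats the gap left by a removed stop word depends on the Solr version and on whether the stop filter preserves position increments: the neighbouring words may be joined directly, or a filler token may be emitted in place of the removed word. It is worth checking the result on the analysis page before relying on it.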
Best
Erick
On Thu, Nov 11, 2010 at 4:02 PM, Lukas Kahwe Smith <ml...@pooteeweet.org> wrote:
> Hi,
>
> I am using a facet.prefix search with shingles in my autosuggest:
> <fieldType name="shingle" class="solr.TextField"
> positionIncrementGap="100" stored="false" multiValued="true">
> <analyzer>
> <tokenizer class="solr.StandardTokenizerFactory" />
> <filter class="solr.LowerCaseFilterFactory" />
> <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> <filter class="solr.ShingleFilterFactory"
> maxShingleSize="3" outputUnigrams="true"
> outputUnigramIfNoNgram="false" />
> </analyzer>
> </fieldType>
>
> Now I would like to prevent stop words from appearing in the suggestions:
>
> <lst name="autosuggest_shingle">
> <int name="member states">52</int>
> <int name="member states experiencing">6</int>
> <int name="member states in">6</int>
> <int name="member states the">5</int>
> <int name="member states to">25</int>
> <int name="member states with">7</int>
> </lst>
>
> Here I would like to filter out the last 4 suggestions really. Is there a
> way I can sensibly bring in a stop word filter here? Actually in theory the
> stop words could appear as the first or second word as well.
>
> So I guess that when producing shingles I want to keep stop words out of
> every shingle.
>
> regards,
> Lukas Kahwe Smith
> mls@pooteeweet.org