Posted to solr-user@lucene.apache.org by Lukas Kahwe Smith <ml...@pooteeweet.org> on 2010/11/11 22:02:39 UTC

facet+shingle in autosuggest

Hi,

I am using a facet.prefix search with shingles in my autosuggest:
    <fieldType name="shingle" class="solr.TextField" positionIncrementGap="100" stored="false" multiValued="true">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        <filter class="solr.ShingleFilterFactory"
          maxShingleSize="3" outputUnigrams="true" outputUnigramIfNoNgram="false" />
      </analyzer>
    </fieldType>

Now I would like to prevent stop words from appearing in the suggestions:

<lst name="autosuggest_shingle">
<int name="member states">52</int>
<int name="member states experiencing">6</int>
<int name="member states in">6</int>
<int name="member states the">5</int>
<int name="member states to">25</int>
<int name="member states with">7</int>
</lst>

Here I would really like to filter out the last 4 suggestions. Is there a sensible way to bring in a stop word filter here? In theory, the stop words could also appear as the first or second word.

So I guess that when producing shingles, I want to keep any stop word from becoming part of any shingle.

regards,
Lukas Kahwe Smith
mls@pooteeweet.org




Re: facet+shingle in autosuggest

Posted by Lukas Kahwe Smith <ml...@pooteeweet.org>.
On 11.11.2010, at 17:42, Erick Erickson wrote:

> I don't know all the implications here, but can't you just
> insert the StopFilterFactory before the ShingleFilterFactory
> and turn it loose?


I haven't tried this, but I suspect I would then get in trouble with phrases like "united states of america". It would generate the shingle "united states america", which in turn wouldn't produce a proper phrase search string.

One option, of course, would be to restrict the shingles to 2 words; using the stop word filter would then work as expected.
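
A minimal, untested sketch of that two-word variant. The stopwords.txt file name and ignoreCase="true" are assumptions, not part of my current setup; everything else is copied from the field type in my first mail:

```xml
<fieldType name="shingle" class="solr.TextField" positionIncrementGap="100" stored="false" multiValued="true">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <!-- assumption: stopwords.txt in the conf directory, one stop word per line -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <!-- shingles restricted to 2 words instead of 3 -->
    <filter class="solr.ShingleFilterFactory"
      maxShingleSize="2" outputUnigrams="true" outputUnigramIfNoNgram="false" />
  </analyzer>
</fieldType>
```

Depending on the Solr version and settings, a removed stop word may still leave a position gap that the shingle filter fills with a placeholder token, so the actual output is worth checking on the analysis admin page before relying on it.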

regards,
Lukas Kahwe Smith
mls@pooteeweet.org




Re: facet+shingle in autosuggest

Posted by Erick Erickson <er...@gmail.com>.
I don't know all the implications here, but can't you just
insert the StopFilterFactory before the ShingleFilterFactory
and turn it loose?
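
Roughly like this, as an untested sketch against your field type (the stop filter is registered as solr.StopFilterFactory). The stopwords.txt file name and ignoreCase="true" are assumptions on my part; the rest is your config unchanged:

```xml
<fieldType name="shingle" class="solr.TextField" positionIncrementGap="100" stored="false" multiValued="true">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <!-- assumption: stop words are listed in stopwords.txt; they are dropped before shingling -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.ShingleFilterFactory"
      maxShingleSize="3" outputUnigrams="true" outputUnigramIfNoNgram="false" />
  </analyzer>
</fieldType>
```

The only point of the sketch is the ordering: the stop filter has to sit in front of the shingle filter so that stop words never reach it.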

Best
Erick
