Posted to solr-user@lucene.apache.org by Rounak Jain <ro...@gmail.com> on 2013/05/03 16:34:28 UTC
Configure Shingle Filter to ignore ngrams made of tokens with same start and end
Hello,
I am using ShingleFilter with the Suggester to implement an autosuggest
dropdown. The field I'm using with the shingle filter has a word delimiter
filter with preserveOriginal=1, which tokenizes "women's" as both "women's"
and "womens".
Because of this, when the shingle filter generates word ngrams, apart from
the expected tokens there is also a "women's womens" token. I wanted to
know if there's any way to configure ShingleFilter so that it ignores
ngrams made of tokens with the same start and end values.
Thanks,
Rounak
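
[Editor's illustration] A minimal sketch of how the extra shingle described above can arise. It assumes a simplified model in which each token is a (term, position increment) pair and a naive bigram shingler joins neighbours in stream order without looking at positions. The names Token and naive_bigrams are hypothetical, and the token stream for "women's shoes" is an assumed example; this is not Lucene's ShingleFilter code.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    term: str
    pos_inc: int  # 0 means "same position as the previous token"


def naive_bigrams(tokens: List[Token]) -> List[str]:
    """Join each token with its successor in stream order, ignoring positions."""
    return [f"{a.term} {b.term}" for a, b in zip(tokens, tokens[1:])]


# Assumed output of the word delimiter step for "women's shoes":
# the preserved original and the variant without the apostrophe
# are stacked at the same position (position increment 0).
stream = [Token("women's", 1), Token("womens", 0), Token("shoes", 1)]

bigrams = naive_bigrams(stream)
print(bigrams)
```

In this model the two stacked variants are adjacent in the stream, so the shingler pairs them with each other and "women's womens" shows up next to the expected "womens shoes".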
Re: Configure Shingle Filter to ignore ngrams made of tokens with same start and end
Posted by Steve Rowe <sa...@gmail.com>.
An issue exists for this problem: https://issues.apache.org/jira/browse/LUCENE-3475
On May 3, 2013, at 11:00 AM, Walter Underwood <wu...@wunderwood.org> wrote:
> The shingle filter should respect positions. If it doesn't, that is worth filing a bug so we know about it.
>
> wunder
>
> On May 3, 2013, at 10:50 AM, Jack Krupansky wrote:
>
>> In short, no. I don't think you want to use the shingle filter on a token stream that has multiple tokens at the same position; otherwise, you will get confusing "suggestions", as you've encountered.
>>
>> -- Jack Krupansky
>>
>> -----Original Message----- From: Rounak Jain
>> Sent: Friday, May 03, 2013 7:34 AM
>> To: solr-user@lucene.apache.org
>> Subject: Configure Shingle Filter to ignore ngrams made of tokens with same start and end
>>
>> Hello,
>>
>> I am using ShingleFilter with the Suggester to implement an autosuggest
>> dropdown. The field I'm using with the shingle filter has a word delimiter
>> filter with preserveOriginal=1, which tokenizes "women's" as both "women's"
>> and "womens".
>>
>> Because of this, when the shingle filter generates word ngrams, apart from
>> the expected tokens there is also a "women's womens" token. I wanted to
>> know if there's any way to configure ShingleFilter so that it ignores
>> ngrams made of tokens with the same start and end values.
>>
>> Thanks,
>> Rounak
Re: Configure Shingle Filter to ignore ngrams made of tokens with same start and end
Posted by Walter Underwood <wu...@wunderwood.org>.
The shingle filter should respect positions. If it doesn't, that is worth filing a bug so we know about it.
wunder
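
[Editor's illustration] One possible reading of "respect positions", sketched under the same simplified (term, position increment) model: tokens stacked at one position are grouped together, and bigrams are built only across adjacent positions, never within a group. The names Token, group_by_position, and position_aware_bigrams are hypothetical, and the sketch ignores position gaps (increments greater than 1). It is not the behavior of any particular Lucene release.

```python
from itertools import product
from typing import List, NamedTuple


class Token(NamedTuple):
    term: str
    pos_inc: int  # 0 means "same position as the previous token"


def group_by_position(tokens: List[Token]) -> List[List[str]]:
    """Collect the terms that share a position into one group per position."""
    groups: List[List[str]] = []
    for tok in tokens:
        if tok.pos_inc == 0 and groups:
            groups[-1].append(tok.term)  # stacked on the previous position
        else:
            groups.append([tok.term])    # starts a new position
    return groups


def position_aware_bigrams(tokens: List[Token]) -> List[str]:
    """Pair every term at one position with every term at the next position."""
    groups = group_by_position(tokens)
    shingles: List[str] = []
    for left, right in zip(groups, groups[1:]):
        shingles.extend(f"{a} {b}" for a, b in product(left, right))
    return shingles


stream = [Token("women's", 1), Token("womens", 0), Token("shoes", 1)]
shingles = position_aware_bigrams(stream)
print(shingles)
```

With the assumed stream for "women's shoes", both variants are paired with "shoes" and no shingle joins the two variants to each other.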
Re: Configure Shingle Filter to ignore ngrams made of tokens with same start and end
Posted by Jack Krupansky <ja...@basetechnology.com>.
In short, no. I don't think you want to use the shingle filter on a token
stream that has multiple tokens at the same position; otherwise, you will
get confusing "suggestions", as you've encountered.
-- Jack Krupansky