Posted to solr-user@lucene.apache.org by vitaly bulgakov <bu...@yahoo.com> on 2015/10/15 15:47:44 UTC
Tokenize ShingleFilterFactory results and apply filters to tokens
I want to rephrase the question I asked in another post.
As far as I understand, ShingleFilterFactory emits each shingle as a single
string.
But I want to apply further filters (such as EdgeNGram) to each token of a
shingle.
For example, from "Home Improvement Service" I get two shingles:
"Home Improvement" and "Improvement Service".
I want to apply EdgeNGram to each token so that I can get exact matches on
"Hom Improvem" and "Improvemen Servi" as new phrases.
Any help or ideas are welcome and appreciated.
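The output asked for above can be sketched outside Solr in plain Python: keep each shingle as a list of tokens, take the leading-edge prefixes of every token, and join every combination of prefixes. This is only an illustration of the desired result, not a Lucene/Solr component; the function names and the minimum prefix length of 3 are assumptions made for the example.

```python
from itertools import product

def edge_ngrams(token, min_gram=1, max_gram=None):
    """Return the prefixes of `token` with lengths min_gram..max_gram."""
    max_gram = len(token) if max_gram is None else min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, max_gram + 1)]

def shingles(tokens, size=2):
    """Return consecutive runs of `size` tokens as lists (not joined strings)."""
    return [tokens[i:i + size] for i in range(len(tokens) - size + 1)]

def per_token_edge_shingles(text, size=2, min_gram=3):
    """Expand every shingle into all combinations of its tokens' prefixes."""
    phrases = []
    for shingle in shingles(text.split(), size):
        # One list of prefixes per token in the shingle.
        prefix_lists = [edge_ngrams(tok, min_gram) for tok in shingle]
        # Cartesian product: one prefix from each token, joined by a space.
        phrases.extend(" ".join(combo) for combo in product(*prefix_lists))
    return phrases

phrases = per_token_edge_shingles("Home Improvement Service")
```

With these assumed settings the three-word input expands to 63 phrases, among them "Hom Improvem" and "Improvemen Servi".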
--
View this message in context: http://lucene.472066.n3.nabble.com/Tokenize-ShingleFilterFactory-results-and-apply-filters-to-tokens-tp4234574.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Tokenize ShingleFilterFactory results and apply filters to tokens
Posted by Steve Rowe <sa...@gmail.com>.
Hi Vitaly,
I don’t know of any combination of built-in Lucene/Solr analysis components that would do what you want, but there used to be a filter called ShingleMatrixFilter that (if I understand both that filter and what you want correctly) would do it when placed after an EdgeNGramFilter: <https://lucene.apache.org/core/3_6_2/api/all/org/apache/lucene/analysis/shingle/ShingleMatrixFilter.html>
It was deprecated in v3.1 and removed in v4.0 (see <https://issues.apache.org/jira/browse/LUCENE-2920>) because it wasn’t being maintained by the original creator and nobody else understood it :). Uwe Schindler put up a patch that rewrote it and fixed some problems on <https://issues.apache.org/jira/browse/LUCENE-1391>, but that was never finished/committed.
What you want could create a huge number of terms, depending on the number of documents, the number of terms in the field, and the term lengths. What do you want to use these terms for?
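To get a rough feel for that growth, the number of prefix combinations can be counted without building them. The sketch below is plain Python, not Solr; it assumes two-token shingles and that every prefix length from a chosen minimum up to the full token length is kept (both are assumptions made for illustration).

```python
def prefix_count(token, min_gram=1):
    """Number of leading-edge prefixes of `token` with length >= min_gram."""
    return max(len(token) - min_gram + 1, 0)

def expanded_term_count(tokens, shingle_size=2, min_gram=1):
    """Count the phrases produced when each token of each shingle is
    replaced by every one of its prefixes."""
    total = 0
    for i in range(len(tokens) - shingle_size + 1):
        combos = 1
        for tok in tokens[i:i + shingle_size]:
            combos *= prefix_count(tok, min_gram)
        total += combos
    return total

count = expanded_term_count("Home Improvement Service".split())
```

For the three-word example, `count` is 121 (4 x 11 for the first shingle plus 11 x 7 for the second); raising the minimum prefix length to 3 brings it down to 63. The total grows with the product of the token lengths in each shingle, so long fields with long words multiply quickly.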
Steve
> On Oct 17, 2015, at 10:33 AM, vitaly bulgakov <bu...@yahoo.com> wrote:
>
> /why don't you put EdgeNGramFilter just after ShingleFilter?/
>
> Because it would generate edge n-grams over the whole shingle as a single string:
> for the "Home Improvement" shingle it produces: ... Hom, Home, Home , Home I,
> Home Im, Home Imp ...
>
> But I need:
> ... Hom Imp, Hom Impr ...
Re: Tokenize ShingleFilterFactory results and apply filters to tokens
Posted by vitaly bulgakov <bu...@yahoo.com>.
/why don't you put EdgeNGramFilter just after ShingleFilter?/
Because it would generate edge n-grams over the whole shingle as a single string:
for the "Home Improvement" shingle it produces: ... Hom, Home, Home , Home I,
Home Im, Home Imp ...
But I need:
... Hom Imp, Hom Impr ...
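The difference can be reproduced with a small Python stand-in for an edge n-gram step applied to an already-joined shingle. This is a sketch of the behaviour described above, not the Lucene implementation; the function name and the gram sizes 3 to 8 are assumptions chosen to match the listed output.

```python
def edge_ngrams(text, min_gram=1, max_gram=None):
    """Return the prefixes of `text` with lengths min_gram..max_gram."""
    max_gram = len(text) if max_gram is None else min(max_gram, len(text))
    return [text[:n] for n in range(min_gram, max_gram + 1)]

# The shingle reaches the edge n-gram step as one string, space included.
shingle = "Home Improvement"
grams = edge_ngrams(shingle, min_gram=3, max_gram=8)
# grams == ['Hom', 'Home', 'Home ', 'Home I', 'Home Im', 'Home Imp']

# Every gram is a prefix of the whole string, so the first word is only
# ever shortened on its own; "Hom Imp" can never be produced this way.
all_grams = edge_ngrams(shingle)
```

Because each output must be a prefix of "Home Improvement", anything containing a space necessarily starts with the full word "Home".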
Re: Tokenize ShingleFilterFactory results and apply filters to tokens
Posted by Koji Sekiguchi <ko...@rondhuit.com>.
Hi Vitaly,
I'm not sure I understand you correctly. Why don't you put EdgeNGramFilter just after
ShingleFilter? That is:
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="2"/>
<filter class="solr.EdgeNGramFilterFactory"/>
Koji
On 2015/10/15 22:47, vitaly bulgakov wrote:
> I want to rephrase the question I asked in another post.
> As far as I understand, ShingleFilterFactory emits each shingle as a single
> string.
> But I want to apply further filters (such as EdgeNGram) to each token of a
> shingle.
>
> For example, from "Home Improvement Service" I get two shingles:
> "Home Improvement" and "Improvement Service".
>
> I want to apply EdgeNGram to each token so that I can get exact matches on
> "Hom Improvem" and "Improvemen Servi" as new phrases.
>
> Any help or ideas are welcome and appreciated.
Re: Tokenize ShingleFilterFactory results and apply filters to tokens
Posted by Alexandre Rafalovitch <ar...@gmail.com>.
This sounds like an attempt to build auto-complete using n-grams
in text. If so, Ted Sullivan's writing might be relevant:
http://lucidworks.com/blog/author/tedsullivan/
Regards,
Alex.
----
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/
On 15 October 2015 at 09:47, vitaly bulgakov <bu...@yahoo.com> wrote:
> I want to rephrase the question I asked in another post.
> As far as I understand, ShingleFilterFactory emits each shingle as a single
> string.
> But I want to apply further filters (such as EdgeNGram) to each token of a
> shingle.
>
> For example, from "Home Improvement Service" I get two shingles:
> "Home Improvement" and "Improvement Service".
>
> I want to apply EdgeNGram to each token so that I can get exact matches on
> "Hom Improvem" and "Improvemen Servi" as new phrases.
>
> Any help or ideas are welcome and appreciated.