Posted to solr-user@lucene.apache.org by vit <bu...@yahoo.com> on 2015/10/14 17:58:29 UTC
Can I use tokenizer twice ?
I have Solr 4.2
I need to do the following:
1. white space tokenize
2. create shingles
3. apply EdgeNGramFilter to each word inside a shingle, not to the shingle as
a single string
So can I do this?
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="4" outputUnigrams="false" outputUnigramsIfNoShingles="true"/>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="25"/>
--
View this message in context: http://lucene.472066.n3.nabble.com/Can-I-use-tokenizer-twice-tp4234438.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Can I use tokenizer twice ?
Posted by vitaly bulgakov <bu...@yahoo.com>.
Steve,
> You could achieve what you want by copying to another field and defining a
> separate analyzer for each. One would create shingles, and the other edge
> ngrams.
Could you please elaborate on this? I am not sure I understand how to do it
using copyField.
--
View this message in context: http://lucene.472066.n3.nabble.com/Can-I-use-tokenizer-twice-tp4234438p4234503.html
Re: Can I use tokenizer twice ?
Posted by Steve Rowe <sa...@gmail.com>.
Hi,
Analyzers must have exactly one tokenizer, no more and no less.
You could achieve what you want by copying to another field and defining a separate analyzer for each. One would create shingles, and the other edge ngrams.
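As a rough sketch of the copyField approach, assuming a source field called title (all field and type names below are hypothetical, and the filter settings are taken from the original question), the schema.xml could look something like this:

```xml
<!-- schema.xml sketch; every field and type name here is a made-up example -->
<types>
  <!-- Analyzer 1: shingles of 2 to 4 whitespace-separated words -->
  <fieldType name="text_shingle" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="4"
              outputUnigrams="false" outputUnigramsIfNoShingles="true"/>
    </analyzer>
  </fieldType>

  <!-- Analyzer 2: edge n-grams of each individual word -->
  <fieldType name="text_edge" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="25"/>
    </analyzer>
  </fieldType>
</types>

<fields>
  <!-- The source field only holds the raw value; the copies do the analysis -->
  <field name="title" type="string" indexed="false" stored="true"/>
  <field name="title_shingle" type="text_shingle" indexed="true" stored="false"/>
  <field name="title_edge" type="text_edge" indexed="true" stored="false"/>
</fields>

<!-- copyField passes the original, unanalyzed input to each destination -->
<copyField source="title" dest="title_shingle"/>
<copyField source="title" dest="title_edge"/>
```

Each destination field has its own analyzer with exactly one tokenizer, and a query can then search both fields, for example by listing them in the qf parameter of edismax. Note that this indexes shingles and per-word edge n-grams side by side; it does not produce edge n-grams of the words inside each shingle.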
Steve
> On Oct 14, 2015, at 11:58 AM, vit <bu...@yahoo.com> wrote:
>
> I have Solr 4.2
> I need to do the following:
>
> 1. white space tokenize
> 2. create shingles
> 3. use EdgeNGramFilter for each word in shingles, but not in a shingle as a
> string
>
> So can I do this?
>
> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> <filter class="solr.LowerCaseFilterFactory"/>
> <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="4" outputUnigrams="false" outputUnigramsIfNoShingles="true"/>
> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="25"/>