Posted to solr-user@lucene.apache.org by vit <bu...@yahoo.com> on 2015/10/14 17:58:29 UTC

Can I use tokenizer twice ?

I am using Solr 4.2 and need to do the following:

1. Tokenize on whitespace
2. Create shingles
3. Apply EdgeNGramFilter to each word within the shingles, but not to a
shingle as a whole string

Can I do this with an analyzer chain like the one below?

*<tokenizer class="solr.WhitespaceTokenizerFactory"/> *
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.ShingleFilterFactory" minShingleSize="2"
maxShingleSize="4" outputUnigrams="false" outputUnigramsIfNoShingles="true"
/>
*<tokenizer class="solr.WhitespaceTokenizerFactory"/> *
<filter class="solr.EdgeNGramFilterFactory" minGramSize="3"
maxGramSize="25"/>



--
View this message in context: http://lucene.472066.n3.nabble.com/Can-I-use-tokenizer-twice-tp4234438.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Can I use tokenizer twice ?

Posted by vitaly bulgakov <bu...@yahoo.com>.
Steve,
> You could achieve what you want by copying to another field and defining a
> separate analyzer for each.  One would create shingles, and the other edge
> ngrams.

Could you please elaborate on this? I am not sure I understand how to do it
using copyField.





Re: Can I use tokenizer twice ?

Posted by Steve Rowe <sa...@gmail.com>.
Hi,

Analyzers must have exactly one tokenizer, no more and no less.

You could achieve what you want by copying to another field and defining a separate analyzer for each.  One would create shingles, and the other edge ngrams.  
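
A minimal schema.xml sketch of the copyField approach, for illustration only. The field and type names (title, title_shingle, title_edge, text_shingle, text_edge) are hypothetical and not from this thread; the filter settings are the ones from the original question. Splitting the edge n-gram type into index and query analyzers is a common choice rather than something the thread prescribes. Note that this indexes shingles and edge n-grams side by side in two fields; it does not produce edge n-grams of the words inside each shingle.

```xml
<!-- Hypothetical field types: each analyzer has exactly one tokenizer. -->
<fieldType name="text_shingle" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Shingle settings copied from the original question -->
    <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="4"
            outputUnigrams="false" outputUnigramsIfNoShingles="true"/>
  </analyzer>
</fieldType>

<fieldType name="text_edge" class="solr.TextField" positionIncrementGap="100">
  <!-- Edge n-grams are generated at index time only (an assumption of this sketch) -->
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical fields: "title" stands in for whatever source field is being indexed -->
<field name="title_shingle" type="text_shingle" indexed="true" stored="false"/>
<field name="title_edge" type="text_edge" indexed="true" stored="false"/>

<!-- copyField passes the raw source text to each destination, which then applies its own analyzer -->
<copyField source="title" dest="title_shingle"/>
<copyField source="title" dest="title_edge"/>
```

Queries would then need to search both fields, for example by listing them in the qf parameter of the edismax query parser.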

Steve