Posted to solr-user@lucene.apache.org by Ashish P <as...@gmail.com> on 2009/04/10 05:00:27 UTC

multiple tokenizers needed

I want to analyze text by splitting on the pattern ";" and then on whitespace, and
since the text is Japanese I also want to use the CJKAnalyzer and its tokenizer.
In short, I want to do:
			 <analyzer class="org.apache.lucene.analysis.cjk.CJKAnalyzer">
				<tokenizer class="solr.PatternTokenizerFactory" pattern=";" />
				<tokenizer class="solr.WhitespaceTokenizerFactory" />
				<tokenizer class="org.apache.lucene.analysis.cjk.CJKTokenizer" />
			</analyzer> 
Can anyone please tell me how to achieve this? The above syntax is not
possible, since an analyzer only allows a single tokenizer.
-- 
View this message in context: http://www.nabble.com/multiple-tokenizers-needed-tp22982382p22982382.html
Sent from the Solr - User mailing list archive at Nabble.com.
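
[Editor's note: for reference, a Solr analyzer element accepts exactly one
<tokenizer>; every later stage must be a <filter>. A sketch of a legal chain
in the spirit of the above — the field name is illustrative, the regex folds
the ";" split and the whitespace split into the single permitted tokenizer,
and solr.CJKBigramFilterFactory only exists in later Solr releases than the
one discussed in this thread:]

```xml
<fieldType name="text_ja" class="solr.TextField">
  <analyzer>
    <!-- one tokenizer only: split on ";" and on whitespace in a single regex -->
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[;\s]+"/>
    <!-- later Solr releases offer CJK bigramming as a filter stage -->
    <filter class="solr.CJKBigramFilterFactory"/>
  </analyzer>
</fieldType>
```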


Re: multiple tokenizers needed

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
Or have the indexing client split the data at these delimiters and  
just use the CJKAnalyzer.

	Erik

On Apr 10, 2009, at 7:30 AM, Grant Ingersoll wrote:

> The only thing that comes to mind as a short-term approach is writing two
> TokenFilter implementations that wrap the second and third tokenizers.
> [...]
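
[Editor's note: a minimal plain-Java sketch of Erik's suggestion — split at
the delimiter in the indexing client, then send each piece as a separate
value of a multiValued field whose analyzer is just the CJKAnalyzer. The
Solr side is assumed, not shown; class and method names are illustrative.]

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of Erik's suggestion: split on ";" (and trim) in the indexing
// client. Each returned element would be posted as one value of a
// multiValued field analyzed only by the CJKAnalyzer.
public class ClientSideSplit {
    static List<String> splitForIndexing(String raw) {
        List<String> pieces = new ArrayList<>();
        for (String p : raw.split(";")) {
            String trimmed = p.trim();
            if (!trimmed.isEmpty()) pieces.add(trimmed);
        }
        return pieces;
    }

    public static void main(String[] args) {
        // prints [東京タワー, 大阪城, 京都]
        System.out.println(splitForIndexing("東京タワー; 大阪城 ;京都"));
    }
}
```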


Re: multiple tokenizers needed

Posted by Grant Ingersoll <gs...@apache.org>.
The only thing that comes to mind as a short-term approach is writing two
TokenFilter implementations that wrap the second and third tokenizers.

On Apr 9, 2009, at 11:00 PM, Ashish P wrote:

> I want to analyze a text based on pattern ";" and separate on whitespace and
> it is a Japanese text so use CJKAnalyzer + tokenizer also.
> [...]

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
http://www.lucidimagination.com/search
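
[Editor's note: a plain-Java sketch of Grant's idea, with no Lucene
dependency — a "filter" stage that takes the tokens produced by an upstream
tokenizer and re-splits each one, the way a TokenFilter wrapping a second
tokenizer would. Class and method names are illustrative, not Lucene API;
a real implementation would extend org.apache.lucene.analysis.TokenFilter.]

```java
import java.util.ArrayList;
import java.util.List;

// Two chained stages mimicking tokenizer -> wrapping-filter:
// stage 1 splits the raw input on ";", stage 2 re-splits every
// upstream token on whitespace, as a wrapped tokenizer would.
public class ResplittingFilterSketch {
    // upstream stage: split the raw input on ";"
    static List<String> patternTokenize(String input) {
        List<String> out = new ArrayList<>();
        for (String t : input.split(";")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // "filter" stage: re-split every upstream token on whitespace
    static List<String> whitespaceRefilter(List<String> upstream) {
        List<String> out = new ArrayList<>();
        for (String token : upstream) {
            for (String sub : token.trim().split("\\s+")) {
                if (!sub.isEmpty()) out.add(sub);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [東京, 大阪, 京都]
        System.out.println(whitespaceRefilter(patternTokenize("東京 大阪;京都")));
    }
}
```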