Posted to solr-user@lucene.apache.org by Ashish P <as...@gmail.com> on 2009/04/10 05:00:27 UTC
multiple tokenizers needed
I want to tokenize text on the pattern ";" and on whitespace, and since
it is Japanese text I also want to use the CJKAnalyzer and its tokenizer.
In short, I want to do:
<analyzer class="org.apache.lucene.analysis.cjk.CJKAnalyzer">
<tokenizer class="solr.PatternTokenizerFactory" pattern=";" />
<tokenizer class="solr.WhitespaceTokenizerFactory" />
<tokenizer class="org.apache.lucene.analysis.cjk.CJKTokenizer" />
</analyzer>
Can anyone please tell me how to achieve this? The syntax above is not
valid.
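For context, a Solr analyzer chain accepts exactly one tokenizer followed by zero or more filters, which is why the three-tokenizer snippet above is rejected. A minimal sketch of the legal shape is below; the field type name and the combined pattern "[;\s]+" (covering both ";" and whitespace in a single pass) are assumptions, and this still does not perform CJK segmentation:

```xml
<fieldType name="text_ja_split" class="solr.TextField">
  <analyzer>
    <!-- One tokenizer only: split on ";" and whitespace in a single pass. -->
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[;\s]+"/>
    <!-- Further CJK handling cannot be added as a second tokenizer;
         it would have to come from a TokenFilter or from pre-processing. -->
  </analyzer>
</fieldType>
```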
--
View this message in context: http://www.nabble.com/multiple-tokenizers-needed-tp22982382p22982382.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: multiple tokenizers needed
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
Or have the indexing client split the data at these delimiters and
just use the CJKAnalyzer.
Erik
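Erik's client-side approach can be sketched in plain Java: split the raw text on the delimiters before indexing, then let a CJK-aware field analyze each piece. The class name and the split pattern `[;\s]+` are illustrative assumptions; the Solr indexing call itself is omitted.

```java
import java.util.ArrayList;
import java.util.List;

public class DelimiterSplitter {
    // Split raw input on ";" and on whitespace before sending each piece
    // to Solr, where a CJK analyzer handles the Japanese segmentation.
    public static List<String> split(String raw) {
        List<String> parts = new ArrayList<String>();
        for (String p : raw.split("[;\\s]+")) {
            if (p.length() > 0) {   // drop the empty leading token
                parts.add(p);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("東京;大阪 京都")); // → [東京, 大阪, 京都]
    }
}
```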
On Apr 10, 2009, at 7:30 AM, Grant Ingersoll wrote:
> The only thing that comes to mind in the short term is writing two
> TokenFilter implementations that wrap the second and third tokenizers.
>
> On Apr 9, 2009, at 11:00 PM, Ashish P wrote:
>
>>
>> I want to tokenize text on the pattern ";" and on whitespace, and since
>> it is Japanese text I also want to use the CJKAnalyzer and its tokenizer.
>> In short, I want to do:
>> <analyzer class="org.apache.lucene.analysis.cjk.CJKAnalyzer">
>> <tokenizer class="solr.PatternTokenizerFactory" pattern=";" />
>> <tokenizer class="solr.WhitespaceTokenizerFactory" />
>> <tokenizer class="org.apache.lucene.analysis.cjk.CJKTokenizer" />
>> </analyzer>
>> Can anyone please tell me how to achieve this? The syntax above is not
>> valid.
>> --
>>
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
> using Solr/Lucene:
> http://www.lucidimagination.com/search
Re: multiple tokenizers needed
Posted by Grant Ingersoll <gs...@apache.org>.
The only thing that comes to mind in the short term is writing two
TokenFilter implementations that wrap the second and third tokenizers.
On Apr 9, 2009, at 11:00 PM, Ashish P wrote:
>
> I want to tokenize text on the pattern ";" and on whitespace, and since
> it is Japanese text I also want to use the CJKAnalyzer and its tokenizer.
> In short, I want to do:
> <analyzer class="org.apache.lucene.analysis.cjk.CJKAnalyzer">
> <tokenizer class="solr.PatternTokenizerFactory" pattern=";" />
> <tokenizer class="solr.WhitespaceTokenizerFactory" />
> <tokenizer class="org.apache.lucene.analysis.cjk.CJKTokenizer" />
> </analyzer>
> Can anyone please tell me how to achieve this? The syntax above is not
> valid.
> --
>
--------------------------
Grant Ingersoll
http://www.lucidimagination.com/
Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
using Solr/Lucene:
http://www.lucidimagination.com/search