Posted to solr-user@lucene.apache.org by Ashish P <as...@gmail.com> on 2009/04/28 07:01:45 UTC

half width katakana

I want to convert half-width katakana to full-width katakana. I tried using
CJKAnalyzer, but it did not work.
Does CJKAnalyzer do this, or is there another way?
-- 
View this message in context: http://www.nabble.com/half-width-katakana-tp23270186p23270186.html
Sent from the Solr - User mailing list archive at Nabble.com.
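For comparison, outside of Solr the same conversion can be done with Unicode NFKC normalization. The following is a minimal plain-Java sketch using java.text.Normalizer; the class name KatakanaWidth is hypothetical. Note that NFKC folds every compatibility character, not only katakana (full-width Latin letters become ASCII, for example), so it is broader than a katakana-only mapping table.

```java
import java.text.Normalizer;

// Sketch: fold half-width katakana to full-width katakana with Unicode NFKC.
// Caveat: NFKC also folds other compatibility characters (full-width Latin
// letters become ASCII, for example), so it does more than katakana alone.
class KatakanaWidth {

    static String toFullWidth(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // Half-width KA TA KA NA, a space, then half-width KA + voiced mark, SU.
        String halfWidth = "\uFF76\uFF80\uFF76\uFF85 \uFF76\uFF9E\uFF7D";
        String fullWidth = toFullWidth(halfWidth);
        // The voiced mark is merged into its base letter (KA + mark => GA),
        // so 8 input chars become 7 output chars.
        System.out.println(halfWidth.length() + " -> " + fullWidth.length()); // 8 -> 7
        fullWidth.chars().forEach(c -> System.out.printf("U+%04X ", c));
        System.out.println();
    }
}
```

Because the voiced mark is a separate half-width character, the output can be shorter than the input; that length change is exactly why offsets become an issue once this is done inside an analyzer.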

Re: half width katakana

Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.

Chris Hostetter wrote:
> : The exception is expected if you use a CharStream-aware Tokenizer without
> : CharFilters.
>
> Koji: I thought all of the casts had been eliminated and replaced with
> a call to CharReader.get(Reader)?
>
Yes, that's right. As of r758137, the ClassCastException should no longer occur.

http://svn.apache.org/viewvc?view=rev&revision=758137

The CharReader.get(Reader) idiom was then added, as Hoss suggested:

http://svn.apache.org/viewvc?view=rev&revision=758161

Ashish, which revision or nightly build were you using when you got the
ClassCastException?

Koji
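The CharReader.get(Reader) idiom mentioned above amounts to this: hand a Tokenizer a CharStream, wrapping a plain Reader only when it is not already one, instead of casting blindly. Below is a simplified, self-contained Java sketch of that idea. CharStream, CharReader and CastDemo here are minimal stand-ins written for illustration, not the actual Solr 1.4 classes, whose details may differ. The blind cast in the demo reproduces the kind of ClassCastException reported in this thread.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Minimal stand-ins written for illustration; NOT the real Solr 1.4
// org.apache.solr.analysis.CharStream / CharReader sources.
abstract class CharStream extends Reader {
    /** Maps an offset in the (possibly filtered) text back to the input. */
    abstract int correctOffset(int offset);
}

final class CharReader extends CharStream {
    private final Reader input;

    private CharReader(Reader input) {
        this.input = input;
    }

    /** The idiom: reuse an existing CharStream, otherwise wrap the Reader. */
    static CharStream get(Reader input) {
        return (input instanceof CharStream) ? (CharStream) input : new CharReader(input);
    }

    @Override
    int correctOffset(int offset) {
        return offset; // nothing was filtered, so offsets are unchanged
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        return input.read(buf, off, len);
    }

    @Override
    public void close() throws IOException {
        input.close();
    }
}

class CastDemo {
    public static void main(String[] args) {
        Reader plain = new StringReader("no CharFilter in front of me");
        try {
            // A blind downcast, as in the reported error, fails at runtime.
            CharStream broken = (CharStream) plain;
            System.out.println(broken.correctOffset(0));
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: " + e.getMessage());
        }
        // Wrapping via get(...) works whether or not a CharFilter was configured.
        CharStream wrapped = CharReader.get(plain);
        System.out.println(wrapped.correctOffset(3)); // 3
    }
}
```

The design point is that the Tokenizer no longer assumes a CharFilter sits in front of it: with a CharFilter it receives a CharStream and uses it as-is, and without one it gets a pass-through wrapper whose offset correction is the identity.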

Re: half width katakana

Posted by Chris Hostetter <ho...@fucit.org>.

: The exception is expected if you use a CharStream-aware Tokenizer without
: CharFilters.

Koji: I thought all of the casts had been eliminated and replaced with
a call to CharReader.get(Reader)?

: Please see example/solr/conf/schema.xml for an example of configuring a
: CharFilter together with a CharStreamAware*Tokenizer:

: > Using CharStreamAwareCJKTokenizerFactory is giving me the following error:
: > SEVERE: java.lang.ClassCastException: java.io.StringReader cannot be cast to org.apache.solr.analysis.CharStream
: > 
: > Maybe you are casting a Reader to a subclass.

-Hoss

Re: half width katakana

Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.

The exception is expected if you use a CharStream-aware Tokenizer without
CharFilters.
Please see example/solr/conf/schema.xml for an example of configuring a
CharFilter together with a CharStreamAware*Tokenizer:

    <!-- charFilter + "CharStream aware" WhitespaceTokenizer -->
    <!--
    <fieldType name="textCharNorm" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
        <tokenizer class="solr.CharStreamAwareWhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldType>
    -->

Thank you,

Koji
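The reason a CharFilter needs a "CharStream aware" Tokenizer is offsets: the filter changes the text (for example, a half-width KA followed by a half-width voiced mark becomes a single full-width GA), so positions in the filtered text no longer line up with positions in the original field value. A rough sketch of that bookkeeping follows, assuming a simple in-memory string-to-string mapping. MappingSketch is a hypothetical class; Solr's MappingCharFilter streams its input and records corrections differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of a mapping char filter (not Solr's
// MappingCharFilter): rewrite the input with a string-to-string table and
// remember where each output character came from in the original input.
class MappingSketch {
    final String output;           // the filtered text a tokenizer would see
    private final int[] origStart; // original offset of each output char
    private final int inputLength;

    MappingSketch(String input, Map<String, String> mapping) {
        int maxKey = 0;
        for (String key : mapping.keySet()) {
            maxKey = Math.max(maxKey, key.length());
        }
        StringBuilder out = new StringBuilder();
        List<Integer> starts = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String replacement = null;
            int matched = 1;
            // Longest match first, so a two-char key (letter + voiced mark)
            // wins over the one-char key for the letter alone.
            for (int len = Math.min(maxKey, input.length() - pos); len >= 1; len--) {
                String candidate = mapping.get(input.substring(pos, pos + len));
                if (candidate != null) {
                    replacement = candidate;
                    matched = len;
                    break;
                }
            }
            if (replacement == null) {
                replacement = input.substring(pos, pos + 1); // no rule: copy as-is
            }
            for (int i = 0; i < replacement.length(); i++) {
                starts.add(pos);
            }
            out.append(replacement);
            pos += matched;
        }
        output = out.toString();
        origStart = starts.stream().mapToInt(Integer::intValue).toArray();
        inputLength = input.length();
    }

    /** Maps an offset in the filtered output back to the original input. */
    int correctOffset(int offset) {
        return offset < origStart.length ? origStart[offset] : inputLength;
    }

    public static void main(String[] args) {
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("\uFF76", "\u30AB");       // half-width KA => full-width KA
        mapping.put("\uFF76\uFF9E", "\u30AC"); // half-width KA + voiced mark => GA
        mapping.put("\uFF7D", "\u30B9");       // half-width SU => full-width SU
        // Original input: half-width KA, voiced mark, SU (3 chars).
        MappingSketch filter = new MappingSketch("\uFF76\uFF9E\uFF7D", mapping);
        System.out.println(filter.output.length());  // 2 (GA SU)
        System.out.println(filter.correctOffset(1)); // 2: SU starts at offset 2 originally
        System.out.println(filter.correctOffset(2)); // 3: end of the original input
    }
}
```

A tokenizer that ignores correctOffset would report SU at offset 1, which points into the middle of the original GA sequence; highlighting would then mark the wrong characters.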


Ashish P wrote:
> Koji san,
>
> Using CharStreamAwareCJKTokenizerFactory is giving me the following error:
> SEVERE: java.lang.ClassCastException: java.io.StringReader cannot be cast to org.apache.solr.analysis.CharStream
>
> Maybe you are casting a Reader to a subclass.
> Thanks,
> Ashish

Re: half width katakana

Posted by Ashish P <as...@gmail.com>.

Koji san,

Using CharStreamAwareCJKTokenizerFactory is giving me the following error:
SEVERE: java.lang.ClassCastException: java.io.StringReader cannot be cast to org.apache.solr.analysis.CharStream

Maybe you are casting a Reader to a subclass.
Thanks,
Ashish


Koji Sekiguchi-2 wrote:
>
> If you use a CharFilter, you should use a "CharStream aware" Tokenizer so
> that term offsets are corrected.
> There are two CharStreamAware*Tokenizers in trunk/Solr 1.4.
> You probably want CharStreamAwareCJKTokenizer(Factory).
> 
> Koji


Re: half width katakana

Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.

If you use a CharFilter, you should use a "CharStream aware" Tokenizer so
that term offsets are corrected.
There are two CharStreamAware*Tokenizers in trunk/Solr 1.4.
You probably want CharStreamAwareCJKTokenizer(Factory).

Koji
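On the question of whether to keep CJKAnalyzer: the CharFilter runs before the tokenizer, so the CJK bigram tokenization is kept and the CharFilter is simply placed in front of a CharStream-aware CJK tokenizer. The sketch below shows a simplified bigram tokenizer that reports offsets through a correction function, which is what makes it "CharStream aware". BigramSketch and its hand-written correction array are hypothetical; Solr's CJKTokenizer also handles non-CJK runs, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntUnaryOperator;

// Hypothetical, simplified "CharStream aware" bigram tokenizer: it splits the
// filtered text into overlapping two-character tokens (as CJK bigram
// tokenizers do for CJK runs) and maps each token's offsets back to the
// original input. Not Solr's CJKTokenizer, which also handles non-CJK text.
class BigramSketch {
    static final class Token {
        final String term;
        final int start; // offset in the ORIGINAL input
        final int end;   // exclusive end offset in the ORIGINAL input

        Token(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + "," + end + ")";
        }
    }

    static List<Token> bigrams(String filtered, IntUnaryOperator correctOffset) {
        List<Token> tokens = new ArrayList<>();
        if (filtered.length() == 1) { // a lone character becomes a unigram
            tokens.add(new Token(filtered, correctOffset.applyAsInt(0), correctOffset.applyAsInt(1)));
        }
        for (int i = 0; i + 2 <= filtered.length(); i++) {
            tokens.add(new Token(filtered.substring(i, i + 2),
                    correctOffset.applyAsInt(i), correctOffset.applyAsInt(i + 2)));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Full-width KA TA KA NA with no filtering: identity offsets.
        System.out.println(bigrams("\u30AB\u30BF\u30AB\u30CA", i -> i)); // [[0,2), [1,3), [2,4)]
        // Filtered GA SU came from 3 original chars (KA + voiced mark, SU);
        // filtered offsets 0, 1, 2 map to original offsets 0, 2, 3.
        int[] correction = {0, 2, 3};
        System.out.println(bigrams("\u30AC\u30B9", i -> correction[i])); // [[0,3)]
    }
}
```

In the second call the single bigram covers original offsets 0 to 3, i.e. all three half-width characters, rather than 0 to 2 as the filtered text alone would suggest.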


Ashish P wrote:
> After this, should I keep using the same CJKAnalyzer, or should I use a CharFilter?
> Thanks,
> Ashish

Re: half width katakana

Posted by Ashish P <as...@gmail.com>.

After this, should I keep using the same CJKAnalyzer, or should I use a CharFilter?
Thanks,
Ashish


Koji Sekiguchi-2 wrote:
> 
>
> The CharFilter that ships with trunk/Solr 1.4 covers exactly this type of
> problem.
> If you are using Solr 1.3, try the patch attached to this issue:
>
> https://issues.apache.org/jira/browse/SOLR-822
>
> Koji


Re: half width katakana

Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.

Ashish P wrote:
> I want to convert half-width katakana to full-width katakana. I tried using
> CJKAnalyzer, but it did not work.
> Does CJKAnalyzer do this, or is there another way?

The CharFilter that ships with trunk/Solr 1.4 covers exactly this type of problem.
If you are using Solr 1.3, try the patch attached to this issue:

https://issues.apache.org/jira/browse/SOLR-822

Koji
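With the CharFilter approach, the conversion is driven by a mapping file referenced from MappingCharFilterFactory's mapping attribute, the way mapping-ISOLatin1Accent.txt is used in the example schema. Rather than typing the rules by hand, they could be generated. The sketch below assumes the quoted "source" => "target" line format with Java-style unicode escapes that mapping-ISOLatin1Accent.txt uses (check this against your copy), and it computes each target with NFKC. The class KatakanaMappingGenerator is hypothetical, and standalone voiced marks (U+FF9E, U+FF9F) are deliberately left unmapped.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical generator for MappingCharFilterFactory rules that fold
// half-width katakana (U+FF61..U+FF9D, plus voiced/semi-voiced pairs) to
// full-width. Each target is computed with NFKC; the quoted "from" => "to"
// line format is assumed to match the example mapping-ISOLatin1Accent.txt.
class KatakanaMappingGenerator {
    private static final char VOICED = '\uFF9E';      // half-width dakuten
    private static final char SEMI_VOICED = '\uFF9F'; // half-width handakuten

    /** Writes every char as a 4-digit Java-style unicode escape. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(String.format("\\u%04X", (int) c));
        }
        return sb.toString();
    }

    static String rule(String from, String to) {
        return "\"" + escape(from) + "\" => \"" + escape(to) + "\"";
    }

    static List<String> rules() {
        List<String> rules = new ArrayList<>();
        for (char c = '\uFF61'; c <= '\uFF9D'; c++) {
            String base = String.valueOf(c);
            rules.add(rule(base, Normalizer.normalize(base, Normalizer.Form.NFKC)));
            // Two half-width chars (letter + mark) that NFKC composes into ONE
            // full-width char, e.g. KA + dakuten => GA, HA + handakuten => PA.
            for (char mark : new char[] {VOICED, SEMI_VOICED}) {
                String pair = base + mark;
                String full = Normalizer.normalize(pair, Normalizer.Form.NFKC);
                if (full.length() == 1) {
                    rules.add(rule(pair, full));
                }
            }
        }
        return rules;
    }

    public static void main(String[] args) {
        rules().forEach(System.out::println);
    }
}
```

Redirecting the output into a file produces lines such as "\uFF76\uFF9E" => "\u30AC", and that file name would then go in the charFilter's mapping attribute. The two-character pair rules only help if the filter prefers the longest matching rule; verify that behavior with your Solr version before relying on it.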