Posted to solr-user@lucene.apache.org by Michael Sokolov <so...@ifactory.com> on 2010/10/11 22:57:46 UTC

configuring custom CharStream in solr

I would like to inject my CharStream (or possibly it could be a CharFilter;
this is all in flux at the moment) into the analysis chain for a field.  Can
I do this in solr using the Analyzer configuration syntax in schema.xml, or
would I need to define my own Analyzer?  The solr wiki describes adding
Tokenizers, but doesn't say anything about CharReaders/Filters.

Thanks for any pointers

-Mike


Re: configuring custom CharStream in solr

Posted by Michael Sokolov <so...@ifactory.com>.
  On 10/11/2010 10:18 PM, Chris Hostetter wrote:
> : OK - I found the answer pecking through the source - apparently the name of
> : the element to configure a CharFilter is <charFilter> - fancy that :)
>
> there's even an example, right there on the wiki...
>
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#CharFilterFactories
>
>
> -Hoss
I am just bathing myself in wizardly astuteness today

thanks

-Mike

Re: configuring custom CharStream in solr

Posted by Chris Hostetter <ho...@fucit.org>.
: OK - I found the answer pecking through the source - apparently the name of
: the element to configure a CharFilter is <charFilter> - fancy that :)

there's even an example, right there on the wiki...

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#CharFilterFactories


-Hoss

Re: configuring custom CharStream in solr

Posted by Michael Sokolov <so...@ifactory.com>.
  On 10/11/2010 8:38 PM, Michael Sokolov wrote:
>  On 10/11/2010 6:41 PM, Koji Sekiguchi wrote:
>> (10/10/12 5:57), Michael Sokolov wrote:
>>> I would like to inject my CharStream (or possibly it could be a CharFilter;
>>> this is all in flux at the moment) into the analysis chain for a field.  Can
>>> I do this in solr using the Analyzer configuration syntax in schema.xml, or
>>> would I need to define my own Analyzer?  The solr wiki describes adding
>>> Tokenizers, but doesn't say anything about CharReaders/Filters.
>>>
>>> Thanks for any pointers
>>>
>>> -Mike
>>>
>> Hi Mike,
>>
>> You can write your own CharFilterFactory that creates your own
>> CharStream. Please refer to the existing CharFilterFactories in Solr
>> to see how you can implement it.
>>
>> Koji
>>
> Koji - thanks for your response.  I think I can see my way clear to 
> making a factory class for my stream.  My question was really about 
> how to configure the factory.  I see a number of examples of 
> tokenizers and analyzers configured in the example schema.xml, but no 
> readers.  For example:
>
> <fieldType name="text_ws" class="solr.TextField">
> <analyzer>
> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> </analyzer>
> </fieldType>
>
> configures a specific tokenizer.  If I want to configure my 
> CharStream, is there an element for that?  Eg:
>
> <fieldType name="text_ws" class="solr.TextField">
> <analyzer>
> <reader class="com.mycompany.solr.FancyCharReader" />
> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> </analyzer>
> </fieldType>
>
> I am guessing that I need to create my own analyzer and hard-code the 
> reader/tokenizer filter chain in there, but it would be nice if there 
> were a syntax like the one I inferred above.
>
> -Mike
OK - I found the answer pecking through the source - apparently the name 
of the element to configure a CharFilter is <charFilter> - fancy that :)
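
For reference, a minimal sketch of where that element sits, reusing the
text_ws example from earlier in the thread. The <charFilter> element goes
inside <analyzer>, ahead of the <tokenizer>, so the character stream is
filtered before tokenization. The field type name "text_ws_html" is made up
for illustration, and solr.HTMLStripCharFilterFactory is used only as a
stand-in for whichever CharFilterFactory is actually wanted:

```xml
<!-- Sketch only: "text_ws_html" is a hypothetical field type name. -->
<fieldType name="text_ws_html" class="solr.TextField">
  <analyzer>
    <!-- CharFilters run first, on the raw character stream -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <!-- The tokenizer then consumes the filtered stream -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
```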

-Mike

Re: configuring custom CharStream in solr

Posted by Michael Sokolov <so...@ifactory.com>.
  On 10/11/2010 6:41 PM, Koji Sekiguchi wrote:
> (10/10/12 5:57), Michael Sokolov wrote:
>> I would like to inject my CharStream (or possibly it could be a CharFilter;
>> this is all in flux at the moment) into the analysis chain for a field.  Can
>> I do this in solr using the Analyzer configuration syntax in schema.xml, or
>> would I need to define my own Analyzer?  The solr wiki describes adding
>> Tokenizers, but doesn't say anything about CharReaders/Filters.
>>
>> Thanks for any pointers
>>
>> -Mike
>>
> Hi Mike,
>
> You can write your own CharFilterFactory that creates your own
> CharStream. Please refer to the existing CharFilterFactories in Solr
> to see how you can implement it.
>
> Koji
>
Koji - thanks for your response.  I think I can see my way clear to 
making a factory class for my stream.  My question was really about how 
to configure the factory.  I see a number of examples of tokenizers and 
analyzers configured in the example schema.xml, but no readers.  For 
example:

<fieldType name="text_ws" class="solr.TextField">
<analyzer>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
</analyzer>
</fieldType>

configures a specific tokenizer.  If I want to configure my CharStream, 
is there an element for that?  Eg:

<fieldType name="text_ws" class="solr.TextField">
<analyzer>
<reader class="com.mycompany.solr.FancyCharReader" />
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
</analyzer>
</fieldType>

I am guessing that I need to create my own analyzer and hard-code the 
reader/tokenizer filter chain in there, but it would be nice if there 
were a syntax like the one I inferred above.

-Mike

Re: configuring custom CharStream in solr

Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.
(10/10/12 5:57), Michael Sokolov wrote:
> I would like to inject my CharStream (or possibly it could be a CharFilter;
> this is all in flux at the moment) into the analysis chain for a field.  Can
> I do this in solr using the Analyzer configuration syntax in schema.xml, or
> would I need to define my own Analyzer?  The solr wiki describes adding
> Tokenizers, but doesn't say anything about CharReaders/Filters.
>
> Thanks for any pointers
>
> -Mike
>
Hi Mike,

You can write your own CharFilterFactory that creates your own
CharStream. Please refer to the existing CharFilterFactories in Solr
to see how you can implement it.
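
Once such a factory exists, it would presumably be referenced from
schema.xml by its fully qualified class name. The fragment below is only a
sketch under that assumption: the class com.mycompany.solr.FancyCharFilterFactory
and the field type name "text_fancy" are hypothetical, and the jar containing
the class would need to be on Solr's classpath.

```xml
<!-- Sketch only: both names below are hypothetical placeholders. -->
<fieldType name="text_fancy" class="solr.TextField">
  <analyzer>
    <!-- Custom factory, referenced by its fully qualified class name -->
    <charFilter class="com.mycompany.solr.FancyCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
```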

Koji

-- 
http://www.rondhuit.com/en/