Posted to solr-user@lucene.apache.org by Siddhartha Singh Sandhu <sa...@gmail.com> on 2015/09/25 18:58:24 UTC

Using a plugin to filter in schema.xml

Hi,

I want to use the twitter-text library (the implementation on GitHub) to
filter the tokens (hashtags) in my text. I know I could also use the
pattern-matching tokenizer, but I would trust Twitter's library more than my
own regex to do the job. I want to use it together with
solr.WhitespaceTokenizerFactory to get the tokens.

I need help understanding how to do that. Do I have to refactor the
Twitter Java library so that it "extends TokenFilterFactory", or can I use it
as it is?

Regards,

Sid.

Re: Using a plugin to filter in schema.xml

Posted by Siddhartha Singh Sandhu <sa...@gmail.com>.

I need a go-to reference for writing the custom tokenizer. Any suggestions?

On Fri, Sep 25, 2015 at 2:36 PM, Siddhartha Singh Sandhu <
sandhusolr@gmail.com> wrote:

> For sure.

Re: Using a plugin to filter in schema.xml

Posted by Siddhartha Singh Sandhu <sa...@gmail.com>.

For sure.

On Fri, Sep 25, 2015 at 1:13 PM, Alexandre Rafalovitch <ar...@gmail.com>
wrote:

> I think (I lost the library link) you would need to build a bridge by
> doing a custom Analyzer or Tokenizer and then using the library under
> the covers. Would be a nice contribution to open-source if you managed
> to achieve that.

Re: Using a plugin to filter in schema.xml

Posted by Alexandre Rafalovitch <ar...@gmail.com>.

I think (I lost the library link) you would need to build a bridge by
doing a custom Analyzer or Tokenizer and then using the library under
the covers. Would be a nice contribution to open-source if you managed
to achieve that.

Regards,
   Alex.
----
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/
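
A rough sketch of the "bridge" idea above, in plain Java with no Solr or Lucene dependency so that it runs on its own. It only shows the logic: split on whitespace, then keep the tokens that a pluggable check accepts. The class name, the method names and the regex are all hypothetical. In particular, the regex is a simplified stand-in and does not reproduce twitter-text's rules; a real bridge would presumably delegate that check to the library instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;

public class HashtagFilterSketch {

    // ASSUMPTION: simplified stand-in for the library call. This is NOT
    // twitter-text's hashtag definition, only a placeholder for the sketch.
    static final Pattern SIMPLE_HASHTAG = Pattern.compile("#\\w+");

    // Hypothetical check; a real bridge would ask the external library here.
    static boolean looksLikeHashtag(String token) {
        return SIMPLE_HASHTAG.matcher(token).matches();
    }

    // Splits on whitespace (roughly what a whitespace tokenizer does) and
    // keeps only the tokens accepted by the supplied predicate.
    static List<String> filterTokens(String text, Predicate<String> keep) {
        List<String> kept = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty() && keep.test(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String text = "Indexing #solr tweets with #lucene today";
        // prints [#solr, #lucene]
        System.out.println(filterTokens(text, HashtagFilterSketch::looksLikeHashtag));
    }
}
```

Inside Solr, the same accept-or-drop decision would sit in a Lucene TokenFilter placed after solr.WhitespaceTokenizerFactory, and a small class extending TokenFilterFactory would create that filter so schema.xml can reference it. The library itself would then not need refactoring, only wrapping.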

