Posted to java-user@lucene.apache.org by Benson Margulies <be...@basistech.com> on 2013/10/28 12:39:06 UTC

Why is there a token filter factory abstraction but not a tokenizer factory abstraction in Lucene?

I'm working on a tool that needs to construct analyzers 'at arm's length'
-- a bit like from a Solr schema -- so that multiple dueling analyzers can
each live in their own class loader at the same time. I just want to define
a simple configuration for char filters, a tokenizer, and token filters. So
it would be, well, convenient if there were a tokenizer factory at the
Lucene level, as there is a token filter factory. I can use Solr easily
enough for now, but I'd consider it cleaner if I could define this entirely
at the Lucene level.
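
Editorial illustration, not code from this thread: the "simple
configuration" described above could be modeled as plain data that names
each component, so that nothing analyzer-specific is loaded until a chain
is built inside its own class loader. The class names ComponentSpec and
AnalyzerSpec are hypothetical, and the component names used in the usage
note are examples only. This is a minimal sketch under those assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical: one analysis component, identified by a factory name plus string arguments. */
final class ComponentSpec {
    final String name;
    final Map<String, String> args;

    ComponentSpec(String name, Map<String, String> args) {
        this.name = name;
        // Defensive copy so the spec cannot change after it is built.
        this.args = Collections.unmodifiableMap(new LinkedHashMap<String, String>(args));
    }
}

/** Hypothetical: a whole analysis chain as plain data, with no analyzer classes loaded yet. */
final class AnalyzerSpec {
    private final List<ComponentSpec> charFilters = new ArrayList<>();
    private ComponentSpec tokenizer;
    private final List<ComponentSpec> tokenFilters = new ArrayList<>();

    AnalyzerSpec addCharFilter(String name, Map<String, String> args) {
        charFilters.add(new ComponentSpec(name, args));
        return this;
    }

    AnalyzerSpec setTokenizer(String name, Map<String, String> args) {
        tokenizer = new ComponentSpec(name, args);
        return this;
    }

    AnalyzerSpec addTokenFilter(String name, Map<String, String> args) {
        tokenFilters.add(new ComponentSpec(name, args));
        return this;
    }

    /** Lists the component names in the order they would be applied to text. */
    List<String> chain() {
        if (tokenizer == null) {
            throw new IllegalStateException("a chain needs exactly one tokenizer");
        }
        List<String> names = new ArrayList<>();
        for (ComponentSpec c : charFilters) {
            names.add(c.name);
        }
        names.add(tokenizer.name);
        for (ComponentSpec f : tokenFilters) {
            names.add(f.name);
        }
        return names;
    }
}
```

A caller would build a spec with something like
new AnalyzerSpec().addCharFilter("htmlstrip", args).setTokenizer("standard", args)
and hand it to whatever code resolves the names to factories. Keeping the
spec free of analyzer types is what would let each chain be resolved
against a different class loader.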

Re: Why is there a token filter factory abstraction but not a tokenizer factory abstraction in Lucene?

Posted by Benson Margulies <be...@basistech.com>.
We have been in the habit of naming classes on the theory that Java
packages do the work of namespacing.

So, we'd name a class:
com.basistech.<something>.BaseLinguisticsTokenFilterFactory

That means our name in the SPI system is just 'BaseLinguistics', with no
package to disambiguate it. That seems a bit problematic. Are there any
guidelines?
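
Editorial illustration, not code from this thread: the sketch below shows
why a name derived only from the simple class name can collide. It assumes
the SPI name is the simple class name with a known factory suffix removed
and then lowercased for lookup; the suffix list, the lowercasing, and the
two example packages are assumptions for illustration, and SpiNameSketch
is a hypothetical helper, not a Lucene class.

```java
import java.util.Locale;

/** Hypothetical helper: derives an SPI-style lookup name from a class name. */
final class SpiNameSketch {
    /** Suffixes assumed here for token filter factories. */
    static final String[] TOKEN_FILTER_SUFFIXES = {"TokenFilterFactory", "FilterFactory"};

    static String spiName(String className, String... suffixes) {
        // Drop the package: only the simple name takes part in the lookup name.
        String simpleName = className.substring(className.lastIndexOf('.') + 1);
        for (String suffix : suffixes) {
            if (simpleName.endsWith(suffix)) {
                return simpleName
                        .substring(0, simpleName.length() - suffix.length())
                        .toLowerCase(Locale.ROOT);
            }
        }
        throw new IllegalArgumentException(
                "class name does not end with a known factory suffix: " + className);
    }

    public static void main(String[] args) {
        // Two factories from different (made-up) packages end up with one name.
        System.out.println(spiName(
                "com.example.one.BaseLinguisticsTokenFilterFactory", TOKEN_FILTER_SUFFIXES));
        System.out.println(spiName(
                "org.example.two.BaseLinguisticsTokenFilterFactory", TOKEN_FILTER_SUFFIXES));
    }
}
```

Under these assumptions both calls yield the same name, so a vendor who
relies on the package for uniqueness would need to put a distinguishing
prefix into the simple class name itself.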



Re: Why is there a token filter factory abstraction but not a tokenizer factory abstraction in Lucene?

Posted by Benson Margulies <be...@basistech.com>.
Just how 'experimental' is the SPI system at this point, if that's a
reasonable question?



RE: Why is there a token filter factory abstraction but not a tokenizer factory abstraction in Lucene?

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi Benson,

The base factory class and the abstract Tokenizer, TokenFilter and
CharFilter factory classes are all in Lucene's analyzers-common module
(since 4.0). They are no longer part of Solr.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Why is there a token filter factory abstraction but not a tokenizer factory abstraction in Lucene?

Posted by Benson Margulies <be...@basistech.com>.
OK, so, here I go again making a public idiot of myself. Could it be that
the tokenizer factory is 'relatively recent' as in since 4.1?



