Posted to dev@lucene.apache.org by Alexandre Rafalovitch <ar...@gmail.com> on 2014/07/12 15:59:52 UTC

Hints on constructing/running Solr analyzer chains standalone

Hello,

I am interested in creating and running Solr analyzer chains outside
of normal process (no live Solr). Just construct a chain, feed it
tokens and see what happens.

I would appreciate any hints on what that takes and whether there are
any hidden/weird dependencies (e.g. for resource discoveries). I tried
tracing through FieldAnalysis calls, but can't actually seem to find
the point where the actual analysis is done. Just getting lost in sets
of NamedList<NamedList<... all alike.

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

Re: Hints on constructing/running Solr analyzer chains standalone

Posted by "david.w.smiley@gmail.com" <da...@gmail.com>.

That sounds like a wonderful project, Alexandre — I’ve always wanted such a
capability!

I suggest approaching this very pragmatically, minimizing the time
to get something useful. That means leveraging as much as is already
available, namely Solr’s existing analysis UI screen.  I suggest
modifying the FieldAnalysisRequestHandler so that it can take, as optional
input, an XML fieldType definition provided in the request instead of using
the live schema.  It would create a new temporary schema based on the
provided data, then re-use the rest of its field-analyzing code based on
that schema.   Disclaimer: I have yet to look at FieldAnalysisRequestHandler.

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Sat, Jul 12, 2014 at 1:16 PM, Alexandre Rafalovitch <ar...@gmail.com>
wrote:

> I don't want to read the schema.xml, but I do want to create factories
> using the same parameters they use in schema. So, it looks like I need
> to play around with ResourceLoaders and maybe SPI loaders, so things
> like wordlists get loaded.
>
> Starting from FieldAnalyzer turned out to be a dead-end because it was
> using pre-initialized field definitions. But starting again from Test
> cases seem to be somewhat more productive.
>
> The idea for the project is to give a web UI where a user can quickly
> put one or more analyzer stacks together and see how it/they perform
> against text (multiple texts). A bit similar to FieldAnalyzer but
> allow to have multiple stacks side-by-side and NOT needing to reload
> the core to add new ones. Then, generate the XML definition, ready for
> pasting in. That's the target anyway.

RE: Hints on constructing/running Solr analyzer chains standalone

Posted by Benson Margulies <bi...@gmail.com>.

Uwe, the last time I looked, Solr was perfectly cheerful about using
analysis components that did not advertise themselves via the factory SPI
system.  So someone might want to go further than calling the available
methods.

RE: Hints on constructing/running Solr analyzer chains standalone

Posted by Uwe Schindler <uw...@thetaphi.de>.

The factories are part of Lucene; Solr is just using them. To list the available factories (those on the classpath), use the (Tokenizer|TokenFilter|CharFilter)Factory.availableXxxxx() methods, which return all their names. You can instantiate them using the corresponding forName() method and build an Analyzer from them. The latter has to be done manually; there is no general, simple thing like Solr's chains. But that is quite easy to implement (if you really need an Analyzer instance). To just build a TokenStream for analysis, the factories are all you need (in fact, Solr's chain just calls the factories in order and returns the result as TokenStreamComponents).
You don't need to deal with SPI; just make the factories available on the classpath and Lucene finds them automatically.
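
To illustrate the "calls the factories in order" point, here is a minimal sketch in plain Java. It deliberately does not use the Lucene classes: ChainSketch, its whitespace tokenizer and its two filter steps are hypothetical stand-ins, and real code would obtain the factories through the forName() methods mentioned above. It only shows that a chain is one tokenizer followed by filters applied in order, and that the order matters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

// Hypothetical stand-in for an analysis chain. None of these names are Lucene API.
class ChainSketch {
    private final Function<String, List<String>> tokenizer;
    private final List<Function<List<String>, List<String>>> filters = new ArrayList<>();

    ChainSketch(Function<String, List<String>> tokenizer) {
        this.tokenizer = tokenizer;
    }

    // Filters run in the order in which they are added.
    ChainSketch addFilter(Function<List<String>, List<String>> filter) {
        filters.add(filter);
        return this;
    }

    List<String> analyze(String text) {
        List<String> tokens = tokenizer.apply(text);
        for (Function<List<String>, List<String>> filter : filters) {
            tokens = filter.apply(tokens);
        }
        return tokens;
    }

    // Splits on runs of whitespace; blank input yields no tokens.
    static List<String> whitespaceTokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    static List<String> lowerCase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) out.add(t.toLowerCase(Locale.ROOT));
        return out;
    }

    // Removes the single, case-sensitive stop word "the" (illustrative only).
    static List<String> dropThe(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (!t.equals("the")) out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        ChainSketch chain = new ChainSketch(ChainSketch::whitespaceTokenize)
                .addFilter(ChainSketch::lowerCase)
                .addFilter(ChainSketch::dropThe);
        System.out.println(chain.analyze("The Quick brown Fox"));
    }
}
```

If the two filters are swapped, "The" is no longer removed, because the stop step sees it before it has been lowercased. That is the kind of difference a side-by-side comparison of stacks would make visible.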

For loading resources, use Lucene's ResourceLoader, which gets passed to the factory's inform() method. You *need* to pass one if and only if the factory implements ResourceLoaderAware. Several ResourceLoaders are available. Solr has its own, very complicated one, but the default Lucene ones are ClasspathResourceLoader and FilesystemResourceLoader.
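
The lifecycle described here (hand a loader only to the steps that ask for one) can be sketched in plain Java as follows. This is a simplified model under stated assumptions, not the Lucene API: SketchLoader, SketchLoaderAware, InMemoryLoader and StopStepSketch are hypothetical names, and the in-memory loader simply stands in for a classpath or filesystem one.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;
import java.util.Set;
import java.util.TreeSet;

// All types here are hypothetical stand-ins, not Lucene's ResourceLoader API.
class LoaderLifecycleSketch {
    // Hands the loader only to steps that declare they need one.
    static void prepare(Object step, SketchLoader loader) {
        if (step instanceof SketchLoaderAware) {
            ((SketchLoaderAware) step).inform(loader);
        }
    }

    public static void main(String[] args) {
        SketchLoader loader = new InMemoryLoader().put("stopwords.txt", "the\nof\n");
        StopStepSketch stop = new StopStepSketch("stopwords.txt");
        prepare(stop, loader);
        System.out.println(stop.stopWords);
    }
}

interface SketchLoader {
    InputStream openResource(String name);
}

interface SketchLoaderAware {
    void inform(SketchLoader loader);
}

// Serves named resources from memory instead of the classpath or filesystem.
class InMemoryLoader implements SketchLoader {
    private final Map<String, String> contents = new HashMap<>();

    InMemoryLoader put(String name, String text) {
        contents.put(name, text);
        return this;
    }

    @Override
    public InputStream openResource(String name) {
        String text = contents.get(name);
        if (text == null) {
            throw new IllegalArgumentException("No such resource: " + name);
        }
        return new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
    }
}

// A step that needs a word list, so it receives a loader through inform().
class StopStepSketch implements SketchLoaderAware {
    private final String wordsResource;
    final Set<String> stopWords = new TreeSet<>();

    StopStepSketch(String wordsResource) {
        this.wordsResource = wordsResource;
    }

    @Override
    public void inform(SketchLoader loader) {
        // One word per line; surrounding whitespace and blank lines are ignored.
        try (Scanner in = new Scanner(loader.openResource(wordsResource), "UTF-8")) {
            while (in.hasNextLine()) {
                String line = in.nextLine().trim();
                if (!line.isEmpty()) stopWords.add(line);
            }
        }
    }
}
```

A step that does not implement the aware interface is left untouched by prepare(), so a loader is only required when at least one step in the chain actually asks for resources.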

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Alexandre Rafalovitch [mailto:arafalov@gmail.com]
> Sent: Saturday, July 12, 2014 7:17 PM
> To: dev@lucene.apache.org
> Subject: Re: Hints on constructing/running Solr analyzer chains standalone
> 
> I don't want to read the schema.xml, but I do want to create factories using
> the same parameters they use in schema. So, it looks like I need to play
> around with ResourceLoaders and maybe SPI loaders, so things like wordlists
> get loaded.
> 
> Starting from FieldAnalyzer turned out to be a dead-end because it was using
> pre-initialized field definitions. But starting again from Test cases seem to be
> somewhat more productive.
> 
> The idea for the project is to give a web UI where a user can quickly put one
> or more analyzer stacks together and see how it/they perform against text
> (multiple texts). A bit similar to FieldAnalyzer but allow to have multiple
> stacks side-by-side and NOT needing to reload the core to add new ones.
> Then, generate the XML definition, ready for pasting in. That's the target
> anyway.

Re: Hints on constructing/running Solr analyzer chains standalone

Posted by Alexandre Rafalovitch <ar...@gmail.com>.

Right,

For the first cut, I am not planning to let people edit things like
synonyms files. They will just select from a pre-existing list/dropdown.
And I'll most probably start by providing a bunch of fixed stacks taken
from the example schemas. So, none of Solr's flexibility is required;
I just need to wire it all up correctly.

One of the limiting factors is that the factories are NOT
self-describing. So, I can't figure out which parameters are allowed,
what form each takes, and what its description is. So, I will probably
have to hard-code that somewhere.
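
A hard-coded description could be as small as a hand-maintained table of parameter descriptors that the UI validates against. The following is only a sketch of that idea: ParamCatalogSketch and its Param type are hypothetical, and the two entries ("length" with min/max, "stop" with words/ignoreCase) are examples typed in by hand, not an authoritative list of what the factories accept.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hand-maintained parameter metadata; every entry below is illustrative only.
class ParamCatalogSketch {
    static final class Param {
        final String name;
        final String type;        // "string", "boolean" or "int"
        final boolean required;
        final String description;

        Param(String name, String type, boolean required, String description) {
            this.name = name;
            this.type = type;
            this.required = required;
            this.description = description;
        }
    }

    private final Map<String, List<Param>> byFactory = new LinkedHashMap<>();

    ParamCatalogSketch describe(String factory, Param... params) {
        byFactory.put(factory, Arrays.asList(params));
        return this;
    }

    // Returns human-readable problems; an empty list means the arguments look fine.
    List<String> validate(String factory, Map<String, String> args) {
        List<String> problems = new ArrayList<>();
        List<Param> params = byFactory.get(factory);
        if (params == null) {
            problems.add("Unknown factory: " + factory);
            return problems;
        }
        Map<String, Param> known = new LinkedHashMap<>();
        for (Param p : params) {
            known.put(p.name, p);
            if (p.required && !args.containsKey(p.name)) {
                problems.add(factory + ": missing required parameter '" + p.name
                        + "' (" + p.description + ")");
            }
        }
        for (Map.Entry<String, String> e : args.entrySet()) {
            Param p = known.get(e.getKey());
            if (p == null) {
                problems.add(factory + ": unknown parameter '" + e.getKey() + "'");
            } else if (!matchesType(p.type, e.getValue())) {
                problems.add(factory + ": parameter '" + p.name + "' expects " + p.type
                        + " but got '" + e.getValue() + "'");
            }
        }
        return problems;
    }

    private static boolean matchesType(String type, String value) {
        if (type.equals("boolean")) {
            return value.equals("true") || value.equals("false");
        }
        if (type.equals("int")) {
            try {
                Integer.parseInt(value);
                return true;
            } catch (NumberFormatException e) {
                return false;
            }
        }
        return true; // "string" accepts anything
    }

    static ParamCatalogSketch example() {
        return new ParamCatalogSketch()
                .describe("length",
                        new Param("min", "int", true, "minimum token length"),
                        new Param("max", "int", true, "maximum token length"))
                .describe("stop",
                        new Param("words", "string", false, "name of a stop word list"),
                        new Param("ignoreCase", "boolean", false, "match stop words case-insensitively"));
    }

    public static void main(String[] args) {
        Map<String, String> given = new LinkedHashMap<>();
        given.put("min", "2");
        given.put("max", "ten");
        for (String problem : example().validate("length", given)) {
            System.out.println(problem);
        }
    }
}
```

The same table could also drive the form fields in the front-end, which is why each descriptor carries a description as well as a type.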

And if that turns out to be too hard.... Well, let's just say I have a
very long list of cool projects and am looking for the most impact for
my time investment. But from digging into the sources yesterday, the
backend looks quite doable. The front-end is - of course - always more
of a challenge.

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On Sun, Jul 13, 2014 at 10:55 AM, Erick Erickson
<er...@gmail.com> wrote:
> Hmmm, sounds pretty cool!
>
> I wonder if it would be sufficient, for the first cut anyway, to let the user
> specify whatever was necessary to bypass all the ResourceLoader stuff,
> why make the user put the files in a place Solr knows about? Instead, for
> development, it might be sufficient (and less error prone) to require them
> to give the UI the information.
>
> Of course _you're_ the one doing the work, so whatever you think best.....
>
> Erick

Re: Hints on constructing/running Solr analyzer chains standalone

Posted by Erick Erickson <er...@gmail.com>.

Hmmm, sounds pretty cool!

I wonder if it would be sufficient, for the first cut anyway, to let the user
specify whatever is necessary to bypass all the ResourceLoader stuff.
Why make the user put the files in a place Solr knows about? Instead, for
development, it might be sufficient (and less error-prone) to require them
to give the UI the information.

Of course _you're_ the one doing the work, so whatever you think best.....

Erick

On Sat, Jul 12, 2014 at 10:16 AM, Alexandre Rafalovitch
<ar...@gmail.com> wrote:
> I don't want to read the schema.xml, but I do want to create factories
> using the same parameters they use in schema. So, it looks like I need
> to play around with ResourceLoaders and maybe SPI loaders, so things
> like wordlists get loaded.
>
> Starting from FieldAnalyzer turned out to be a dead-end because it was
> using pre-initialized field definitions. But starting again from Test
> cases seem to be somewhat more productive.
>
> The idea for the project is to give a web UI where a user can quickly
> put one or more analyzer stacks together and see how it/they perform
> against text (multiple texts). A bit similar to FieldAnalyzer but
> allow to have multiple stacks side-by-side and NOT needing to reload
> the core to add new ones. Then, generate the XML definition, ready for
> pasting in. That's the target anyway.

Re: Hints on constructing/running Solr analyzer chains standalone

Posted by Jack Krupansky <ja...@basetechnology.com>.

I've been through all that code in Solr, and it sounds like you'd have to 
replicate its function. Wow, that's a truly ambitious task! Good Luck!

I'm sure that a fair amount of it could be refactored dramatically to be a 
lot simpler since Solr evolved piecemeal over the years, but... that's 
another monumental task.

And it would indeed be great to have a field type editor and field type API 
for the Solr Admin UI/API itself.

As Uwe indicated, the factories are already in Lucene, so all you need to do
is generate their parameters from the field type filter parameters. But...
for a friendly development tool you would probably want much friendlier
parameter checking and error reporting than the raw exceptions (and weak
validation) found in the traditional Solr/Lucene factories. Again, a lot of
that could be refactored since it has evolved over the years, but... that's
another monumental task. Still, Solr would be so much the better for it.
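
As a rough illustration of "generate their parameters from the field type filter parameters", the sketch below turns one schema-style element into a class name plus a name/value argument map, using only the JDK's XML parser. FilterArgsSketch is a hypothetical helper, not Solr code, and the assumption that every attribute other than "class" becomes a factory argument is mine. Failures are reported as readable messages rather than raw parser exceptions.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: reads one <tokenizer .../> or <filter .../> element.
class FilterArgsSketch {
    // Every attribute except "class" is treated as a factory argument (an assumption).
    static Map<String, String> argsOf(String elementXml) {
        Element el = parse(elementXml);
        Map<String, String> args = new TreeMap<>();
        NamedNodeMap attrs = el.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            if (!"class".equals(attr.getNodeName())) {
                args.put(attr.getNodeName(), attr.getNodeValue());
            }
        }
        return args;
    }

    static String classOf(String elementXml) {
        Element el = parse(elementXml);
        if (!el.hasAttribute("class")) {
            throw new IllegalArgumentException(
                    "<" + el.getTagName() + "> is missing its 'class' attribute");
        }
        return el.getAttribute("class");
    }

    private static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            // Wrap parser failures in a message a tool user can act on.
            throw new IllegalArgumentException(
                    "Not a well-formed XML element: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        String xml = "<filter class=\"solr.StopFilterFactory\""
                + " ignoreCase=\"true\" words=\"stopwords.txt\"/>";
        System.out.println(classOf(xml) + " " + argsOf(xml));
    }
}
```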

And self-describing (and self-documenting) filter factories would be a 
fantastic improvement to Solr.

-- Jack Krupansky

-----Original Message----- 
From: Alexandre Rafalovitch
Sent: Saturday, July 12, 2014 1:16 PM
To: dev@lucene.apache.org
Subject: Re: Hints on constructing/running Solr analyzer chains standalone

I don't want to read the schema.xml, but I do want to create factories
using the same parameters they use in schema. So, it looks like I need
to play around with ResourceLoaders and maybe SPI loaders, so things
like wordlists get loaded.

Starting from FieldAnalyzer turned out to be a dead-end because it was
using pre-initialized field definitions. But starting again from Test
cases seem to be somewhat more productive.

The idea for the project is to give a web UI where a user can quickly
put one or more analyzer stacks together and see how it/they perform
against text (multiple texts). A bit similar to FieldAnalyzer but
allow to have multiple stacks side-by-side and NOT needing to reload
the core to add new ones. Then, generate the XML definition, ready for
pasting in. That's the target anyway.


Re: Hints on constructing/running Solr analyzer chains standalone

Posted by Alexandre Rafalovitch <ar...@gmail.com>.

I don't want to read the schema.xml, but I do want to create factories
using the same parameters they use in schema. So, it looks like I need
to play around with ResourceLoaders and maybe SPI loaders, so things
like wordlists get loaded.

Starting from FieldAnalyzer turned out to be a dead-end because it was
using pre-initialized field definitions. But starting again from the test
cases seems to be somewhat more productive.

The idea for the project is to give a web UI where a user can quickly
put one or more analyzer stacks together and see how they perform
against text (multiple texts). It is a bit similar to FieldAnalyzer, but
allows multiple stacks side-by-side WITHOUT needing to reload
the core to add new ones. Then, it generates the XML definition, ready for
pasting in. That's the target anyway.
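
The last step, generating the XML definition, could look roughly like this. It is a sketch under assumptions: FieldTypeXmlSketch is a hypothetical helper, the field type is always emitted with class="solr.TextField", and the caller is trusted to add the steps in a sensible order (tokenizer before filters).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper that renders an analyzer stack as a schema fieldType snippet.
class FieldTypeXmlSketch {
    private static final class Step {
        final String element;       // e.g. "tokenizer" or "filter"
        final String factoryClass;  // e.g. "solr.LowerCaseFilterFactory"
        final Map<String, String> args;

        Step(String element, String factoryClass, Map<String, String> args) {
            this.element = element;
            this.factoryClass = factoryClass;
            this.args = args;
        }
    }

    private final String name;
    private final List<Step> steps = new ArrayList<>();

    FieldTypeXmlSketch(String name) {
        this.name = name;
    }

    // keyValues are alternating parameter names and values.
    FieldTypeXmlSketch add(String element, String factoryClass, String... keyValues) {
        if (keyValues.length % 2 != 0) {
            throw new IllegalArgumentException("Parameters must come in name/value pairs");
        }
        Map<String, String> args = new LinkedHashMap<>();
        for (int i = 0; i < keyValues.length; i += 2) {
            args.put(keyValues[i], keyValues[i + 1]);
        }
        steps.add(new Step(element, factoryClass, args));
        return this;
    }

    String toXml() {
        StringBuilder sb = new StringBuilder();
        sb.append("<fieldType name=\"").append(escape(name))
          .append("\" class=\"solr.TextField\">\n");
        sb.append("  <analyzer>\n");
        for (Step s : steps) {
            sb.append("    <").append(s.element)
              .append(" class=\"").append(escape(s.factoryClass)).append('"');
            for (Map.Entry<String, String> e : s.args.entrySet()) {
                sb.append(' ').append(e.getKey())
                  .append("=\"").append(escape(e.getValue())).append('"');
            }
            sb.append("/>\n");
        }
        sb.append("  </analyzer>\n</fieldType>\n");
        return sb.toString();
    }

    // Escapes the characters that are unsafe inside a double-quoted attribute.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        System.out.print(new FieldTypeXmlSketch("text_demo")
                .add("tokenizer", "solr.StandardTokenizerFactory")
                .add("filter", "solr.LowerCaseFilterFactory")
                .add("filter", "solr.StopFilterFactory",
                        "ignoreCase", "true", "words", "stopwords.txt")
                .toXml());
    }
}
```

Because each stack is just a name plus an ordered list of steps, several of them can be held side-by-side and rendered independently.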

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On Sat, Jul 12, 2014 at 11:34 PM, Uwe Schindler <uw...@thetaphi.de> wrote:
> Hi,
>
>
>> Hmmmm, I think it's reasonably straightforward to construct what is implied
>> by a Solr analysis chain in Lucene, would that do? Or do you want to read a
>> schema.xml file outside Solr?
>>
>> If the former, then you can pretty much skip the Solr code entirely.
>
> Read this: http://lucene.apache.org/core/4_9_0/core/org/apache/lucene/analysis/package-summary.html#package_description
>
> To do analysis, Solr is not needed at all, unless you want to read schema.xml files. If you want to do this, that is quite easy using the IndexSchema class. You can then get the analyzer from the field type or field name. How to use the analyzer is described above and unrelated to Solr.
>
> Uwe

RE: Hints on constructing/running Solr analyzer chains standalone

Posted by Uwe Schindler <uw...@thetaphi.de>.

Hi,


> Hmmmm, I think it's reasonably straightforward to construct what is implied
> by a Solr analysis chain in Lucene, would that do? Or do you want to read a
> schema.xml file outside Solr?
> 
> If the former, then you can pretty much skip the Solr code entirely.

Read this: http://lucene.apache.org/core/4_9_0/core/org/apache/lucene/analysis/package-summary.html#package_description

To do analysis, Solr is not needed at all, unless you want to read schema.xml files. Reading them is quite easy using the IndexSchema class. You can then get the analyzer from the field type or field name. How to use the analyzer is described at the link above and is unrelated to Solr.

Uwe

Re: Hints on constructing/running Solr analyzer chains standalone

Posted by Erick Erickson <er...@gmail.com>.

Hmmmm, I think it's reasonably straightforward to construct what
is implied by a Solr analysis chain in Lucene, would that do? Or
do you want to read a schema.xml file outside Solr?

If the former, then you can pretty much skip the Solr code entirely.

FWIW,
Erick

Re: Hints on constructing/running Solr analyzer chains standalone

Posted by Alexandre Rafalovitch <ar...@gmail.com>.

Uhm. That's where I did start. :-(

Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853

On Sat, Jul 12, 2014 at 9:50 PM, Jack Krupansky <ja...@basetechnology.com> wrote:
> Tracing through indexing or query parsing is... a challenge. Start with
> something simpler like the analysis admin API.
>
> See:
> http://lucene.apache.org/solr/4_9_0/solr-core/org/apache/solr/handler/FieldAnalysisRequestHandler.html
>
> -- Jack Krupansky

Re: Hints on constructing/running Solr analyzer chains standalone

Posted by Jack Krupansky <ja...@basetechnology.com>.

Tracing through indexing or query parsing is... a challenge. Start with 
something simpler like the analysis admin API.

See:
http://lucene.apache.org/solr/4_9_0/solr-core/org/apache/solr/handler/FieldAnalysisRequestHandler.html
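
For anyone trying the handler against a running Solr, the request could
be put together roughly as below. This is only a minimal sketch: the base
URL, the core name collection1 and the /analysis/field path are
assumptions that depend on how the handler is registered in
solrconfig.xml, and FieldAnalysisUrl is a hypothetical helper. The
analysis.fieldtype and analysis.fieldvalue parameters are the ones
described in the Javadoc linked above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds a request URL for FieldAnalysisRequestHandler.
class FieldAnalysisUrl {

    /** URL-encodes a parameter value as UTF-8. */
    static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    /**
     * @param coreUrl   base URL of the core (assumed, e.g. http://localhost:8983/solr/collection1)
     * @param fieldType name of a field type defined in the live schema
     * @param text      index-time text to run through the analyzer
     */
    static String build(String coreUrl, String fieldType, String text) {
        return coreUrl + "/analysis/field"
                + "?analysis.fieldtype=" + enc(fieldType)
                + "&analysis.fieldvalue=" + enc(text)
                + "&wt=json";
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8983/solr/collection1",
                "text_general", "Quick Brown Foxes"));
    }
}
```

The response is the nested NamedList structure mentioned in the original
question, with one entry per stage of the chain.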

-- Jack Krupansky
