Posted to dev@lucene.apache.org by Toke Eskildsen <te...@statsbiblioteket.dk> on 2016/11/23 09:14:05 UTC

Re: Future of FieldCache in Solr

On Wed, 2016-10-26 at 13:05 +0000, Adrien Grand wrote:
> But we seem to still care a lot about uninverting, which does
> not make sense to me since everybody should have moved to doc values
> already?

I might have missed something here, but doesn't such a switch mean that
it will no longer be possible to facet on Text fields?

We facet on 3 Text fields in our core index: Title, Author and
Location. All of these fields use a KeywordTokenizer and multiple steps
of normalising the input. It seems like quite an obvious setup, so I
would guess that it is not uncommon.

How would this scenario be supported if uninversion is removed?

- Toke Eskildsen, State and University Library, Denmark

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

Re: Future of FieldCache in Solr

Posted by Ryan Josal <rj...@gmail.com>.

We use Toke's scenario in a couple of places too. We are capable of writing
a URP that does it, but it feels dirty, and replacing config with code seems
like something worth avoiding.

Having top-level support for some kind of analysis in a URP or somewhere
else could also help in other situations where you want some analysis before
a value goes into a non-TextField.

Ryan

On Thu, Nov 24, 2016 at 00:53 Toke Eskildsen <te...@statsbiblioteket.dk> wrote:

> On Wed, 2016-11-23 at 13:23 +0000, David Smiley wrote:
> > This is supported at the Lucene level via SortedSetDocValues.  Solr
> > doesn't yet support this for its TextField
> > -- https://issues.apache.org/jira/browse/SOLR-8362
> > however you could work around this with a URP or copyField
>
> copyField does not help here, as it copies the raw values. We need the
> normalised values for display when we do faceting.
>
> > or perhaps subclassing TextField so that you can tokenize the text a
> > second time to generate a list of SortedSetDocValuesField.  Probably
> > least painful is to use another field.
>
> So to facet on the normalised (analyzed, really) values of a Text field
> in a post-FieldCache Solr, I would need to write a URP or some other
> custom code. I can manage that, or just do the normalisation as part of
> the pre-processing.
>
> The question is whether my scenario (using analyzers for facet terms) is
> widespread. If so, I find this increase in implementation requirements
> problematic.
>
>
> I don't care for FieldCache as such - SOLR-8362 would be a better
> solution for the scenario I describe. Or maybe a URP that makes it
> easy to provide a list of analyzers? I am simply looking for a way
> that a random end-user can easily do faceting on analyzed terms,
> leveraging all the nice built-in filters in Solr.
>
> - Toke Eskildsen, State and University Library, Denmark
>

Re: Future of FieldCache in Solr

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.

On Wed, 2016-11-23 at 13:23 +0000, David Smiley wrote:
> This is supported at the Lucene level via SortedSetDocValues.  Solr
> doesn't yet support this for its TextField
> -- https://issues.apache.org/jira/browse/SOLR-8362
> however you could work around this with a URP or copyField

copyField does not help here, as it copies the raw values. We need the
normalised values for display when we do faceting.

> or perhaps subclassing TextField so that you can tokenize the text a
> second time to generate a list of SortedSetDocValuesField.  Probably
> least painful is to use another field.

So to facet on the normalised (analyzed, really) values of a Text field
in a post-FieldCache Solr, I would need to write a URP or some other
custom code. I can manage that, or just do the normalisation as part of
the pre-processing.
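
As a minimal sketch of the pre-processing option, the normalisation can be
applied before the document is sent to Solr, with the result stored in a
separate facet field next to the raw value. The Java below assumes a
hypothetical chain (trim, collapse whitespace, lowercase) and a hypothetical
"_facet" field suffix. Neither comes from this thread, and the actual
normalisation steps used in the index described here are not specified.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

class FacetNormalizer {
    // Hypothetical normalisation chain, applied in order. These steps are
    // illustrative assumptions, not the filters used in the thread's index.
    static final List<UnaryOperator<String>> STEPS = List.of(
        String::trim,                       // drop leading/trailing whitespace
        s -> s.replaceAll("\\s+", " "),     // collapse runs of whitespace
        s -> s.toLowerCase(Locale.ROOT)     // locale-independent lowercasing
    );

    // Treats the whole input as a single term (the role KeywordTokenizer
    // plays in an analyzer) and runs it through every step.
    public static String normalize(String raw) {
        String value = raw;
        for (UnaryOperator<String> step : STEPS) {
            value = step.apply(value);
        }
        return value;
    }

    // Builds a flat document holding both the raw value and the normalised
    // value; the "_facet" suffix is a made-up naming convention.
    public static Map<String, String> toDocument(String field, String raw) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put(field, raw);
        doc.put(field + "_facet", normalize(raw));
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(toDocument("author", "  Hans  Christian ANDERSEN "));
    }
}
```

The facet field would then be a plain string field with doc values, so no
uninversion is needed. The cost is the one raised in this thread: the
normalisation lives in client code rather than in the schema.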

The question is whether my scenario (using analyzers for facet terms) is
widespread. If so, I find this increase in implementation requirements
problematic.

I don't care for FieldCache as such - SOLR-8362 would be a better
solution for the scenario I describe. Or maybe a URP that makes it
easy to provide a list of analyzers? I am simply looking for a way
that a random end-user can easily do faceting on analyzed terms,
leveraging all the nice built-in filters in Solr.

- Toke Eskildsen, State and University Library, Denmark

Re: Future of FieldCache in Solr

Posted by David Smiley <da...@gmail.com>.

This is supported at the Lucene level via SortedSetDocValues.  Solr doesn't
yet support this for its TextField --
https://issues.apache.org/jira/browse/SOLR-8362 -- however, you could work
around this with a URP or copyField, or perhaps by subclassing TextField so
that you can tokenize the text a second time to generate a list
of SortedSetDocValuesField.  Probably least painful is to use another
field.
~ David

On Wed, Nov 23, 2016 at 4:14 AM Toke Eskildsen <te...@statsbiblioteket.dk>
wrote:

> On Wed, 2016-10-26 at 13:05 +0000, Adrien Grand wrote:
> > But we seem to still care a lot about uninverting, which does
> > not make sense to me since everybody should have moved to doc values
> > already?
>
> I might have missed something here, but doesn't such a switch mean that
> it will no longer be possible to facet on Text fields?
>
> We facet on 3 Text fields in our core index: Title, Author and
> Location. All of these fields use a KeywordTokenizer and multiple steps
> of normalising the input. It seems like quite an obvious setup, so I
> would guess that it is not uncommon.
>
> How would this scenario be supported if uninversion is removed?
>
>
> - Toke Eskildsen, State and University Library, Denmark
>
--
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com