Posted to dev@lucene.apache.org by Adrien Grand <jp...@gmail.com> on 2016/10/26 13:05:11 UTC

Future of FieldCache in Solr

Hi all,

I'm sending this email because there seem to be different expectations about
the future of FieldCache depending on whom you ask. To me, FieldCache (and
uninverting in general) is a legacy feature that has been superseded by doc
values since we introduced them in version 4.0. Yet we still seem to care a
lot about uninverting, which does not make sense to me: shouldn't everybody
have moved to doc values already?
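To make "uninverting" concrete, here is a minimal sketch, assuming a toy inverted index held as a hypothetical `TreeMap<String, int[]>` that maps each term of a single-valued field to the sorted doc IDs containing it. It rebuilds a FieldCache-style per-document array of term ordinals by walking every term's postings. It only illustrates the idea; it is not Lucene's FieldCache or UninvertingReader code, and the class and method names are made up.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

public class UninvertSketch {

    /**
     * Uninverts a toy single-valued field: for each document, records the
     * ordinal (position in sorted term order) of its term, or -1 if the
     * document has no value. Hypothetical helper, not a Lucene API.
     */
    public static int[] uninvert(TreeMap<String, int[]> postings, int maxDoc) {
        int[] ords = new int[maxDoc];
        Arrays.fill(ords, -1); // -1 marks "no value" for this document
        int ord = 0;
        // Terms are visited in sorted term order, so doc IDs are written in
        // an order dictated by the terms, not by doc ID. The sketch therefore
        // allocates one slot per document up front, even for sparse fields.
        for (Map.Entry<String, int[]> entry : postings.entrySet()) {
            for (int doc : entry.getValue()) {
                ords[doc] = ord; // assumes each doc appears under one term only
            }
            ord++;
        }
        return ords;
    }

    public static void main(String[] args) {
        TreeMap<String, int[]> postings = new TreeMap<>();
        postings.put("blue", new int[] {1, 4});
        postings.put("green", new int[] {0});
        postings.put("red", new int[] {2});
        System.out.println(Arrays.toString(uninvert(postings, 6)));
    }
}
```

Running `main` prints `[1, 0, 2, -1, 0, -1]`: documents 3 and 5 have no value yet still occupy a slot, and this work happens at search time for every new segment, whereas doc values write such a column at index time.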

For the record, doc values have many benefits over FieldCache: better
compression (table compression, GCD compression, prefix compression of the
terms dictionaries, etc.), faster reopens, and off-heap storage of the data.
In master they also have better support for sparse fields, which FieldCache
cannot offer because it is built in random doc ID order (or it would have to
generate more garbage and slow down reopens even more).
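The GCD compression listed above can be illustrated with a simplified sketch (not Lucene's actual codec; the class and method names are hypothetical): when all values of a numeric field, minus their minimum, share a common divisor, each value can be stored as `(value - min) / gcd`, which needs far fewer bits. The example assumes millisecond timestamps that all fall on whole seconds.

```java
public class GcdCompressionSketch {

    /** Encoded column: shared min and gcd plus one small multiplier per document. */
    public static final class Encoded {
        public final long min;
        public final long gcd;
        public final long[] deltas;    // (value - min) / gcd, one per document
        public final int bitsPerValue; // width a bit-packed encoding would need

        Encoded(long min, long gcd, long[] deltas, int bitsPerValue) {
            this.min = min;
            this.gcd = gcd;
            this.deltas = deltas;
            this.bitsPerValue = bitsPerValue;
        }
    }

    /** Euclid's algorithm on non-negative inputs; gcd(0, x) == x. */
    static long gcd(long a, long b) {
        while (b != 0) {
            long t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    /** Assumes values is non-empty and max - min fits in a long. */
    public static Encoded encode(long[] values) {
        long min = Long.MAX_VALUE;
        for (long v : values) {
            min = Math.min(min, v);
        }
        long gcd = 0;
        for (long v : values) {
            gcd = gcd(gcd, v - min); // v - min >= 0 under the stated assumption
        }
        if (gcd == 0) {
            gcd = 1; // all values are equal; avoid dividing by zero
        }
        long[] deltas = new long[values.length];
        long maxDelta = 0;
        for (int i = 0; i < values.length; i++) {
            deltas[i] = (values[i] - min) / gcd;
            maxDelta = Math.max(maxDelta, deltas[i]);
        }
        // A real codec would bit-pack the deltas with this many bits each;
        // the sketch keeps them in a long[] and only reports the width.
        int bits = Math.max(1, 64 - Long.numberOfLeadingZeros(maxDelta));
        return new Encoded(min, gcd, deltas, bits);
    }

    public static long decode(Encoded e, int doc) {
        return e.min + e.deltas[doc] * e.gcd;
    }

    public static void main(String[] args) {
        // Millisecond timestamps that all fall on whole seconds.
        long[] values = {1_000L, 3_000L, 5_000L, 2_000L};
        Encoded e = encode(values);
        System.out.println("gcd=" + e.gcd + " bitsPerValue=" + e.bitsPerValue);
        System.out.println("doc 2 -> " + decode(e, 2));
    }
}
```

Running `main` prints `gcd=1000 bitsPerValue=3` and `doc 2 -> 5000`, so each document needs three bits instead of 64. The sketch skips the actual bit-packing to stay short.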

The documentation (
https://cwiki.apache.org/confluence/display/solr/DocValues) and some online
resources (e.g.
https://support.lucidworks.com/hc/en-us/articles/201839163-When-to-use-DocValues-in-Solr)
already recommend using doc values for sorting, faceting and function
queries. Isn't it time to schedule the complete removal of FieldCache from
Solr?

Re: Future of FieldCache in Solr

Posted by David Smiley <da...@gmail.com>.

I'm +1 to phasing out the FieldCache (UninvertedField) in some future release,
like Solr 7. The upgrade process is to switch to doc values (DV) in a 6.x
release first.

On Wed, Oct 26, 2016 at 10:52 AM Adrien Grand <jp...@gmail.com> wrote:

> On Wed, Oct 26, 2016 at 16:23, Yonik Seeley <ys...@gmail.com> wrote:
>
>> The benefits of docvalues are the reason we recommend them by default (and
>> non-text fields now do have docvalues by default).
>> They do have some drawbacks, however:
>>  - Require reindexing
>
> I don't think that one is an issue if the schema examples enable doc
> values by default.
>
>>  - Take up more index space
>
> If doc values use X GB of disk space, then FieldCache would use *at
> least* as much *memory*. It seems pretty weird to me to be unwilling to
> put on disk something that would otherwise reside in memory.
>
>>  - Slower than FieldCache
>
> It depends on what we are talking about. While facets on a static index
> might be slightly faster with FieldCache, FieldCache makes reopens much
> slower.
>
>> So although the majority will be better served by docvalues, I don't
>> think there should be a rush to remove the option of using the
>> FieldCache.
>
> Doc values have been out for more than 4 years, so I don't think I am
> rushing anything. FieldCache has existed for a very long time, so it does
> not look too terrible, but when you think about it, wouldn't you think it
> crazy if we decided to build an inverted index in memory from stored fields
> the first time a field is searched on?
>
> Finally, something else that annoys me is that it makes points harder to
> integrate, since a field that is indexed with points instead of the
> inverted index is expected to be uninvertible too.
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com

Re: Future of FieldCache in Solr

Posted by Adrien Grand <jp...@gmail.com>.

On Thu, Oct 27, 2016 at 20:45, Yonik Seeley <ys...@gmail.com> wrote:

> One might as well complain that it took until 2016 for Lucene to get
> proper numeric index support.
> This is volunteer development, and Tomas has been the only person to
> find time to work on Points support.
>

I agree about the volunteer aspect. I wish that were the only reason points
are not integrated yet, but to me the problem is also that old features
never go away, which makes new features harder to integrate than they
should be.


> We have many users that depend on us,
> and we've already made it hard enough for people to move, and too many
> people are stuck back on v4.x.
>

I agree that there are times when we could have provided a better story
around backward compatibility with little effort. But it is a general issue
that users want both innovation and backward compatibility, which are
conflicting requirements. If we decide never to remove old features, then
we will get stuck at some point, and I am concerned that we are slowly
moving in that direction. We do not need to remove them now, but we should
at least schedule the removal of some of them for 8.0.

Re: Future of FieldCache in Solr

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Thu, Oct 27, 2016 at 6:21 PM, Yonik Seeley <ys...@gmail.com> wrote:

> I feel like I helped develop NumericField (although Uwe was the primary
> author)!

Sorry, you are correct: thank you for that! I had indeed forgotten that you
helped improve on Uwe's numeric fields, originally.

> all I remember is an honest technical opinion about whether it should be
> baked into the index format

Yes, that is exactly what I am referring to. Your comment stated that we
either commit numerics in a buggy state (so users don't get back a
NumericField when they load their document at search time), or we don't
even add a NumericField at all (an even worse API for direct Lucene users).
Both options made Lucene's numerics harder to use.

So of course we compromised, and Uwe's numeric fields did go into core,
in the buggy state.

Fortunately, we finally managed to fix that bug, but IIRC that took several
years.

Mike McCandless

http://blog.mikemccandless.com

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

Re: Future of FieldCache in Solr

Posted by Michael McCandless <lu...@mikemccandless.com>.

Well said, Mark. That is exactly the design of the Apache model, and I
agree that in general it's healthy: it means only conservative-ish changes
happen in a project.

Mike McCandless

http://blog.mikemccandless.com

On Thu, Oct 27, 2016 at 10:02 PM, Mark Miller <ma...@gmail.com> wrote:
> Apache is not designed to handle, in any meaningful way, accusations that
> someone has a history of poor opinions when it comes to driving code forward.
>
> Instead we have technical discussions per issue and the power of the veto.
> The idea is that we should just work together rather than attack one
> another.
>
> Some people may want to plow forward in any given area at any given time.
> And it's great when progress happens. But we have given dozens of people the
> power of veto, and that's pretty much the rules. If it acts as a brake
> sometimes, IMO, that is exactly the design. A lot of people here like to
> think they know what should happen despite opposing views. I think our
> system is designed with the understanding that the truth is often in the middle.
>
> Discussion and veto power are not attached to activity either. If someone
> wants to participate in a JIRA issue, they are in the club, regardless of
> how they choose to develop.
>
> It's like a political system. Choose deadlock or consensus, and stop
> worrying about opposing conspiracy theories. Whether they are true or not
> means little in how things are decided.
>
> I could nitpick a lot of the choices and motivations of a lot of people
> here. But it would be useless for forward progress (detrimental, even) and
> would perpetuate what has been a huge cultural decline in these projects.
>
> - Mark
>
> On Thu, Oct 27, 2016 at 6:22 PM Yonik Seeley <ys...@gmail.com> wrote:
>>
>> (splitting this off)
>>
>> > Your threat to veto the original addition of Uwe's NumericFields to
>> > Lucene's core stands out in my (long) memory as another.
>>
>> ??? I seriously question that long memory.  Or perhaps just the color
>> of the glasses you're viewing the world through.
>>
>> I feel like I helped develop NumericField (although Uwe was the primary
>> author)! IIRC, I wrote the first draft of the code that enabled
>> variable precision steps.
>>
>>
>> https://issues.apache.org/jira/browse/LUCENE-1470?focusedCommentId=12671495&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12671495
>>
>> http://markmail.org/message/vcwwxwciwf7ztrfg
>>
>> And this is the JIRA issue to actually move it to core... all I
>> remember is an honest technical opinion about whether it should be baked
>> into the index format (and certainly no vetoes or even opinions
>> against it being in "core"):
>> https://issues.apache.org/jira/browse/LUCENE-1673
>>
>>
>> Luckily, I'm in good company... I'm not the only person to be accused
>> of nefariously obstructing Lucene and only participating in Lucene
>> issues to slow it down or make it harder to use.
>> If one looks hard enough for something, they will start seeing it.
>>
>> -Yonik
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
> --
> - Mark
> about.me/markrmiller
