Posted to dev@lucene.apache.org by "Robert Muir (JIRA)" <ji...@apache.org> on 2017/11/01 23:56:00 UTC
[jira] [Created] (LUCENE-8031) DOCS_ONLY fields set incorrect length norms
Robert Muir created LUCENE-8031:
-----------------------------------
Summary: DOCS_ONLY fields set incorrect length norms
Key: LUCENE-8031
URL: https://issues.apache.org/jira/browse/LUCENE-8031
Project: Lucene - Core
Issue Type: Bug
Reporter: Robert Muir
Priority: Major
In the DOCS_ONLY case, term frequencies are discarded from the postings list, but they still count toward the length normalization, which looks like it may throw off scoring.
I ran some quick experiments on LUCENE-8025 by encoding fieldInvertState.getUniqueTermCount(), and it seemed worth fixing (e.g. potentially a 20% or 30% improvement). Happy to do real testing if we want to fix this.
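To make the mismatch concrete, here is a minimal sketch in plain Java. It is not Lucene code: the class and method names are hypothetical and exist only for illustration. It contrasts a length that counts every token occurrence with one that counts each distinct term once, which in Lucene would correspond roughly to FieldInvertState.getLength() and FieldInvertState.getUniqueTermCount().

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these names are Lucene APIs.
public class DocsOnlyLengthSketch {

    // Length that counts every token occurrence, so repeated terms
    // inflate it even when their frequencies are not stored in the postings.
    static int totalTokenCount(List<String> tokens) {
        return tokens.size();
    }

    // Length that counts each distinct term once, which matches what a
    // docs-only postings list can still "see" about the document.
    static int uniqueTermCount(List<String> tokens) {
        Set<String> unique = new HashSet<>(tokens);
        return unique.size();
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("red", "red", "red", "blue");
        System.out.println("total tokens: " + totalTokenCount(tokens)); // prints "total tokens: 4"
        System.out.println("unique terms: " + uniqueTermCount(tokens)); // prints "unique terms: 2"
    }
}
```

With frequencies dropped, the three occurrences of "red" contribute nothing to the term's score, yet under the first measure they still make the document look twice as long as under the second.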
But this seems tricky: today you can downgrade to DOCS_ONLY on the fly, and it's hard for me to think about that case (I think it's generally screwed up besides this, but still).