Posted to dev@lucene.apache.org by "Mike (JIRA)" <ji...@apache.org> on 2016/10/10 16:46:20 UTC
[jira] [Created] (LUCENE-7488) Consider tracking modification time of external file fields for faster reloading
Mike created LUCENE-7488:
----------------------------
Summary: Consider tracking modification time of external file fields for faster reloading
Key: LUCENE-7488
URL: https://issues.apache.org/jira/browse/LUCENE-7488
Project: Lucene - Core
Issue Type: Improvement
Components: core/index
Affects Versions: 4.10.4
Environment: Linux
Reporter: Mike
I have an index of about 4M legal documents that uses an external file field for pagerank boosting. The external file is about 100MB and has one row per document in the index; each row gives that document's pagerank score. Whenever we open a new searcher, this file has to be reloaded, which takes several seconds and creates a noticeable delay for our users.
An idea to fix this came up in [a recent discussion|https://www.mail-archive.com/solr-user@lucene.apache.org/msg125521.html]: could the file be reloaded only if it has changed on disk? In other words, when a new searcher is opened, could it check the file's modification time and skip the reload if the file hasn't changed?
In our configuration, this would be a big improvement. We only change the pagerank file once a week, because computing it is expensive and new documents don't tend to have a big impact on the scores. At the same time, because we're regularly adding new documents, we do hundreds of commits per day, each of which incurs a delay while the (largish) external file is reloaded.
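To make the idea concrete, here is a minimal sketch of a modification-time check in plain Java. It is only an illustration of the technique, not how Lucene or Solr loads external file fields today: the class name ModTimeCachedScores and its methods are hypothetical, and the one-entry-per-line "key=value" file layout is an assumption made for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not a Lucene/Solr class): reloads a score file only when its mtime changes. */
final class ModTimeCachedScores {
    private final Path file;
    private FileTime loadedModTime;                       // null until the first load
    private Map<String, Float> scores = Collections.emptyMap();
    private int loadCount = 0;                            // exposed only to make the behaviour observable

    ModTimeCachedScores(Path file) {
        this.file = file;
    }

    /** Returns the cached scores, re-reading the file only if its modification time differs. */
    synchronized Map<String, Float> get() {
        try {
            FileTime current = Files.getLastModifiedTime(file);
            // Compare for inequality rather than "newer than", so a file that is
            // replaced by a copy with an older timestamp is still picked up.
            if (!current.equals(loadedModTime)) {
                scores = parse(Files.readAllLines(file));
                loadedModTime = current;
                loadCount++;
            }
            return scores;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    synchronized int loadCount() {
        return loadCount;
    }

    /** Parses "key=value" lines (assumed layout); lines without a key and '=' are skipped. */
    static Map<String, Float> parse(List<String> lines) {
        Map<String, Float> result = new HashMap<>();
        for (String line : lines) {
            int eq = line.indexOf('=');
            if (eq <= 0) {
                continue;
            }
            // Float.parseFloat throws NumberFormatException on a malformed value.
            result.put(line.substring(0, eq).trim(),
                       Float.parseFloat(line.substring(eq + 1).trim()));
        }
        return Collections.unmodifiableMap(result);
    }
}
```

With something along these lines, each new searcher would call get() and pay only for a file-attribute lookup unless the file had actually been rewritten. One caveat: file systems differ in timestamp granularity, so two writes in quick succession may leave the modification time unchanged; a real implementation might also compare the file size or a checksum.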
Is this a reasonable improvement to request?
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org