Posted to solr-user@lucene.apache.org by William Pierce <ev...@hotmail.com> on 2009/08/15 16:38:19 UTC
Advice on updating solr indexes
Folks:
In our app we index approx 50 M documents every so often. One of the fields
in each document is called "CompScore" which is a score that our back-end
computes for each document. The computation of this score is heavy-weight
and is done only approximately once every few days. When documents are
retrieved during a search we return results sorted by the Solr score first
and then the CompScore.
The issue we have is this: Every week or so when the back-end routines run to
compute "CompScore" we need to delete and insert these 50 M documents into
the index. This happens even though a majority of the documents have
not changed.
I think there is no way in Solr to simply update a field in the index.
If others have encountered a similar issue, I'd be interested in hearing
about their solutions!
Best,
- Bill
Re: Advice on updating solr indexes
Posted by Lance Norskog <go...@gmail.com>.
There is a special-purpose feature that solves exactly this problem: it
assigns the score for a particular field from a file which contains every
known value of the field and a matching float.
Doing a quick scan of the code, this seems to be how it works: the
declaration of the field in schema.xml contains the fact that its score is
derived from an external file. It can only be done on fields defined as 'float'
(not 'sfloat'). Each query may give a file name and the field name. This
file should be sorted by the values of the field. It will be loaded and
cached and a future query may give only the field name.
Again, this description may not be completely right. (The float/sfloat thing
might be wrong, for example.) The parameters for this feature are buried in
the Solr source. There is no mention of this feature in the wiki.
The files are:
http://svn.apache.org/repos/asf/lucene/solr/trunk/src/java/org/apache/solr/search/function/FileFloatSource.java
and
http://svn.apache.org/repos/asf/lucene/solr/trunk/src/java/org/apache/solr/schema/ExternalFileField.java
I have not tried any of this. Should you try this feature and get it
working, please document it on the wiki :) Also, if there are any bugs or
gotchas, please post a Jira issue.
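
For anyone who wants to experiment, here is a minimal sketch of how the back-end could produce such a file without touching the index. It is untested against Solr and rests on assumptions: that each line has the form key=score, that the keys are the values of the document key field, and that the file must be sorted by key as described above. The function name write_score_file, the file name, and the sample ids are all hypothetical; where Solr expects to find the file is not covered here.

```python
import os
import tempfile


def write_score_file(scores, path):
    """Write one 'key=score' line per document, sorted by key.

    The 'key=score' line layout is an assumption, not something
    confirmed in this thread.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, then rename it,
    # so a reader never sees a half-written file.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as out:
        for key in sorted(scores):
            out.write(f"{key}={float(scores[key])}\n")
    os.replace(tmp_path, path)
    return path


if __name__ == "__main__":
    # Hypothetical document ids and CompScore values.
    comp_scores = {"doc2": 0.5, "doc1": 3}
    write_score_file(comp_scores, "compscore.txt")
```

The point of the design is that a new CompScore run only rewrites this one file; the 50 M documents themselves stay in the index as they are.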
--
Lance Norskog
goksron@gmail.com