Posted to solr-user@lucene.apache.org by William Pierce <ev...@hotmail.com> on 2009/08/15 16:38:19 UTC

Advice on updating solr indexes

Folks:

In our app we index approximately 50 M documents every so often. One of the
fields in each document is called "CompScore", a score that our back-end
computes for each document. Computing this score is heavy-weight and is done
only once every few days. When documents are retrieved during a search, we
return results sorted by the Solr score first and then by the CompScore.
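
For illustration, the two-level sort described above could be requested along
these lines. This is only a sketch: the host, port, handler path, and the
function name build_search_url are hypothetical, and it assumes CompScore is
an indexed, sortable field.

```python
from urllib.parse import urlencode


def build_search_url(query, base_url="http://localhost:8983/solr/select"):
    """Build a Solr select URL sorted by relevance, then by CompScore.

    base_url is a placeholder (assumption); point it at your own Solr instance.
    """
    params = {
        "q": query,
        # Primary sort key: Solr relevance score; tie-breaker: CompScore.
        "sort": "score desc,CompScore desc",
    }
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url("title:solr"))
```

Because CompScore is only the second sort key, it matters only for documents
whose Solr scores are exactly equal.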

The issue we have is this: Every week or so, when the back-end routines run to
compute "CompScore", we need to delete and re-insert these 50 M documents into
the index. This happens even though a majority of the documents have
not changed.

I think there is no way in Solr to simply update a field in the index.

If others have encountered a similar issue,  I'd be interested in hearing 
about their solutions!

Best,

- Bill 


Re: Advice on updating solr indexes

Posted by Lance Norskog <go...@gmail.com>.
There is a special-purpose feature that solves exactly this problem: it
assigns the score for a particular field from a file which contains every
known value of the field and a matching float.

From a quick scan of the code, this seems to be how it works: the
declaration of the field in schema.xml states that its score is
derived from an external file. It can only be done on fields defined 'float'
(not 'sfloat'). Each query may give a file name and the field name. This
file should be sorted by the values of the field. It will be loaded and
cached, so a later query may give only the field name.
Again, this description may not be completely right. (The float/sfloat thing
might be wrong, for example.) The parameters for this feature are buried in
the Solr source. There is no mention of this feature in the wiki.
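
As a rough sketch of the back-end side, the recomputed scores could be dumped
to such a file instead of reindexing. Everything specific here is an
assumption to check against ExternalFileField.java and FileFloatSource.java:
that the file holds one key=value line per document, that it is named
external_<fieldname> and lives in the index data directory, and that sorting
by document key is what is wanted. The function names are hypothetical.

```python
from pathlib import Path


def format_external_file(scores):
    """Render a {document_key: score} mapping as sorted 'key=value' lines.

    Assumed format: one 'key=float' pair per line, sorted by key.
    """
    lines = [f"{key}={float(value)}" for key, value in sorted(scores.items())]
    return "\n".join(lines) + "\n"


def write_external_file(scores, data_dir, field_name="CompScore"):
    """Write the scores where Solr is assumed to look for them."""
    # Assumption: the file is called external_<fieldname> and sits in the
    # index data directory; verify this against the Solr source.
    path = Path(data_dir) / f"external_{field_name}"
    path.write_text(format_external_file(scores), encoding="utf-8")
    return path


if __name__ == "__main__":
    print(format_external_file({"doc2": 0.5, "doc1": 1.25}), end="")
```

With this approach the weekly job would rewrite one file rather than delete
and re-insert 50 M documents, provided the feature behaves as described.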

The files are:

http://svn.apache.org/repos/asf/lucene/solr/trunk/src/java/org/apache/solr/search/function/FileFloatSource.java
and

http://svn.apache.org/repos/asf/lucene/solr/trunk/src/java/org/apache/solr/schema/ExternalFileField.java

I have not tried any of this. Should you try this feature and get it
working, please document it on the wiki :) Also, if there are any bugs or
gotchas, please file a Jira issue.
 --
Lance Norskog
goksron@gmail.com
