Posted to solr-user@lucene.apache.org by Jim Murphy <ji...@pobox.com> on 2008/07/30 16:03:58 UTC

Question about ValueSource and large datasets

I'm looking to incorporate an external calculation in Solr/Lucene search
results.  I'd like to write queries that filter and sort on the value of
this "virtual field".  The value of the field is actually calculated at
runtime based on a remote call to an external system.  My Solr queries will
include termqueries to match keywords - nothing special, but I'd like to
filter and order results based on the virtual field as well.  

I started looking at a custom Field Type + ValueSource.  I add a field of
this "virtual field type" to the schema, and have the custom ValueSource
wired in to the field type.  I used the FileFloatSource example as
inspiration - seems ok - but 2 questions:

1. How do I query for my virtual field?  My ValueSource never seems to be
activated no matter what I query for.  Here are the relevant parts of my
schema - see any issues?  Any hints on what the query string should be?

<fieldType name="dynamic_value" class="com.aiderss.schema.DynamicValueField"
keyField="link" defVal="1.0" stored="false" indexed="false"
valType="float"/>
...
<field name="calculatedValue" type="dynamic_value" indexed="true"
stored="false" />


2.  How can I limit the number of external calls I need to make?  If I use
FunctionQuery syntax then my ValueSource is used.  But, a BIG but, I notice
that it is queried for field values for every document in the index.  My
index is 100 million documents but typical result size is on the order of
tens.  I'd like to perform the external call on those tens not on the entire
index every time.

        // In my custom ValueSource:
        DocValues getValues(IndexReader reader) throws IOException
        {
            final float[] arr = getCachedFloats(reader);
            return new DocValues()
            {
                public float floatVal(int doc) { /* ...called 100 million times... */ }
                ...
            
I like this approach a lot but I'm getting the feeling that I want to hook
in later in the query process - after the initial query (matching keywords)
is done and the document set is reduced from 100 million to tens.
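
To make the "hook in later" idea concrete, here is a rough sketch in plain
Java.  It does not use the Solr/Lucene API at all: the class name, the
ScoredDoc type, the minValue threshold and the fake external lookup are all
hypothetical stand-ins.  It only shows the shape of what I'm after, which is
to take the handful of doc ids that already matched the keyword query, make
one external call per match, then filter and sort on the returned value.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntToDoubleFunction;

public class PostFilterSketch {

    /** A matched document id paired with its externally computed value. */
    static final class ScoredDoc {
        final int doc;
        final double value;
        ScoredDoc(int doc, double value) { this.doc = doc; this.value = value; }
    }

    /**
     * Calls the external system only for the already-matched doc ids,
     * drops those below minValue, and sorts the rest by value, highest first.
     * externalValue is a hypothetical stand-in for the remote call.
     */
    static List<ScoredDoc> filterAndSort(int[] matchedDocs,
                                         IntToDoubleFunction externalValue,
                                         double minValue) {
        List<ScoredDoc> kept = new ArrayList<>();
        for (int doc : matchedDocs) {
            double v = externalValue.applyAsDouble(doc); // one call per match
            if (v >= minValue) {
                kept.add(new ScoredDoc(doc, v));
            }
        }
        kept.sort(Comparator.comparingDouble((ScoredDoc s) -> s.value).reversed());
        return kept;
    }

    public static void main(String[] args) {
        int[] matched = {42, 7, 1000003};   // stand-in for the keyword hits
        int[] calls = {0};                  // counts external lookups
        IntToDoubleFunction fake = doc -> { calls[0]++; return (doc % 10) / 10.0; };
        for (ScoredDoc s : filterAndSort(matched, fake, 0.25)) {
            System.out.println(s.doc + " -> " + s.value);
        }
        System.out.println("external calls: " + calls[0]);
    }
}
```

With this shape the number of external calls tracks the size of the match
set (tens), not the size of the index (100 million).  What I don't know is
where in Solr such a step would plug in.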

Do I really want a filter query of some kind?  Or some other layer of
filtering? 

Thanks in advance,

Jim
