Posted to solr-user@lucene.apache.org by Britske <gb...@gmail.com> on 2007/11/05 13:00:56 UTC

where to hook in to SOLR to read field-label from functionquery

My question may sound strange, but I'll try to explain:

Say I have a custom function query, MinFloatFunction, which takes an array
of ValueSources as its arguments.

MinFloatFunction(ValueSource[] sources)

In my case all these ValueSources are the values of a collection of fields.
What I need is both the value and the field name of the lowest-scoring
field in the above array. The result of the function is obviously the value
of the lowest-scoring field, but is there any way to get at the name of that
field (with or without extending Solr a little bit)?

This not-so-standard need comes from the following:
My index consists of 'products', each of which can have a lot of variants
(up to 5000 per product). Each variant can have its own price and a number
of characteristics (the latter I don't need to filter, sort, or search by).
Moreover, every search should return any given product at most once (so two
variants of the same product can never be returned in the same search).

For this I designed a schema in which each 'row' in the index represents a
product (independent of variants), which takes care of the one-variant
maximum, and every variant is represented as 2 fields in this row:

variant_p_*                 <-- represents price (stored / indexed)
variant_source_*          <-- represents the other fields dependent on the
variant (stored / multivalued)

Here, for example, variant_p_xyz and variant_source_xyz belong together.

The specific use case now is that a user is sometimes happy with any of a
range of variants and wants the lowest price over all those variants.
To return, for each product, the variant with the smallest price alongside
its characteristics, I need the name of the lowest-scoring field (say,
variant_p_xyz) so that I can give back the contents of variant_source_xyz.

Sure, other routes would be possible and I'm open to suggestions, but at
least the following routes don't work:

- give back all the fields to the client and let the client take the min
over the fields, etc. --> too much data over the line.
- store the minima of a certain range of variant_p_* values alongside the
corresponding variant_source_* at INDEX time, when I have all the
variant fields ready in the client. --> the collections over which I need to
take the minima are not known a priori.

Any help is highly appreciated! 

Cheers,
Geert-Jan
-- 
View this message in context: http://www.nabble.com/where-to-hook-in-to-SOLR-to-read-field-label-from-functionquery-tf4751109.html#a13585389
Sent from the Solr - User mailing list archive at Nabble.com.


Re: where to hook in to SOLR to read field-label from functionquery

Posted by Britske <gb...@gmail.com>.


hossman wrote:
> 
> 
> : Say I have a custom functionquery MinFloatFunction which takes as its
> : arguments an array of valuesources. 
> : 
> : MinFloatFunction(ValueSource[] sources)
> : 
> : In my case all these valuesources are the values of a collection of fields.
> 
> a ValueSource isn't required to be field specific (it may already be the 
> mathematical combination of multiple other fields) so there is no generic 
> way to get the "field name" from a ValueSource ... but you could define 
> your MinFloatFunction to only accept FieldCacheSource[] as input ... hmmm, 
> except that FieldCacheSource doesn't expose the field name.  so instead you 
> write...
> 
>   public class MyFieldCacheSource extends FieldCacheSource {
>     public MyFieldCacheSource(String field) {
>       super(field);
>     }
>     public String getField() {
>       return field;
>     }
>   }
>   public class MinFloatFunction ... {
>     public MinFloatFunction(MyFieldCacheSource[] values);
>   }
> 
Thanks for this. I'm going to look into this a little further.


hossman wrote:
> 
> 
> : For this I designed a schema in which each 'row' in the index represents a
> : product (independent of variants) (which takes care of the 1 variant max) and
> : every variant is represented as 2 fields in this row:
> : 
> : variant_p_*                 <-- represents price (stored / indexed)
> : variant_source_*          <-- represents the other fields dependent on the
> : variant (stored / multivalued)
> 
> Note: if you have a lot of variants you may wind up with the same problem 
> as described here...
> 
> http://www.nabble.com/sorting-on-dynamic-fields---good%2C-bad%2C-neither--tf4694098.html
> 
> ...because of the underlying FieldCache usage in FieldCacheValueSource
> 
> 
> -Hoss
> 
> 
> 

Hmmm. Thanks for pointing me to that one (I guess ;-). I totally
underestimated the memory requirements of the underlying Lucene FieldCache
implementation.
Having the option to sort on about 10,000 variant fields with about 400,000
docs will consume about 16 GB max. Definitely not doable in my situation. An
LRU implementation of the Lucene FieldCache would help big time in this
situation, to at least avoid OOM errors. Perhaps you know of any existing
implementations?
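
The 16 GB figure is consistent with one 4-byte float per document per field:
10,000 x 400,000 x 4 bytes = 16,000,000,000 bytes. As a rough illustration of
the LRU idea (not an existing Lucene or Solr component), the sketch below
bounds how many per-field arrays are held at once, using java.util.LinkedHashMap
in access order. LruFieldArrays, its Loader callback, and the capacity are
hypothetical names made up for this example; a real replacement would also
have to cope with index reader changes, which this ignores.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU holder for per-field float arrays; not a Lucene/Solr class.
final class LruFieldArrays {
    // Callback that loads all values of one field (for example from the index).
    interface Loader {
        float[] load(String field);
    }

    private final Loader loader;
    private final Map<String, float[]> cache;

    LruFieldArrays(final int maxFields, Loader loader) {
        this.loader = loader;
        // accessOrder = true keeps entries ordered least recently used first.
        this.cache = new LinkedHashMap<String, float[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, float[]> eldest) {
                // Evict the least recently used field once the bound is exceeded.
                return size() > maxFields;
            }
        };
    }

    // Returns the cached array; on a miss, loads it and may evict another field.
    synchronized float[] get(String field) {
        float[] values = cache.get(field);
        if (values == null) {
            values = loader.load(field);
            cache.put(field, values);
        }
        return values;
    }

    // containsKey does not count as an access, so it leaves the LRU order alone.
    synchronized boolean isCached(String field) {
        return cache.containsKey(field);
    }

    // Worst case if every field were cached: one 4-byte float per doc per field.
    static long worstCaseBytes(long fields, long docs) {
        return fields * docs * 4L;
    }

    public static void main(String[] args) {
        System.out.println(worstCaseBytes(10000L, 400000L) + " bytes");
    }
}
```

With such a bound, memory use is capped at roughly maxFields x docs x 4 bytes,
at the price of reloading a field's values whenever an evicted field is
sorted on again.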

Thanks a lot, 
Geert-Jan
-- 
View this message in context: http://www.nabble.com/where-to-hook-in-to-SOLR-to-read-field-label-from-functionquery-tf4751109.html#a13682698


Re: where to hook in to SOLR to read field-label from functionquery

Posted by Chris Hostetter <ho...@fucit.org>.
: Say I have a custom functionquery MinFloatFunction which takes as its
: arguments an array of valuesources. 
: 
: MinFloatFunction(ValueSource[] sources)
: 
: In my case all these valuesources are the values of a collection of fields.

a ValueSource isn't required to be field specific (it may already be the 
mathematical combination of multiple other fields) so there is no generic 
way to get the "field name" from a ValueSource ... but you could define 
your MinFloatFunction to only accept FieldCacheSource[] as input ... hmmm, 
except that FieldCacheSource doesn't expose the field name.  so instead you 
write...

  public class MyFieldCacheSource extends FieldCacheSource {
    public MyFieldCacheSource(String field) {
      super(field);
    }
    public String getField() {
      return field;
    }
  }
  public class MinFloatFunction ... {
    public MinFloatFunction(MyFieldCacheSource[] values);
  }
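
One way to picture the suggestion above: once every source carries its field
name, the min function can remember which source produced the minimum. The
sketch below is plain Java with no Solr dependency. NamedSource, MinResult,
and MinFieldFinder are hypothetical stand-ins made up for illustration, not
Solr or Lucene classes, and the variant_source_ lookup only assumes the
naming convention from the original question.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a field-backed value source that knows its field name.
interface NamedSource {
    String getField();
    float floatVal(int doc);
}

// Holds the minimum value together with the name of the field that produced it.
final class MinResult {
    final String field;
    final float value;

    MinResult(String field, float value) {
        this.field = field;
        this.value = value;
    }

    // Maps variant_p_xyz to variant_source_xyz (assumes the question's naming scheme).
    String sourceField() {
        String prefix = "variant_p_";
        if (!field.startsWith(prefix)) {
            throw new IllegalStateException("not a price field: " + field);
        }
        return "variant_source_" + field.substring(prefix.length());
    }
}

final class MinFieldFinder {
    // Returns the lowest value for doc across all sources, or null if there are none.
    // On ties the first source in the list wins.
    static MinResult min(List<NamedSource> sources, int doc) {
        MinResult best = null;
        for (NamedSource s : sources) {
            float v = s.floatVal(doc);
            if (best == null || v < best.value) {
                best = new MinResult(s.getField(), v);
            }
        }
        return best;
    }

    // Builds a source backed by an in-memory array holding one price per doc id.
    static NamedSource arraySource(final String field, final float[] prices) {
        return new NamedSource() {
            public String getField() { return field; }
            public float floatVal(int doc) { return prices[doc]; }
        };
    }

    public static void main(String[] args) {
        List<NamedSource> sources = Arrays.asList(
            arraySource("variant_p_abc", new float[] {19.5f, 7.0f}),
            arraySource("variant_p_xyz", new float[] {12.0f, 9.0f}));
        MinResult r = min(sources, 0);
        System.out.println(r.field + " = " + r.value + " -> " + r.sourceField());
    }
}
```

For doc 0 in this toy data the minimum comes from variant_p_xyz, so the
matching stored field to return would be variant_source_xyz. How the winning
field name gets from the function back into the response is the part that
would still need a hook inside Solr.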


: For this I designed a schema in which each 'row' in the index represents a
: product (independent of variants) (which takes care of the 1 variant max) and
: every variant is represented as 2 fields in this row:
: 
: variant_p_*                 <-- represents price (stored / indexed)
: variant_source_*          <-- represents the other fields dependent on the
: variant (stored / multivalued)

Note: if you have a lot of variants you may wind up with the same problem 
as described here...

http://www.nabble.com/sorting-on-dynamic-fields---good%2C-bad%2C-neither--tf4694098.html

...because of the underlying FieldCache usage in FieldCacheValueSource


-Hoss