Posted to solr-user@lucene.apache.org by Gintautas Sulskus <gi...@gmail.com> on 2018/01/30 14:05:19 UTC

Computing record score depending on its association with other records

Hi,

I have two collections. The first collection 'items' stores associations
between items and their features. The second collection 'features' stores
importance score for each feature.

   items: item_id    - one-to-many - feature_id
features: feature_id - one-to-one  - importance_score_int

The following describes a simplified scenario of what I would like to
achieve using Solr (6.5) queries and/or Streaming Expressions.

I would like to select the first two items from the 'items' collection
and rank them by their features' importance score.

Suppose we have two items i1 and i2. The first item has two features f1 and
f2 and the second item i2 has only one feature f1:
i1, f1
i1, f2
i2, f1

The score is computed by a function f(...) that simply returns the average
of feature importance scores. Provided the scores are as stated below, i2
would be ranked first with a score of 2/1=2 and i1 would come second with
a score of (2 + (-1))/2=0.5:
f1 - 2
f2 - (-1)

The natural flow would be to gather features for each item, compute the
average of their scores and then associate that average with a
corresponding item id.
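Outside Solr, the gather-and-average flow above can be sketched in a few
lines of Python (collection contents hard-coded for illustration):

```python
from collections import defaultdict

# items: item_id -> feature_id (one-to-many), as in the 'items' collection
item_features = [("i1", "f1"), ("i1", "f2"), ("i2", "f1")]

# features: feature_id -> importance score, as in the 'features' collection
importance = {"f1": 2, "f2": -1}

def rank_items(item_features, importance):
    """Average the importance scores of each item's features, rank descending."""
    grouped = defaultdict(list)
    for item_id, feature_id in item_features:
        grouped[item_id].append(importance[feature_id])
    scores = {item: sum(vals) / len(vals) for item, vals in grouped.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(rank_items(item_features, importance))
# [('i2', 2.0), ('i1', 0.5)]
```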

Any pointers are very much welcome!

Thanks,
Gintas

Re: Computing record score depending on its association with other records

Posted by Gintautas Sulskus <gi...@gmail.com>.
Yes, that is correct. The 'features' collection stores the mapping between
features and their scores.
For simplicity, I tried to keep the level of detail about these collections
to a minimum.

Both collections contain thousands of records and are updated by the Lily
HBase Indexer. Therefore, storing scores/weights in the model resource is
not feasible.

Ideally, I would like to keep these collections separate and perform
cross-collection queries. If such an approach is not feasible, then I could
possibly merge the two collections into one.
That would make matters simpler, but it is not ideal.
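For reference, the cross-collection gather-and-average flow could be
sketched as a Solr Streaming Expression over the two collections (field,
collection, and handler names assumed from the schema described in the
original post; untested):

```
rollup(
  sort(
    innerJoin(
      search(items,    q="*:*", fl="item_id,feature_id",
             sort="feature_id asc", qt="/export"),
      search(features, q="*:*", fl="feature_id,importance_score_int",
             sort="feature_id asc", qt="/export"),
      on="feature_id"
    ),
    by="item_id asc"
  ),
  over="item_id",
  avg(importance_score_int)
)
```

innerJoin requires both streams sorted on the join key, and rollup requires
its input sorted on the grouping field, hence the intermediate sort. A final
sort on the computed average would order items by score; exact syntax varies
by Solr version, so treat this as a starting point rather than a drop-in
query.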

Gintas



On Tue, Jan 30, 2018 at 5:49 PM, Alessandro Benedetti <a....@sease.io>
wrote:


Re: Computing record score depending on its association with other records

Posted by Alessandro Benedetti <a....@sease.io>.
From what I understand, the feature weight is present in your second
collection.
You should express the feature weight in the model resource (not even in
the original collection).
Is it actually necessary for the feature weight to be in a separate Solr
collection?



-----
---------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Computing record score depending on its association with other records

Posted by Gintautas Sulskus <gi...@gmail.com>.
Thanks, Alessandro, for your reply.

Indeed, LTR looks like what I need.

However, all of the LTR examples that I have found use a single collection
as a data source.
My data spans two collections. Does LTR support this somehow, or
should I 'denormalise' the data and merge both collections?
My concern is that denormalisation would lead to a significant increase
in size on disk.

Best,
Gintas


On Tue, Jan 30, 2018 at 2:30 PM, Alessandro Benedetti <a....@sease.io>
wrote:


Re: Computing record score depending on its association with other records

Posted by Alessandro Benedetti <a....@sease.io>.
Hi Ginsul,
let's try to wrap it up:

1) You have an item with N binary features. Given that you represent the
document with a list of feature ids (and no values), I would assume that
when a feature is in the list, it has a value of 1 for the item.

2) You want to score (or maybe re-rank?) your documents using the score
you defined.

You could solve this problem with a number of possible customizations.
Starting from an easy one, you could try to use the LTR re-ranker [1].

Specifically, you can define your set of features (and that should be
possible using the component out of the box) and then a linear model (you
already have the weights for the features, so you don't need to train it).

This can be close to what you want, but you may want to customize it a bit
(given that you may want to average the weights).
For example, you could define an extension of the linear model that does
the average of the score, etc.


[1] https://lucene.apache.org/solr/guide/6_6/learning-to-rank.html
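As a rough illustration of that suggestion, an LTR setup along these lines
would define each feature and its pre-computed weight in the feature store
and model resources. The field names and values below are hypothetical, and
this assumes the weights can be taken out of the 'features' collection:

```
// features.json - one SolrFeature per feature id (hypothetical field names)
[
  { "name": "f1", "class": "org.apache.solr.ltr.feature.SolrFeature",
    "params": { "q": "{!terms f=feature_id}f1" } },
  { "name": "f2", "class": "org.apache.solr.ltr.feature.SolrFeature",
    "params": { "q": "{!terms f=feature_id}f2" } }
]

// model.json - linear model with the importance scores as weights
{
  "class": "org.apache.solr.ltr.model.LinearModel",
  "name": "importanceModel",
  "features": [ { "name": "f1" }, { "name": "f2" } ],
  "params": { "weights": { "f1": 2.0, "f2": -1.0 } }
}
```

Once uploaded to the feature and model stores, re-ranking would be invoked
with something like rq={!ltr model=importanceModel reRankDocs=100}. As noted
above, averaging rather than summing the weights would require extending
LinearModel.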



