Posted to solr-user@lucene.apache.org by Richard Hodsdon <ho...@gmail.com> on 2011/06/02 12:26:10 UTC

Sorting algorithm

Hi,

I want to write a sorting function query that works similarly to the way
reddit handles its ranking.
I have the date stored in a field of this type:
<fieldType name="tdate" class="solr.TrieDateField" omitNorms="true"
precisionStep="6" positionIncrementGap="0"/>

I also have the Twitter count, the Facebook count and the number of reads
from our site stored.
Below is the pseudo code that I want to implement.

var t = (CreationDate - 1131428803) / 1000;
var x = FacebookCount + TwitterCount + VoteCount - DownVoteCount;
var y = 0;
if (x > 0) {
  y = 1;
} else if (x == 0) {
  y = 0;
} else if (x < 0) {
  y = -1;
}
var z = 1;
var absX = Math.abs(x);
if (absX >= 1) {
  z = absX;
}
var ranking = (Math.log(z) / Math.LN10) + ((y * t) / 45000);
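
For reference, the pseudo code above can be written as a small runnable
JavaScript function. This is only a sketch: the document shape and the
property names are taken from the pseudo code, and creationDate is assumed to
be a plain number in the same unit as the constant 1131428803.

```javascript
// Runnable version of the pseudo code above (a sketch, not a Solr function).
// Assumption: creationDate is a number in the same unit as 1131428803.
function ranking(doc) {
  const t = (doc.creationDate - 1131428803) / 1000;
  const x = doc.facebookCount + doc.twitterCount + doc.voteCount - doc.downVoteCount;
  const y = Math.sign(x);             // 1, 0 or -1, same as the if/else chain
  const z = Math.max(Math.abs(x), 1); // never below 1, so the log is never negative
  return Math.log10(z) + (y * t) / 45000;
}

// Made-up example document: x = 100 and t = 45000.
const example = {
  creationDate: 1131428803 + 45000 * 1000,
  facebookCount: 60,
  twitterCount: 30,
  voteCount: 15,
  downVoteCount: 5,
};
console.log(ranking(example));
```

Math.log(z) / Math.LN10 in the pseudo code is the base-10 logarithm, which is
why Math.log10 is used here.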

I have no Java experience, so I cannot rewrite it as a custom function.
This is the query I am currently trying to use.

http://127.0.0.1:8983/solr/select?q.alt=*:*&fq=content_type:news&start=0&rows=10&wt=json&indent=on&omitHeader=true
&fl=id,name,excerpt,timestamp,domain,source,facebook,twitter,read,imageheight
&defType=dismax
&tt=div(sub(_val_:timestamp,1131428803),1000)
&xx=sub(sum(facebook,twitter,read),0)
&yy=map(query($xx),1,99999999,1,map(query($xx),0,0,0,map(query($xx),-99999999,-1,-1,0)))
&zz=map(abs(query($xx)),-999999999,0,1)
&sort=sum(div(log(query($zz)),ln(10)),div(product(query($yy),query($tt)),45000)) desc

Currently I am getting errors relating to my date field when trying to
convert it from a TrieDate to a timestamp with _val_:MyDateField.
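
One possible way to isolate the date problem, sketched in JavaScript: build a
request that sorts on the time term only, using Solr's ms() function instead
of _val_:timestamp. This assumes the Solr version in use supports ms() on a
TrieDateField and sorting by function; the field name "timestamp" and the
other parameters are copied from the query above.

```javascript
// Sketch: express t with ms(timestamp), which yields milliseconds since the epoch.
// Assumption: "timestamp" is the TrieDateField used in the query above.
const tt = "div(sub(ms(timestamp),1131428803),1000)";

const params = new URLSearchParams({
  "q.alt": "*:*",
  fq: "content_type:news",
  defType: "dismax",
  rows: "10",
  sort: `${tt} desc`, // time term only, to test the date conversion by itself
});

const url = "http://127.0.0.1:8983/solr/select?" + params.toString();
console.log(url);
```

If this request sorts without errors, the remaining terms can be added back
one at a time.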

I would also like to know if there is another way to do this, and whether my
query is even correct.

Thanks in advance

Richard


--
View this message in context: http://lucene.472066.n3.nabble.com/Sorting-algorithm-tp3014549p3014549.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Sorting algorithm

Posted by Richard Hodsdon <ho...@gmail.com>.
Hi Tomás

Thanks, that makes a lot of sense, and your math is sound.

It is working well. An if() function would be great, and it seems it's coming
soon.

Richard


Re: Sorting algorithm

Posted by Erick Erickson <er...@gmail.com>.
It hasn't been committed yet, but you may want to track this JIRA:

https://issues.apache.org/jira/browse/SOLR-2136

I happened to notice it on the dev list; it's about adding if() to
function queries.

Best
Erick

Re: Sorting algorithm

Posted by Tomás Fernández Löbbe <to...@gmail.com>.
OK, then (I'll treat everything that is available at index time as
constant):
(Math.log(z) / Math.LN10) (not sure what you mean by Math.LN10) is
constant; I'll call it c1.

((y * t) / 45000) = (y / 45000) * t --> y / 45000 is constant; I'll call it
c2.

c1 + (c2 * t) = c1 + (c2 * (CreationDate - now) / 1000) --> c2 / 1000 is
also constant; I'll call it c3.

Then your ranking formula is: c1 + (c3 * (creationDate - now)).

In Solr, this will be: &sort=sum(c1,product(c3,ms(creationDate,NOW))).

I haven't tried it, but if my arithmetic is correct (I'm a little rusty with
that), it should work and should be faster than doing the whole thing at
query time. Of course, "c1" and "c3" must be indexed as fields.
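
The derivation above can be checked numerically with a short JavaScript
sketch. It computes c1 and c3 the way they would be computed at index time,
combines them the way the sort expression does, and compares the result with
the direct formula. The function names and the sample values are made up for
illustration; timestamps are assumed to be milliseconds since the epoch, as
ms() returns.

```javascript
// Index-time constants, following the derivation above.
function indexTimeConstants(x) {
  const y = Math.sign(x);
  const z = Math.max(Math.abs(x), 1);
  return { c1: Math.log10(z), c3: y / 45000 / 1000 };
}

// Query-time part: what sum(c1,product(c3,ms(creationDate,NOW))) would compute.
function rankingFromConstants({ c1, c3 }, creationMs, nowMs) {
  return c1 + c3 * (creationMs - nowMs);
}

// Direct formula with t = (CreationDate - now) / 1000, for comparison.
function rankingDirect(x, creationMs, nowMs) {
  const t = (creationMs - nowMs) / 1000;
  return Math.log10(Math.max(Math.abs(x), 1)) + (Math.sign(x) * t) / 45000;
}

// Made-up sample: an item with x = 100 that is six hours old.
const nowMs = Date.UTC(2011, 5, 2, 12, 0, 0);
const creationMs = nowMs - 6 * 60 * 60 * 1000;
const viaConstants = rankingFromConstants(indexTimeConstants(100), creationMs, nowMs);
const direct = rankingDirect(100, creationMs, nowMs);
console.log(viaConstants, direct);
```

Both values agree up to floating-point rounding, which is what the algebra
predicts.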

Regards,

Tomás

Re: Sorting algorithm

Posted by Richard Hodsdon <ho...@gmail.com>.
Thanks for the response,

You are correct, but my pseudo code was not.
This line:
var t = (CreationDate - 1131428803) / 1000;
should be:
var t = (CreationDate - now()) / 1000;

This will cause an item's ranking to depreciate over time.
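
A small JavaScript sketch of the effect of the corrected line, looking only at
the time term (y * t) / 45000 of the ranking. The function name and the dates
are made up for illustration, and both timestamps are assumed to be
milliseconds since the epoch.

```javascript
// Time term of the ranking with t = (CreationDate - now()) / 1000.
// Assumption: creationMs and nowMs are milliseconds since the epoch.
function timeTerm(y, creationMs, nowMs) {
  const t = (creationMs - nowMs) / 1000;
  return (y * t) / 45000;
}

const DAY_MS = 24 * 60 * 60 * 1000;
const created = Date.UTC(2011, 5, 2);

// Item with a positive count (y = 1), evaluated now and one day later.
const atCreation = timeTerm(1, created, created);
const oneDayLater = timeTerm(1, created, created + DAY_MS);
console.log(atCreation, oneDayLater);
```

For y = 1 the term drops by 86400 / 45000 = 1.92 per day. For an item with a
negative count (y = -1) the same term grows instead.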

Richard



Re: Sorting algorithm

Posted by Tomás Fernández Löbbe <to...@gmail.com>.
Hi Richard, all your data seems to be available at indexing time, am I
correct? Why don't you do the math at index time and index the result in a
field that you can then sort on at query time?
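
A minimal JavaScript sketch of that idea, assuming the documents are prepared
in client code before being sent to Solr. The field name "ranking", the
function name and the sample documents are hypothetical; the facebook,
twitter, read and timestamp names come from the query in the original post,
and timestamp is assumed to be a plain number.

```javascript
// Sketch: compute the score at index time and store it with the document.
// Assumption: "ranking" is a hypothetical sortable numeric field in the schema.
function withRanking(doc) {
  const x = doc.facebook + doc.twitter + doc.read;
  const t = (doc.timestamp - 1131428803) / 1000;
  const score = Math.log10(Math.max(Math.abs(x), 1)) + (Math.sign(x) * t) / 45000;
  return { ...doc, ranking: score };
}

// Made-up documents with the same timestamp but different counts.
const docs = [
  { id: "a", timestamp: 1131428803, facebook: 5, twitter: 5, read: 0 },
  { id: "b", timestamp: 1131428803, facebook: 500, twitter: 500, read: 0 },
].map(withRanking);

// Local equivalent of &sort=ranking desc, only to illustrate the ordering.
docs.sort((p, q) => q.ranking - p.ranking);
console.log(docs.map((d) => d.id));
```

The trade-off is that a document has to be re-indexed whenever one of its
counts changes, since the stored score is only as fresh as the last update.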

