Posted to user@lucenenet.apache.org by Nic Wise <Ni...@bbc.com> on 2008/01/15 18:45:35 UTC

Bubbling up "newer" records

Hi there

 

Long-time user (and reader), first-time poster :)

 

Just wondering if anyone knows of a way to bubble up (ie, increase the
score on) items which are newer - either via putting a date field in the
document, or some kind of timestamp / tick count.

 

I have found some references to doing it in the Java version, but I
can't find many of the classes (FieldScoreQuery, CustomScoreQuery)
in the .NET version - I assume they are new in the Java version. I
looked into replacing the Query, Weight, Scorer set, modelling it off
the Lucene.Net.Search.Spans stuff... but nothing so far.

 

Am I even looking in the right place? Is it even possible?

 

I'm after something with a long, flat tail (so I guess I'm going to have
to write something custom regardless) - eg stuff which is 1 day old gets
a boost of 10, 1 week old is 5, 1 month is 2, over three months is 0 etc
(with a floating scale on those - think "the long tail" diagram
(http://en.wikipedia.org/wiki/The_Long_Tail), but have it hit 0 when it
flattens out.)
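
One way to get that shape is to interpolate linearly between the figures
quoted above, so the boost slides from 10 down to 0 and then stays flat.
The following is only a sketch: the AgeBoost class and ForAge method are
made-up names, and treating "1 month" as 30 days and "three months" as 90
days is an assumption. It has no Lucene dependency; how the boost is then
combined with the relevance score is left open.

```csharp
using System;

public static class AgeBoost
{
    // Anchor points (age in days, boost) taken from the figures in the post.
    // 30 and 90 days for "1 month" and "three months" are assumptions.
    private static readonly double[] AgeDays = { 1, 7, 30, 90 };
    private static readonly double[] Boost = { 10, 5, 2, 0 };

    // Returns the boost for a document of the given age, interpolating
    // linearly between neighbouring anchor points.
    public static double ForAge(double ageInDays)
    {
        // A day old or newer: full boost.
        if (ageInDays <= AgeDays[0]) return Boost[0];

        // Past the last anchor: the flat tail, which sits at 0.
        if (ageInDays >= AgeDays[AgeDays.Length - 1]) return Boost[Boost.Length - 1];

        for (int i = 1; i < AgeDays.Length; i++)
        {
            if (ageInDays <= AgeDays[i])
            {
                // Fraction of the way from anchor i-1 to anchor i.
                double t = (ageInDays - AgeDays[i - 1]) / (AgeDays[i] - AgeDays[i - 1]);
                return Boost[i - 1] + t * (Boost[i] - Boost[i - 1]);
            }
        }

        return 0; // not reached; keeps the compiler happy
    }
}
```

For example, AgeBoost.ForAge(4) falls halfway between the 1-day and 1-week
anchors and gives 7.5. Adding more anchor points makes the curve smoother
without changing the code.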

 

Anyone done this? 

 

Thanks heaps.

 

Nic Wise

Lead .NET Developer, TopGear.com redevelopment.

BBC Worldwide. 
This e-mail (and any attachments) is confidential and may contain personal views which are not the views of the BBC unless specifically stated. If you have received it in error, please delete it from your system. Do not use, copy or disclose the information in any way nor act in reliance on it and notify the sender immediately.
 
Please note that the BBC monitors e-mails sent or received. Further communication will signify your consent to this

This e-mail has been sent by one of the following wholly-owned subsidiaries of the BBC:
 
BBC Worldwide, Registration Number: 1420028 England, Registered Address: Woodlands, 80 Wood Lane, London W12 0TT
BBC World, Registration Number: 04514407 England, Registered Address: Woodlands, 80 Wood Lane, London W12 0TT
BBC World Distribution Limited, Registration Number: 04514408, Registered Address: Woodlands, 80 Wood Lane, London W12 0TT

RE: Bubbling up "newer" records

Posted by Michael Garski <mg...@myspace.com>.
Nic,

You could also accomplish this in a hit collector - reading the value of a stored field and adjusting the score as necessary.  We take that approach here for a few searches.  If you inherit from TopDocCollector you can modify the score before collecting the hit and it will sort the results for you.

Michael
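
The collector idea can be sketched without any Lucene types: adjust each
score as the hit arrives, and keep only the best N in order. This is a
stand-in, not Lucene.Net's TopDocCollector; BoostingTopCollector, its
Collect signature and the adjust delegate are all hypothetical names. In a
real subclass the stored field would be read from the index inside the
collect call before handing the adjusted score to the base class.

```csharp
using System;
using System.Collections.Generic;

// Library-free stand-in for the collector approach: "adjust the score,
// then collect". Not a Lucene.Net class.
public class BoostingTopCollector
{
    private readonly int maxHits;
    // (storedTicks, rawScore) -> adjusted score
    private readonly Func<long, float, float> adjust;
    private readonly List<KeyValuePair<int, float>> hits =
        new List<KeyValuePair<int, float>>();

    public BoostingTopCollector(int maxHits, Func<long, float, float> adjust)
    {
        this.maxHits = maxHits;
        this.adjust = adjust;
    }

    // Called once per matching document, in whatever order the search visits them.
    public void Collect(int docId, float rawScore, long storedTicks)
    {
        float score = adjust(storedTicks, rawScore);

        // Find the insert position that keeps the list in descending score order.
        int pos = 0;
        while (pos < hits.Count && hits[pos].Value >= score) pos++;

        // Falling outside the top N means the hit is discarded.
        if (pos >= maxHits) return;

        hits.Insert(pos, new KeyValuePair<int, float>(docId, score));

        // Drop the lowest hit if the list has grown past N.
        if (hits.Count > maxHits) hits.RemoveAt(hits.Count - 1);
    }

    // Doc id and adjusted score pairs, best first.
    public IList<KeyValuePair<int, float>> TopHits
    {
        get { return hits; }
    }
}
```

The results come out already sorted by the adjusted score, so no second
pass over the hits is needed.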


RE: Bubbling up "newer" records

Posted by Nic Wise <Ni...@bbc.com>.
Hi everyone

I did pretty much what you recommended, DIGY. Thanks - I didn't think of
loading it all into a collection and sorting it; I'm too used to that
being expensive (which it is from a DB, but apparently not from Lucene!)

We used a linear gradient for the boost. I'd prefer to use a curve, but
this works, and it's easy.

I've appended my code below, in case it's of use to someone else. Some
notes on it:

We deal with an IMetadataDocument, which is really just a wrapper around
the Lucene document's fields, with the score added.

A few things (e.g. CurrentTicks) are initialised twice. This is because
we call them from unit tests.

CalcScore returns the original score multiplied by a factor between 0.5
(six months old or more) and 2 (brand new).

Hope it's of help to someone!


Thanks!

Nic


        /// <summary>
        /// Perform a search against the index.
        /// </summary>
        /// <param name="searchTerms">The query string to parse.</param>
        /// <returns>Matching documents, re-ranked so newer ones score higher.</returns>
        private IList<IMetadataDocument> PerformSearch(string searchTerms)
        {
            IndexReader reader = GetIndexReader();

            Query query = queryParser.Parse(searchTerms);

            Hits hitsFound = searcher.Search(query);

            IList<IMetadataDocument> sortedDocuments = BubbleUpMoreCurrentDocuments(hitsFound);

            return sortedDocuments;
        }


        // Initialised here as well as in BubbleUpMoreCurrentDocuments because
        // the unit tests call CalcScore directly.
        private long CurrentTicks = DateTime.Now.Ticks;
        private long TicksSixMonthsAgo = DateTime.Now.AddMonths(-6).Ticks;

        private IList<IMetadataDocument> BubbleUpMoreCurrentDocuments(Hits hitsFound)
        {
            // Refresh the time window for this search.
            CurrentTicks = DateTime.Now.Ticks;
            TicksSixMonthsAgo = DateTime.Now.AddMonths(-6).Ticks;

            List<IMetadataDocument> docs = new List<IMetadataDocument>();

            for (int i = 0; i < hitsFound.Length(); i++)
            {
                // Fetch the document once, then wrap it with its adjusted score.
                Document doc = hitsFound.Doc(i);
                float adjustedScore = CalcScore(doc.Get("datecreated"), hitsFound.Score(i));
                docs.Add(InterfaceFromLuceneDocument(doc, adjustedScore));
            }

            docs.Sort(CompareDocuments);

            return docs;
        }

        private static int CompareDocuments(IMetadataDocument x, IMetadataDocument y)
        {
            // Orders by descending score (biggest first); nulls go last.

            if (x == null)
            {
                // If x is null and y is null, they're equal.
                // If x is null and y is not null, x sorts after y.
                return (y == null) ? 0 : 1;
            }

            if (y == null)
            {
                // If x is not null and y is null, x sorts before y.
                return -1;
            }

            // Comparing y to x (instead of x to y) gives descending order.
            return y.SearchScore.CompareTo(x.SearchScore);
        }

        

        // Custom score function: scales the original score by a factor that
        // falls linearly from MaxScoreModifier (now) to MinScoreModifier
        // (six months ago).
        public float CalcScore(string FieldValue, float OriginalScore)
        {
            long fieldTicks = Int64.Parse(FieldValue);

            float scoreModifier = 0;
            float MinScoreModifier = 0.5f;
            float MaxScoreModifier = 2;

            if (fieldTicks < TicksSixMonthsAgo)
            {
                scoreModifier = MinScoreModifier;
            }
            else if (fieldTicks > CurrentTicks)
            {
                scoreModifier = MaxScoreModifier;
            }
            else
            {
                long fieldTickOffset = CurrentTicks - fieldTicks;
                long tickRange = CurrentTicks - TicksSixMonthsAgo;

                // General formula: gradient * x + offset
                // gradient = -(range of score, which is 2 to 0.5, so 1.5)
                //            / (range of ticks, which is 0..SixMonths)
                // x is the field tick offset: how far back our value is from
                //   the current time (0 on the X axis)
                // offset is the maximum we can go

                scoreModifier = (-(((MaxScoreModifier - MinScoreModifier) / tickRange) * fieldTickOffset)) + MaxScoreModifier;
            }

            // If we had 0.5 as the original score, things close to now return
            // close to 1.0 (0.5 * 2), and things six months old or more
            // return 0.25 (0.5 * 0.5).
            return scoreModifier * OriginalScore;

        }
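
For anyone who wants the curve rather than the straight line, one option
is an exponential decay squeezed into the same 0.5 to 2 range. This is a
sketch only, not what we run: CurvedRecency and its method names are made
up, and the 30-day half-life is an assumed value that would need tuning.

```csharp
using System;

public static class CurvedRecency
{
    // The multiplier starts at max for age 0 and approaches min, halving
    // its distance to min every halfLifeDays.
    public static float Modifier(double ageDays, double halfLifeDays, float min, float max)
    {
        // Dated now or in the future: full boost.
        if (ageDays <= 0) return max;

        double decay = Math.Pow(0.5, ageDays / halfLifeDays);
        return (float)(min + (max - min) * decay);
    }

    // Same contract as the linear CalcScore: original score times a
    // modifier between 0.5 and 2. The 30-day half-life is an assumption.
    public static float CalcScore(DateTime created, DateTime now, float originalScore)
    {
        double ageDays = (now - created).TotalDays;
        return originalScore * Modifier(ageDays, 30.0, 0.5f, 2.0f);
    }
}
```

With these numbers a 30-day-old document gets a modifier of 1.25 and a
60-day-old one gets 0.875, so most of the drop happens early and the tail
flattens out towards 0.5 instead of stopping dead at six months.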





RE: Bubbling up "newer" records

Posted by DIGY <di...@gmail.com>.
Hi Nic,

What I meant was something like below

DIGY


        public List<Lucene.Net.Documents.Document> Sort(Lucene.Net.Search.Hits Hits)
        {
            List<Lucene.Net.Documents.Document> retList = new List<Lucene.Net.Documents.Document>();
            List<float> scores = new List<float>();

            for (int i = 0; i < Hits.Length(); i++)
            {
                scores.Add(Hits.Score(i));
                retList.Add(Hits.Doc(i));
            }

            // BUBBLE SORT, O(n^2)
            // This is one of the worst sorting algorithms you can ever find.
            // Replace it with an intelligent one.
            bool anotherPassNeeded = true;
            while (anotherPassNeeded)
            {
                anotherPassNeeded = false;
                for (int i = 0; i < retList.Count - 1; i++)
                {
                    // Swap neighbours when the later one has the higher adjusted
                    // score, so the list ends up in descending order.
                    if (CalcScore(retList[i].GetField("field2").StringValue(), scores[i]) <
                        CalcScore(retList[i + 1].GetField("field2").StringValue(), scores[i + 1]))
                    {
                        anotherPassNeeded = true;

                        float fTemp = scores[i];
                        scores[i] = scores[i + 1];
                        scores[i + 1] = fTemp;

                        Lucene.Net.Documents.Document doc = retList[i];
                        retList[i] = retList[i + 1];
                        retList[i + 1] = doc;
                    }
                }
            }
            return retList;
        }


        // Your custom score function. This placeholder just multiplies the
        // score by the character code of the field's last character.
        float CalcScore(string FieldValue, float OriginalScore)
        {
            char lastChar = FieldValue[FieldValue.Length - 1];
            return OriginalScore * lastChar;
        }
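
As a possible replacement for the bubble sort, the adjusted score can be
computed once per hit and the pairs handed to the built-in List<T>.Sort.
This is a sketch with no Lucene types in it: Reranker and
SortByAdjustedScore are made-up names, and the caller is assumed to have
already pulled the field values and scores out of the Hits object.

```csharp
using System;
using System.Collections.Generic;

public static class Reranker
{
    // Orders items by adjusted score, highest first. fieldValues[i] and
    // scores[i] describe items[i]; calcScore is applied once per item.
    public static List<T> SortByAdjustedScore<T>(
        IList<T> items,
        IList<string> fieldValues,
        IList<float> scores,
        Func<string, float, float> calcScore)
    {
        var keyed = new List<KeyValuePair<float, T>>(items.Count);
        for (int i = 0; i < items.Count; i++)
        {
            float adjusted = calcScore(fieldValues[i], scores[i]);
            keyed.Add(new KeyValuePair<float, T>(adjusted, items[i]));
        }

        // Comparing b to a (instead of a to b) gives descending order.
        keyed.Sort((a, b) => b.Key.CompareTo(a.Key));

        var result = new List<T>(keyed.Count);
        foreach (var pair in keyed)
        {
            result.Add(pair.Value);
        }
        return result;
    }
}
```

List<T>.Sort is not a stable sort, so hits with identical adjusted scores
may come back in a different relative order from the one Lucene returned.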



RE: Bubbling up "newer" records

Posted by Nic Wise <Ni...@bbc.com>.
Thanks! Do you have a URL or some sample code for how to write a custom
sort function? I am wanting it to influence the results (as a boost
does), not really sort by this one field.

Eg, if I start out with a relevance of 0.1 and 0.9, if the 0.1 one was
put in today, it may end up being 0.7, and if the 0.9 one was 3 months
old, it may be down-graded to 0.72 - so it's not just a pure sort....

I tried a quick Google search (and will continue once I have this new
laptop built up), but couldn't find much. Is there a snippet somewhere?

Thanks heaps!


RE: Bubbling up "newer" records

Posted by DIGY <di...@gmail.com>.
Hi Nic,

CustomScoreQuery and FieldScoreQuery were introduced in Lucene 2.2.
If you don't want to wait for Lucene.Net 2.2, a quick solution may be to
write a custom sort function and sort the returned results yourself, for
example using the "timestamp" field and the Score() method of the Hits class.

DIGY
