Posted to solr-user@lucene.apache.org by Sujit Pal <su...@comcast.net> on 2011/05/05 22:03:06 UTC

Custom sorting based on external (database) data

Hi,

Sorry for the possible double post. I wrote this up but sent it from the
wrong sender address, so I am guessing that my previous message is going
to be rejected by the list moderation daemon.

I am trying to figure out options for the following problem. I am on
Solr 1.4.1 (Lucene 2.9.1).

I have search results which are going to be rated by users (with a
thumbs up/down), and the ratings would translate to a score between -1
and +1.

This data is stored in a database table with the columns:

unique_id
thumbs_up
thumbs_down
num_calls

The table is updated as the thumbs up/down component is clicked.

We want to be able to sort the results by the following score:
(thumbs_up - thumbs_down) / num_calls. The unique_id field refers to
the one declared as <uniqueKey> in schema.xml.
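
A minimal sketch of this score in Python, for illustration only. The
function name is hypothetical, and returning 0.0 for rows with no calls
is an assumption (the post does not say how unrated documents should
rank). The result stays between -1 and +1 as long as thumbs_up +
thumbs_down does not exceed num_calls.

```python
def thumbs_score(thumbs_up, thumbs_down, num_calls):
    """Return (thumbs_up - thumbs_down) / num_calls for one table row."""
    if num_calls <= 0:
        # Assumption: a document nobody has rated gets a neutral score.
        return 0.0
    return (thumbs_up - thumbs_down) / num_calls


# 7 thumbs up and 3 thumbs down over 10 calls.
print(thumbs_score(7, 3, 10))  # prints 0.4
```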

Based on the following conversation:
http://www.mail-archive.com/solr-user@lucene.apache.org/msg06322.html 

...my understanding is that I need to:

1) Subclass FieldType to create my own RankFieldType.
2) In this class, override the getSortField() method to return a
SortField built from my custom FieldComparatorSource object.
3) Build the custom FieldComparatorSource object, which returns a
custom FieldComparator object from newComparator().
4) In schema.xml, declare a field type rank_t of class RankFieldType and
a field called rank of type rank_t.
5) Use sort=rank+desc to do the sort.

My question is: is there a simpler/more performant way? The number of
database lookups seems like it's going to be pretty high with this
approach. And it's hard to believe that my problem is new, so I am
guessing this is either part of some Solr configuration I am missing, or
there is some other (possibly simpler) approach I am overlooking.

Pointers to documentation or code (or even keywords I could google)
would be much appreciated.

TIA for all your help,

Sujit



Re: Total Documents Failed : How to find out why

Posted by Erick Erickson <er...@gmail.com>.
OK, then your log is probably just coming out to the console. You can
start it as "java -jar start.jar > file.log 2>&1" and keep a permanent
record of the log if you're on Windows.

Best
Erick

On Mon, May 9, 2011 at 7:32 AM, Rohit <ro...@in-rev.com> wrote:
> Hi Erick,
>
> That's exactly how I am starting Solr.
>
> Regards,
> Rohit

RE: Total Documents Failed : How to find out why

Posted by Rohit <ro...@in-rev.com>.
Hi Erick,

That's exactly how I am starting Solr.

Regards,
Rohit

-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com] 
Sent: 09 May 2011 16:57
To: solr-user@lucene.apache.org
Subject: Re: Total Documents Failed : How to find out why

First you need to find your logs. That folder should not
be empty regardless of whether DIH is working correctly
or not.

I'm assuming here that you're just doing the "java -jar start.jar"
in the example directory. If this isn't the case, how are you
starting Solr/Jetty?

Best
Erick



Re: Total Documents Failed : How to find out why

Posted by Erick Erickson <er...@gmail.com>.
First you need to find your logs. That folder should not
be empty regardless of whether DIH is working correctly
or not.

I'm assuming here that you're just doing the "java -jar start.jar"
in the example directory. If this isn't the case, how are you
starting Solr/Jetty?

Best
Erick

On Mon, May 9, 2011 at 3:26 AM, Rohit <ro...@in-rev.com> wrote:
> Hi,
>
> I am running a Solr indexing job, and after indexing I get these results.
> How can I know which documents failed and why?
>
> <str name="Total Requests made to DataSource">1</str>
> <str name="Total Rows Fetched">5170850</str>
> <str name="Total Documents Skipped">0</str>
> <str name="Full Dump Started">2011-05-08 23:40:09</str>
> <str name="">Indexing completed. Added/Updated: 2972300 documents. Deleted 0 documents.</str>
> <str name="Committed">2011-05-09 00:13:48</str>
> <str name="Optimized">2011-05-09 00:13:48</str>
> <str name="Total Documents Processed">2972300</str>
> <str name="Total Documents Failed">2198550</str>
> <str name="Time taken ">0:33:40.945</str>
>
> I am running Solr on Jetty right now and the console shows no errors; the
> "\Solr\example\logs" folder is also empty.
>
> Thanks,
> Rohit
>
>
>
>

Re: Total Documents Failed : How to find out why

Posted by Chris Hostetter <ho...@fucit.org>.
: Subject: Total Documents Failed : How to find out why
: References: <96...@web121717.mail.ne1.yahoo.com>
:  <13...@lysdexic.healthline.com>
: In-Reply-To: <13...@lysdexic.healthline.com>


http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is "hidden" in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.


-Hoss

Total Documents Failed : How to find out why

Posted by Rohit <ro...@in-rev.com>.
Hi,

I am running a Solr indexing job, and after indexing I get these results.
How can I know which documents failed and why?

<str name="Total Requests made to DataSource">1</str>
<str name="Total Rows Fetched">5170850</str>
<str name="Total Documents Skipped">0</str>
<str name="Full Dump Started">2011-05-08 23:40:09</str>
<str name="">Indexing completed. Added/Updated: 2972300 documents. Deleted 0 documents.</str>
<str name="Committed">2011-05-09 00:13:48</str>
<str name="Optimized">2011-05-09 00:13:48</str>
<str name="Total Documents Processed">2972300</str>
<str name="Total Documents Failed">2198550</str>
<str name="Time taken ">0:33:40.945</str>

I am running Solr on Jetty right now and the console shows no errors; the
"\Solr\example\logs" folder is also empty.

Thanks,
Rohit




Re: Custom sorting based on external (database) data

Posted by Sujit Pal <su...@comcast.net>.
Thank you Ahmet, it looks like we could use this. Basically, we would
periodically dump the (unique_id|computed_score) pairs, sorted by score,
write them out to this file, and then issue a commit.
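
A minimal sketch of such a periodic dump, in Python. It assumes a table
named ratings with the columns from the original post and uses an
in-memory SQLite database as a stand-in for the real one; the table
name, function name, and sample rows are all hypothetical. As far as I
can tell from the ExternalFileField documentation, the file holds one
key=value line per document, is named external_<fieldname>, and lives in
the index data directory, so check that against your Solr version. Rows
are written in unique_id order here rather than score order, since the
same documentation suggests a key-sorted file loads faster.

```python
import io
import sqlite3


def dump_scores(conn, out):
    """Write one `unique_id=score` line per row to the text stream `out`."""
    rows = conn.execute(
        "SELECT unique_id, thumbs_up, thumbs_down, num_calls "
        "FROM ratings ORDER BY unique_id"
    )
    written = 0
    for unique_id, up, down, calls in rows:
        # Assumption: rows with no calls get a neutral score of 0.
        score = (up - down) / calls if calls else 0.0
        out.write(f"{unique_id}={score:.4f}\n")
        written += 1
    return written


# Stand-in database with made-up rows, only for demonstration.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE ratings (unique_id TEXT PRIMARY KEY, "
    "thumbs_up INTEGER, thumbs_down INTEGER, num_calls INTEGER)"
)
demo.executemany(
    "INSERT INTO ratings VALUES (?, ?, ?, ?)",
    [("doc1", 7, 3, 10), ("doc2", 0, 5, 5), ("doc3", 0, 0, 0)],
)
buffer = io.StringIO()
dump_scores(demo, buffer)
print(buffer.getvalue(), end="")
```

In a real job the stream would be a file opened for writing in the index
data directory (for a field called rank, external_rank), followed by a
commit so that Solr picks up the new values.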

Found some more info here, for the benefit of others looking for
something similar:
http://dev.tailsweep.com/solr-external-scoring/ 

On Thu, 2011-05-05 at 13:12 -0700, Ahmet Arslan wrote:
> Looks like it can be done with 
> http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html 
> and 
> http://wiki.apache.org/solr/FunctionQuery
> 
> You can dump your table into three text files. Issue a commit to load these changes.
> 
> Note, though, that sorting by function query is only available as of Solr 3.1.


Re: Custom sorting based on external (database) data

Posted by Ahmet Arslan <io...@yahoo.com>.

--- On Thu, 5/5/11, Sujit Pal <su...@comcast.net> wrote:

> From: Sujit Pal <su...@comcast.net>
> Subject: Custom sorting based on external (database) data
> To: "solr-user" <so...@lucene.apache.org>
> Date: Thursday, May 5, 2011, 11:03 PM
> Hi,
>
> Sorry for the possible double post. I wrote this up but sent it from the
> wrong sender address, so I am guessing that my previous message is going
> to be rejected by the list moderation daemon.
>
> I am trying to figure out options for the following problem. I am on
> Solr 1.4.1 (Lucene 2.9.1).
>
> I have search results which are going to be rated by users (with a
> thumbs up/down), and the ratings would translate to a score between -1
> and +1.
>
> This data is stored in a database table with the columns:
>
> unique_id
> thumbs_up
> thumbs_down
> num_calls
>
> The table is updated as the thumbs up/down component is clicked.
>
> We want to be able to sort the results by the following score:
> (thumbs_up - thumbs_down) / num_calls. The unique_id field refers to
> the one declared as <uniqueKey> in schema.xml.
>
> Based on the following conversation:
> http://www.mail-archive.com/solr-user@lucene.apache.org/msg06322.html
>
> ...my understanding is that I need to:
>
> 1) Subclass FieldType to create my own RankFieldType.
> 2) In this class, override the getSortField() method to return a
> SortField built from my custom FieldComparatorSource object.
> 3) Build the custom FieldComparatorSource object, which returns a
> custom FieldComparator object from newComparator().
> 4) In schema.xml, declare a field type rank_t of class RankFieldType and
> a field called rank of type rank_t.
> 5) Use sort=rank+desc to do the sort.
>
> My question is: is there a simpler/more performant way? The number of
> database lookups seems like it's going to be pretty high with this
> approach. And it's hard to believe that my problem is new, so I am
> guessing this is either part of some Solr configuration I am missing, or
> there is some other (possibly simpler) approach I am overlooking.
>
> Pointers to documentation or code (or even keywords I could google)
> would be much appreciated.

Looks like it can be done with 
http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html 
and 
http://wiki.apache.org/solr/FunctionQuery

You can dump your table into three text files. Issue a commit to load these changes.

Note, though, that sorting by function query is only available as of Solr 3.1.
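
A rough sketch, in Python, of what the request parameters for such a
function sort might look like. It assumes the three dumped files are
exposed as ExternalFileField fields named thumbs_up, thumbs_down and
num_calls (hypothetical names taken from the table columns), and that
the div() and sub() functions from the FunctionQuery wiki page are
available in the Solr version in use. The helper name and the sample
query are made up; this only builds the parameter string and does not
contact a server.

```python
from urllib.parse import parse_qs, urlencode


def build_sort_params(query, up="thumbs_up", down="thumbs_down",
                      calls="num_calls"):
    """Return URL-encoded parameters that sort by (up - down) / calls."""
    # Function-query form of the score from the original post.
    score_fn = f"div(sub({up},{down}),{calls})"
    return urlencode({"q": query, "sort": f"{score_fn} desc"})


params = build_sort_params("heart attack")
print(params)                       # encoded form, ready to append to a URL
print(parse_qs(params)["sort"][0])  # decoded sort expression
```

On Solr 1.4.1 this sort parameter would not be accepted, so it only
applies after an upgrade to 3.1 or later.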