Posted to solr-user@lucene.apache.org by Bing Li <lb...@gmail.com> on 2012/01/21 22:33:37 UTC

How to Sort By a PageRank-Like Complicated Strategy?

Dear all,

I am using SolrJ to build a system that provides search services to users.
I have a few questions about searching in Solr.

As far as I know, Lucene ranks results on a text field by how well the
keywords match (partial matching).

But if I search on a string field (exact matching), how does Lucene sort
the results?

If I want to add new ways of sorting, Solr's function queries seem to
support that.

However, for a complicated ranking strategy such as PageRank, does Solr
provide an interface that lets me do that?

My ranking methods are more complicated than PageRank. At present I have to
load all the matching data from Solr by keyword and then rank it again in my
own way before showing it to users. Is that the correct approach?

Thanks so much!
Bing

Re: How to Sort By a PageRank-Like Complicated Strategy?

Posted by Ahmet Arslan <io...@yahoo.com>.
> As I learned, big data, such as Lucene index, was not suitable to be
> updated frequently.

Some people use ExternalFileField for PageRank-like fields.

http://lucidworks.lucidimagination.com/display/solr/Solr+Field+Types#SolrFieldTypes-WorkingwithExternalFiles
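
As an illustration of the ExternalFileField approach, here is a minimal sketch that writes per-document scores to a file outside the index, so that changing a score does not require re-indexing the document. It assumes the file is named external_<fieldname>, lives in the index data directory, and holds one key=value line per document; the field name "pagerank", the document ids, and the helper name are hypothetical, so check the linked documentation for the exact layout your Solr version expects.

```python
from pathlib import Path
import tempfile


def write_external_scores(index_data_dir, field_name, scores):
    """Write one `key=value` line per document to external_<field_name>.

    `scores` maps a document key (e.g. the uniqueKey value) to a float.
    The file name and location are assumptions based on the
    ExternalFileField documentation, not something this thread confirms.
    """
    path = Path(index_data_dir) / f"external_{field_name}"
    # Sorting by key keeps the file stable between runs.
    lines = [f"{doc_id}={score}" for doc_id, score in sorted(scores.items())]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


if __name__ == "__main__":
    # A temporary directory stands in for the real index data directory.
    with tempfile.TemporaryDirectory() as data_dir:
        out = write_external_scores(
            data_dir, "pagerank", {"doc2": 0.31, "doc1": 0.87}
        )
        print(out.read_text(encoding="utf-8"))
```

A batch job could regenerate this file whenever the scores are recomputed. The values are then meant to be used from function queries rather than searched directly.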

Lucene also supports parent/child (block join) documents; maybe that can be used too.

http://blog.mikemccandless.com/2012/01/tochildblockjoinquery-in-lucene.html

Re: How to Sort By a PageRank-Like Complicated Strategy?

Posted by Bing Li <lb...@gmail.com>.
Dear Shashi,

As I understand it, a large data set such as a Lucene index is not suited to
frequent updates. Frequent updating must hurt performance and consistency
when the index has to be replicated across a large cluster. Such a search
engine is expected to run in a write-once, read-many environment, right?
That is what HDFS (Hadoop Distributed File System) provides. In my
experience, updating a Lucene index is really slow.

Why do you say I can update the Lucene index frequently?

Thanks so much!
Bing

On Mon, Jan 23, 2012 at 11:02 PM, Shashi Kant <sk...@sloan.mit.edu> wrote:

> You can update the documents in the index quite frequently. I don't know
> what your requirement is; another option would be to boost at query time.

Re: How to Sort By a PageRank-Like Complicated Strategy?

Posted by Shashi Kant <sk...@sloan.mit.edu>.
You can update the documents in the index quite frequently. I don't know
what your requirement is; another option would be to boost at query time.
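
To make the query-time boost option concrete, here is a rough sketch that builds request parameters for Solr's extended DisMax parser, multiplying the relevance score by a function of a numeric field. The field name "pagerank", the sum(1,...) formula, and the helper name are illustrative assumptions; the actual formula depends on your ranking needs.

```python
from urllib.parse import urlencode


def boosted_query_params(user_query, rank_field="pagerank", rows=10):
    """Build edismax parameters that scale the text score by a field value.

    sum(1,<field>) keeps the multiplier at or above 1 for non-negative
    values, so documents with a rank of 0 are not zeroed out.
    """
    return {
        "defType": "edismax",
        "q": user_query,
        # Multiplicative boost, evaluated per document at query time.
        "boost": f"sum(1,{rank_field})",
        "fl": "id,score",
        "rows": str(rows),
    }


if __name__ == "__main__":
    params = boosted_query_params("solr ranking")
    # The request path is only an example; adjust it to your core.
    print("/select?" + urlencode(params))
```

Because the boost is computed when the query runs, changing the ranking formula does not require re-indexing, only a change to the request.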

On Sun, Jan 22, 2012 at 5:51 AM, Bing Li <lb...@gmail.com> wrote:
> Dear Shashi,
>
> Thanks so much for your reply!
>
> However, I think a PageRank value is not static; it has to be updated
> on the fly. As far as I know, a Lucene index is not suited to very
> frequent updates. If so, how should I deal with that?
>
> Best regards,
> Bing

Re: How to Sort By a PageRank-Like Complicated Strategy?

Posted by Bing Li <lb...@gmail.com>.
Dear Shashi,

Thanks so much for your reply!

However, I think a PageRank value is not static; it has to be updated
on the fly. As far as I know, a Lucene index is not suited to very
frequent updates. If so, how should I deal with that?

Best regards,
Bing


On Sun, Jan 22, 2012 at 12:43 PM, Shashi Kant <sk...@sloan.mit.edu> wrote:

> Lucene has a mechanism to "boost" documents up or down using your custom
> ranking algorithm. So if you come up with something like PageRank, you
> might do something like doc.SetBoost(myboost) before writing to the index.

Re: How to Sort By a PageRank-Like Complicated Strategy?

Posted by Shashi Kant <sk...@sloan.mit.edu>.
Lucene has a mechanism to "boost" documents up or down using your custom
ranking algorithm. So if you come up with something like PageRank, you
might do something like doc.SetBoost(myboost) before writing to the index.



On Sat, Jan 21, 2012 at 5:07 PM, Bing Li <lb...@gmail.com> wrote:
> Hi, Kai,
>
> Thanks so much for your reply!
>
> If retrieval is done on a string field rather than a text field, exact
> matching is used, as I understand it, right? If so, how does Lucene rank
> the results?
>
> Best regards,
> Bing

Re: How to Sort By a PageRank-Like Complicated Strategy?

Posted by Bing Li <lb...@gmail.com>.
Hi, Kai,

Thanks so much for your reply!

If retrieval is done on a string field rather than a text field, exact
matching is used, as I understand it, right? If so, how does Lucene rank
the results?

Best regards,
Bing

On Sun, Jan 22, 2012 at 5:56 AM, Kai Lu <lu...@gmail.com> wrote:

> Solr is essentially the retrieval step; you can customize the scoring
> formula in Lucene. The formula should not be too complicated, though;
> ideally it can be factorized. It also depends on the information stored
> in the index, such as TF, DF, and positions. You can then do a
> second-phase rerank on the top N results you get back.

Re: How to Sort By a PageRank-Like Complicated Strategy?

Posted by Kai Lu <lu...@gmail.com>.
Solr is essentially the retrieval step; you can customize the scoring
formula in Lucene. The formula should not be too complicated, though;
ideally it can be factorized. It also depends on the information stored
in the index, such as TF, DF, and positions. You can then do a
second-phase rerank on the top N results you get back.
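
The second-phase rerank mentioned above might look roughly like the sketch below: take the hits Solr returned in its own order, re-score only the first N with a custom function, and leave the rest untouched. The hit structure (dicts with "id", "score", and "pagerank") and the scoring formula are assumptions made for illustration only.

```python
def rerank_top_n(hits, custom_score, n=100):
    """Re-order the first n hits by custom_score, highest first.

    Hits beyond the first n keep the order in which they were retrieved.
    """
    head, tail = hits[:n], hits[n:]
    # sorted() is stable, so ties keep their original retrieval order.
    head = sorted(head, key=custom_score, reverse=True)
    return head + tail


if __name__ == "__main__":
    # Hypothetical hits, already ordered by Solr's relevance score.
    hits = [
        {"id": "a", "score": 3.2, "pagerank": 0.1},
        {"id": "b", "score": 2.9, "pagerank": 0.9},
        {"id": "c", "score": 1.0, "pagerank": 0.5},
    ]

    def combined(hit):
        # Example formula only: text score scaled by a link-based rank.
        return hit["score"] * (1 + hit["pagerank"])

    print([h["id"] for h in rerank_top_n(hits, combined, n=2)])
```

Limiting the rerank to the top N keeps the expensive custom scoring off the full result set, which is the point of splitting retrieval and ranking into two phases.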
