Posted to user@nutch.apache.org by Victor Lee <vi...@yahoo.com> on 2005/12/20 04:48:18 UTC
Does Search Result Show Similar Pages Like Google?
Hi,
Does Nutch's search result show "similar pages" like Google? I went to Modzex.com which is using Nutch but I don't see "similar pages" in its search result.
Many thanks.
__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com
Re: Does Search Result Show Similar Pages Like Google?
Posted by Doug Cutting <cu...@nutch.org>.
Victor Lee wrote:
> Does Nutch's search result show "similar pages" like Google? I went to Modzex.com which is using Nutch but I don't see "similar pages" in its search result.
One could use the Lucene "more-like-this" library to implement this:
http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/similarity/
Doug
Re: Does Search Result Show Similar Pages Like Google?
Posted by Jérôme Charron <je...@gmail.com>.
Take a look at the clustering-carrot2 plugin.
Regards
Jérôme
On 12/21/05, Daqing Zhao <de...@gmail.com> wrote:
>
> I think clustering the documents would be a solution: just recommend
> other documents in the same cluster. Is there a clustering algorithm in
> Nutch? It may be very expensive to calculate.
>
> Daqing Zhao
--
http://motrech.free.fr/
http://www.frutch.org/
Re: Does Search Result Show Similar Pages Like Google?
Posted by Stefan Groschupf <sg...@media-style.com>.
Real clustering is impossible for a web search engine unless you
have unlimited hardware resources.
However, as Jérôme suggests, there is a search result clustering plugin.
If you are more familiar with math and algorithms, you will find this
article interesting:
http://www.stanford.edu/~taherh/papers/scalable-clustering.pdf
But as mentioned, calculating the most important words of a document
and making a new query from them is the best solution for now.
Stefan
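
A rough illustration of why clustering a whole index is so much more expensive than clustering one result list: a naive approach that compares every pair of documents needs n*(n-1)/2 comparisons. This is only a sketch under that all-pairs assumption (real clustering algorithms, including the ones in the article above, try hard to avoid it), and the document counts are made-up examples, not figures from this thread.

```python
def pairwise_comparisons(n):
    """Number of document pairs a naive all-pairs comparison must score."""
    return n * (n - 1) // 2

# Hypothetical sizes: one page of search hits vs. a modest web index.
result_list = pairwise_comparisons(100)        # 4950 pairs
small_index = pairwise_comparisons(1_000_000)  # 499999500000 pairs

print(result_list, small_index)
```

Clustering only the hits returned for one query keeps n small, which is what makes a search result clustering plugin practical.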
On 21.12.2005 at 00:48, Daqing Zhao wrote:
> I think clustering the documents would be a solution: just recommend
> other documents in the same cluster. Is there a clustering algorithm in
> Nutch? It may be very expensive to calculate.
>
> Daqing Zhao
---------------------------------------------------------------
company: http://www.media-style.com
forum: http://www.text-mining.org
blog: http://www.find23.net
Re: Does Search Result Show Similar Pages Like Google?
Posted by Daqing Zhao <de...@gmail.com>.
I think clustering the documents would be a solution: just recommend
other documents in the same cluster. Is there a clustering algorithm in
Nutch? It may be very expensive to calculate.
Daqing Zhao
On 12/20/05, Victor Lee <vi...@yahoo.com> wrote:
>
> Getting the term vector should be easy, but when you said calculation, is
> it a simple comparison of all term vectors, or is it a whole other beast?
Re: Does Search Result Show Similar Pages Like Google?
Posted by Victor Lee <vi...@yahoo.com>.
Getting the term vector should be easy, but when you said calculation, is it a simple comparison of all term vectors, or is it a whole other beast?
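
For reference, a "simple comparison of all term vectors" could look like the sketch below: each document becomes a bag of term counts, and cosine similarity scores one document against every other. This is a generic illustration, not how Nutch or Lucene do it. The function names and the whitespace tokenizer are assumptions made for the example. Note that the loop touches every document, so the cost grows with the size of the index.

```python
import math
from collections import Counter

def term_vector(text):
    """Bag of term counts; whitespace splitting stands in for a real analyzer."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_by_similarity(doc, others):
    """Brute force: indices of `others`, most similar to `doc` first."""
    v = term_vector(doc)
    scores = [cosine(v, term_vector(o)) for o in others]
    return sorted(range(len(others)), key=lambda i: -scores[i])
```

Comparing against every stored vector like this is fine for a handful of pages but not for a web-scale index.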
Stefan Groschupf <sg...@media-style.com> wrote:
No, Nutch has no such functionality.
The quick and dirty way to implement this would be to extract the
term vector from the original document, calculate (there are
different algorithms for this) the most important terms for this
document, and just do a query with these terms.
HTH
Stefan
P.S. Contributions are always welcome. :)
Re: Does Search Result Show Similar Pages Like Google?
Posted by Stefan Groschupf <sg...@media-style.com>.
No, Nutch has no such functionality.
The quick and dirty way to implement this would be to extract the
term vector from the original document, calculate (there are
different algorithms for this) the most important terms for this
document, and just do a query with these terms.
HTH
Stefan
P.S. Contributions are always welcome. :)
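
A minimal sketch of the steps described above, assuming tf-idf as the "most important terms" measure (one of several possible choices, and not something this thread prescribes). It is plain Python rather than Nutch or Lucene code. The tokenizer, the stop-word list, the function names and the "OR" query syntax are all hypothetical stand-ins.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list; a real setup would use the analyzer's own.
STOP_WORDS = {"a", "and", "for", "in", "is", "of", "on", "the", "to"}

def tokenize(text):
    """Lowercase, split on non-letters, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def top_terms(doc, corpus, k=3):
    """Return the k terms of `doc` with the highest tf-idf weight."""
    n_docs = len(corpus)
    # Document frequency: number of corpus documents containing each term.
    df = Counter()
    for other in corpus:
        df.update(set(tokenize(other)))
    tf = Counter(tokenize(doc))
    # max(..., 1) guards against terms of `doc` that never occur in the corpus.
    scores = {t: f * math.log(n_docs / max(df[t], 1)) for t, f in tf.items()}
    # Highest weight first; ties are broken alphabetically to stay deterministic.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]

def similar_pages_query(doc, corpus, k=3):
    """Build an OR query string from the document's key terms."""
    return " OR ".join(top_terms(doc, corpus, k))
```

The resulting string would then be run as an ordinary search, and its top hits shown as the "similar pages".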
On 20.12.2005 at 04:48, Victor Lee wrote:
> Hi,
> Does Nutch's search result show "similar pages" like Google? I
> went to Modzex.com which is using Nutch but I don't see "similar
> pages" in its search result.
>
> Many thanks.