Posted to user@nutch.apache.org by ahmed ghouzia <gh...@yahoo.com> on 2006/05/26 08:44:37 UTC

Re: Info on scoring/indexing and pagerank

Dear g

This is the best and most complete book on search engine theory and design:
MINING THE WEB
DISCOVERING KNOWLEDGE FROM HYPERTEXT DATA

Author: Soumen Chakrabarti

It also contains references to good resources.

Yours
ghouzia

"Insurance Squared Inc." <gc...@insurancesquared.com> wrote:

Hi All,

Two general questions:

- I'm wondering if there are any good sources of written information on
actually writing a search engine script.  Things like scoring, indexing,
that kind of stuff.  I bought the Lucene book, but that's Lucene-specific
technical info.  I'm looking for something at the software
designer/technical manager level.
- It seems the PageRank method used by Google is patented, so we can't
really use that.  AFAIK, Nutch/Lucene uses the log of the number of
inbound links in the scoring, which is better than nothing, but not ideal
since it's based on straight volume of links.  Is there a happy medium
that could be used?  I.e., some definition of authority sites (high
PageRank, for example) passing along a greater weight with their links?
Or some other sort of quality-instead-of-quantity calculation?

Thanks,
g.
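
To make the second question above concrete, here is a minimal Python sketch that contrasts the two ideas: a volume-only score (log of the inbound link count, as the message describes it) and an iterative score in which each link passes along a share of the linking page's own score. The toy graph, the function names, and the damping and iteration values are all assumptions made for illustration; none of this is taken from Nutch or Lucene. The iterative variant belongs to the same family as PageRank-style link analysis, so it illustrates the "authority passes weight" idea only and says nothing about whether such a scheme may be used.

```python
import math

# Hypothetical toy link graph: page -> pages it links to.
links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}


def inlink_counts(links):
    """Count inbound links for every page that appears in the graph."""
    counts = {page: 0 for page in links}
    for targets in links.values():
        for dst in targets:
            counts[dst] = counts.get(dst, 0) + 1
    return counts


def log_inlink_score(links):
    """Volume-only score: log(1 + number of inbound links)."""
    return {page: math.log(1 + n) for page, n in inlink_counts(links).items()}


def weighted_link_score(links, damping=0.85, iterations=50):
    """Iterative score: each page splits its score among its outlinks.

    damping and iterations are illustrative defaults, not tuned values.
    """
    pages = set(links) | {dst for targets in links.values() for dst in targets}
    n = len(pages)
    score = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new = {page: (1.0 - damping) / n for page in pages}
        for src in pages:
            targets = links.get(src, [])
            if targets:
                share = damping * score[src] / len(targets)
                for dst in targets:
                    new[dst] += share
            else:
                # A page with no outlinks spreads its score evenly.
                share = damping * score[src] / n
                for page in pages:
                    new[page] += share
        score = new
    return score
```

In this toy graph, pages "a" and "b" each have exactly one inbound link, so the log-count score cannot tell them apart. The weighted score ranks "a" above "b", because "a" receives the full weight of the well-linked page "c", while "b" receives only half the weight of "a".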


