Posted to user@nutch.apache.org by ahmed ghouzia <gh...@yahoo.com> on 2006/05/26 08:44:37 UTC
Re: Info on scoring/indexing and pagerank
Dear g
This is the best and most complete book on search engine theory and design:
MINING THE WEB
DISCOVERING KNOWLEDGE FROM HYPERTEXT DATA
Author: Soumen Chakrabarti
It also contains references to good resources.
Yours
ghouzia
"Insurance Squared Inc." <gc...@insurancesquared.com> wrote:
Hi All,
Two general questions:
- I'm wondering if there are any good sources of written information on
actually writing a search engine script. Things like scoring, indexing,
that kind of stuff. I bought the Lucene book, but that's Lucene-specific
technical info. I'm looking for something at the software
designer/technical manager level.
- It seems the PageRank method used by Google is patented, so we can't
really use that. AFAIK, Nutch/Lucene uses the log of the number of
inbound links in the scoring, which is better than nothing, but not ideal
since it's based on straight volume of links. Is there a happy medium
that could be used? I.e., some definition of authority sites (high
PageRank, for example) passing along a greater weight with their links?
Or some other sort of quality-instead-of-quantity calculation?
Thanks,
g.
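
To make the second question concrete, here is a minimal sketch of the two ideas it contrasts: a score based on the log of the inbound link count, and a score where each link passes along a share of the linking page's own score. Everything in it is an assumption for illustration only: the toy link graph, the function names, the damping value of 0.85 and the iteration count are hypothetical, and the second function is simply a PageRank-style power iteration, not something taken from Nutch or Lucene. It says nothing about the patent question.

```python
import math

# Hypothetical toy graph: page -> pages it links to.
# "a" and "b" each have exactly one inbound link, but "a" is linked
# from "hub", which itself has three inbound links.
LINKS = {
    "x1": ["hub"],
    "x2": ["hub"],
    "x3": ["hub"],
    "hub": ["a"],
    "y": ["b"],
}


def all_pages(links):
    """Return every page that appears as a source or a target."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    return sorted(pages)


def log_inlink_score(links):
    """Volume-only score: log(1 + number of inbound links)."""
    counts = {page: 0 for page in all_pages(links)}
    for targets in links.values():
        for target in targets:
            counts[target] += 1
    return {page: math.log(1 + n) for page, n in counts.items()}


def weighted_link_score(links, damping=0.85, iterations=50):
    """PageRank-style score: a link is worth more if its source scores high."""
    pages = all_pages(links)
    n = len(pages)
    score = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        nxt = {page: (1.0 - damping) / n for page in pages}
        for src in pages:
            targets = links.get(src, [])
            if targets:
                # Split this page's score evenly among the pages it links to.
                share = damping * score[src] / len(targets)
                for target in targets:
                    nxt[target] += share
            else:
                # Page with no outbound links: spread its score over all pages.
                share = damping * score[src] / n
                for page in pages:
                    nxt[page] += share
        score = nxt
    return score


if __name__ == "__main__":
    by_volume = log_inlink_score(LINKS)
    by_weight = weighted_link_score(LINKS)
    for page in all_pages(LINKS):
        print(page, round(by_volume[page], 3), round(by_weight[page], 3))
```

In this toy graph the volume-only score cannot tell "a" and "b" apart, since each has one inbound link, while the weighted score ranks "a" higher because its single link comes from a page that is itself well linked.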