Posted to user@nutch.apache.org by Smith Norton <sm...@gmail.com> on 2007/09/06 13:13:54 UTC

Increase ranks of some pages or sites manually?

Is it possible to increase the rank of some pages or sites manually
after the crawl is complete?

I am having a problem. I have a couple of intranet sites,
http://central/, http://esupport/,  http://journal/, etc.

central is the most important site but not all of its pages are
indexed and the page ranks from this site are also very low in search
results.

journal is not an important site but all pages of this site are
indexed and this site ranks very well in the results.

This is probably because there are a lot of incoming links,
interlinking, etc. for journal. OTOH, central is a very simple and
plain-looking site with not much interlinking.

I want to flip the situation so that the crawler knows it cannot
sacrifice any central pages for the sake of indexing journal pages.
Please suggest how this can be done.

Re: Increase ranks of some pages or sites manually?

Posted by Doğacan Güney <do...@gmail.com>.
Hi,

This can be accomplished by writing a ScoringFilter. In your
ScoringFilter, override the indexerScore method to boost the score of
URLs coming from http://central/.

For a ScoringFilter plugin example, you may take a look at
src/plugin/scoring-opic.
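
As a rough illustration of the boosting logic only, here is a minimal,
standalone sketch. It does not use the Nutch API: the class name
HostBoost, the method boostFor, and the multiplier values are all
hypothetical. In a real plugin, a lookup like this would presumably be
called from inside indexerScore, with the URL and the score that the
method receives.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Map;

public class HostBoost {
    // Hypothetical per-host multipliers; the values are illustrative, not tuned.
    private static final Map<String, Float> MULTIPLIERS =
            Map.of("central", 5.0f, "journal", 0.5f);

    /**
     * Returns initScore scaled by the multiplier configured for the URL's host.
     * URLs that cannot be parsed, or whose host has no entry, keep initScore.
     */
    public static float boostFor(String url, float initScore) {
        String host;
        try {
            host = URI.create(url).getHost();
        } catch (IllegalArgumentException e) {
            return initScore; // malformed URL: leave the score untouched
        }
        if (host == null) {
            return initScore; // no host component, e.g. a relative URL
        }
        Float multiplier = MULTIPLIERS.get(host.toLowerCase(Locale.ROOT));
        return multiplier == null ? initScore : initScore * multiplier;
    }

    public static void main(String[] args) {
        // Pages on central are boosted, journal is damped, others are unchanged.
        System.out.println(boostFor("http://central/index.html", 1.0f));
        System.out.println(boostFor("http://journal/2007/09/post.html", 1.0f));
        System.out.println(boostFor("http://esupport/faq.html", 1.0f));
    }
}
```

Keeping the multipliers in one map means a site can be promoted or
demoted without touching the scoring logic itself.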

-- 
Doğacan Güney

Re: Increase ranks of some pages or sites manually?

Posted by Smith Norton <sm...@gmail.com>.
I found a property called query.site.boost in nutch-default.xml. If
I set it in nutch-site.xml, would it solve my problem?

Are these boost values used while crawling, or while searching through
search.jsp?

Re: Increase ranks of some pages or sites manually?

Posted by Rohan Mehta <ro...@visvo.com>.
Hi Smith,

The simplest way out would be to call doc.setBoost(boost) in
org.apache.nutch.indexer.Indexer.java.
Set the required boost depending on the URL.

Cheers,
Rohan
