Posted to user@nutch.apache.org by Neil Rosewarm <ne...@yahoo.com> on 2009/01/02 13:29:06 UTC

Crawl Timing_Please help

Dear Friends,

Wish you all a happy and prosperous new year 2009!

We are trying to crawl 50,0000 pages, and we are surprised to see that it is taking 3 full days. We have a server with 4 GB RAM, a Core 2 Duo processor, and a 250 GB hard drive, but we have had numerous issues such as power outages and internet connectivity problems.

I am really concerned about the time a single crawl takes. Could anyone suggest the best way to crawl faster?

Also, could you suggest a way to configure the crawl to resume from where it stopped after an interruption such as a power cut or an internet outage?

Your advice is highly appreciated!

Thank you,

Neil

Re: Crawl Timing_Please help

Posted by John Martyniak <jo...@beforedawn.com>.
Neil,

That shouldn't take that long. When I do a 50K page crawl, it takes a
few hours.

Have you tried it again, without the issues that you were talking
about?

To my knowledge, you can't restart a crawl from where it left off. It
would be a cool feature.

Also, is this hosted, or is it running off a DSL/cable modem?

-John

Re: Crawl Timing_Please help

Posted by Höchstötter Nadine <Ho...@huberverlag.de>.
Hi Neil,
How many unique hosts do you crawl?
Install jnettop to monitor the bandwidth. Are there any restrictions on bandwidth usage?
Cheers, Nadine.
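
The question about unique hosts matters because a polite crawler pauses between requests to the same host, so a crawl concentrated on a few hosts cannot go faster than its busiest host allows. The following is a minimal sketch of that reasoning, not part of Nutch: the function names, the sample URLs, and the 5-second delay are assumptions for illustration only.

```python
from collections import Counter
from urllib.parse import urlsplit


def hosts_by_url_count(urls):
    """Count how many URLs belong to each host (host names are lowercased)."""
    counts = Counter()
    for url in urls:
        host = urlsplit(url.strip()).hostname
        if host:  # skip blank lines and entries without a scheme/host
            counts[host] += 1
    return counts


def min_crawl_seconds(counts, delay_per_host_s):
    """Rough lower bound on fetch time.

    Assumes each host is fetched one page at a time with a fixed pause
    between requests, and that different hosts are fetched in parallel,
    so the host with the most URLs sets the floor.
    """
    if not counts:
        return 0.0
    return max(counts.values()) * delay_per_host_s


if __name__ == "__main__":
    # Hypothetical sample; in practice, read your own seed list.
    sample = [
        "http://example.com/a",
        "http://example.com/b",
        "http://EXAMPLE.com/c",
        "http://example.org/",
    ]
    counts = hosts_by_url_count(sample)
    print(len(counts), "unique hosts")
    print("busiest host:", counts.most_common(1))
    # Assumed 5-second per-host delay; check your own fetcher settings.
    print(min_crawl_seconds(counts, 5.0), "seconds at minimum")
```

Under these assumptions, 50,000 pages on a single host with a 5-second pause would need at least 250,000 seconds, or about 2.9 days, regardless of hardware or bandwidth. The same pages spread over many hosts could finish in a small fraction of that time. The delay actually in effect is whatever your crawler configuration specifies (in Nutch, see the `fetcher.server.delay` property).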
