Posted to user@nutch.apache.org by Amna Waqar <am...@gmail.com> on 2011/01/24 10:39:37 UTC

resuming the nutch crawl after interruption

Dear all,
I am using Nutch 1.2. The problem I'm facing is that I am unable to resume the
crawl from the point where it was interrupted after a power failure, network
disconnection, or some other sort of interruption. Is there any way to resume
the crawl from that same point?

Regards
Amna Waqar

Re: resuming the nutch crawl after interruption

Posted by Amna Waqar <am...@gmail.com>.
Okay, thanks a lot, everyone. Hoping to get the 2.0 release soon.

On Mon, Jan 24, 2011 at 5:30 AM, Julien Nioche <lists.digitalpebble@gmail.com> wrote:

> As Markus said, you can't resume the fetch step in 1.x (this is possible in
> 2.0, though), but you can resume the crawl itself by deleting the current
> segment and then using a shell script to run the generate / fetch / parse /
> update steps.
>
> Julien

Re: resuming the nutch crawl after interruption

Posted by Julien Nioche <li...@gmail.com>.
As Markus said, you can't resume the fetch step in 1.x (this is possible in
2.0, though), but you can resume the crawl itself by deleting the current
segment and then using a shell script to run the generate / fetch / parse /
update steps.
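
A shell script along those lines might look like the following minimal sketch.
It is not from the thread. The crawl directory layout (crawl/crawldb,
crawl/segments), the topN value, and the helper function names are all
hypothetical, and it assumes the segments live on the local filesystem and that
the newest segment is the one that was interrupted. If the fetcher is
configured to parse while fetching (the fetcher.parse property), the separate
parse step would presumably have to be dropped.

```shell
#!/bin/sh
# Sketch only: restart a Nutch 1.x crawl round after an interrupted fetch.
# Paths, TOPN and function names are assumptions, not taken from the thread.

NUTCH="${NUTCH:-bin/nutch}"       # nutch launcher script (assumed location)
CRAWL_DIR="${CRAWL_DIR:-crawl}"   # hypothetical crawl directory
TOPN="${TOPN:-1000}"              # hypothetical batch size

# Print the newest segment. Segment names are timestamps, so the last
# entry in sort order is the most recently generated one.
latest_segment() {
    ls -d "$CRAWL_DIR"/segments/*/ 2>/dev/null | sort | tail -n 1 | sed 's:/$::'
}

# Delete the segment left half-finished by the interruption.
# Assumes the newest segment is the interrupted one.
drop_interrupted_segment() {
    seg=$(latest_segment)
    if [ -n "$seg" ]; then
        rm -rf "$seg"
    fi
}

# One generate / fetch / parse / updatedb round; stops at the first failure.
run_cycle() {
    "$NUTCH" generate "$CRAWL_DIR/crawldb" "$CRAWL_DIR/segments" -topN "$TOPN" || return 1
    seg=$(latest_segment)
    "$NUTCH" fetch "$seg" || return 1
    "$NUTCH" parse "$seg" || return 1
    "$NUTCH" updatedb "$CRAWL_DIR/crawldb" "$seg"
}
```

After an interruption, one would call drop_interrupted_segment followed by
run_cycle. The URLs from the deleted segment are generated again, so the work
of that one segment is repeated rather than resumed.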

Julien


On 24 January 2011 10:16, Markus Jelsma <ma...@openindex.io> wrote:

> No, Nutch 1.x cannot resume an interrupted fetch job.



-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com

Re: resuming the nutch crawl after interruption

Posted by Markus Jelsma <ma...@openindex.io>.
Not in Nutch 1.x, and it would be very difficult to build. You simply have to
restart the batch. Using smaller batches may be a good idea.
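
One way to read "smaller batches" is to cap each generate step with -topN and
run many short rounds, so that an interruption costs at most one round. The
loop below is only a sketch under that reading. The paths, the TOPN and ROUNDS
values, and the function name are hypothetical, and it assumes segments on the
local filesystem.

```shell
#!/bin/sh
# Sketch only: crawl in many small rounds so an interruption loses one batch.
# Paths, TOPN, ROUNDS and the function name are assumptions.

NUTCH="${NUTCH:-bin/nutch}"       # nutch launcher script (assumed location)
CRAWL_DIR="${CRAWL_DIR:-crawl}"   # hypothetical crawl directory
TOPN="${TOPN:-500}"               # URLs per batch (hypothetical value)
ROUNDS="${ROUNDS:-10}"            # number of batches to run (hypothetical value)

crawl_in_batches() {
    round=1
    while [ "$round" -le "$ROUNDS" ]; do
        "$NUTCH" generate "$CRAWL_DIR/crawldb" "$CRAWL_DIR/segments" -topN "$TOPN" || return 1
        # Newest segment: names are timestamps, so it sorts last.
        seg=$(ls -d "$CRAWL_DIR"/segments/* | sort | tail -n 1)
        "$NUTCH" fetch "$seg" || return 1
        "$NUTCH" parse "$seg" || return 1
        "$NUTCH" updatedb "$CRAWL_DIR/crawldb" "$seg" || return 1
        echo "round $round done: $seg"
        round=$((round + 1))
    done
}
```

Every round that prints "done" has already been merged into the crawldb, so
after a failure only the last, unfinished segment has to be deleted and redone.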

On Monday 24 January 2011 11:27:10 Amna Waqar wrote:
> Thanks a lot for your reply, Markus. Are there any classes in the Nutch
> project that can be changed so that crawling can be resumed after an
> interruption?
>
> On Mon, Jan 24, 2011 at 5:16 AM, Markus Jelsma <ma...@openindex.io> wrote:
> > No, Nutch 1.x cannot resume an interrupted fetch job.

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350

Re: resuming the nutch crawl after interruption

Posted by Markus Jelsma <ma...@openindex.io>.
No, Nutch 1.x cannot resume an interrupted fetch job.


-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350