Posted to user@nutch.apache.org by ha...@hsbc.com on 2018/11/19 10:41:36 UTC

unexpected Nutch crawl interruption

Hello,

What happens if the bin/crawl command is forcibly stopped for any reason, for example by a server restart?

Kind regards,
Hany Shehata
Solutions Architect, Marketing and Communications IT
Corporate Functions | HSBC Operations, Services and Technology (HOST)
ul. Kapelanka 42A, 30-347 Kraków, Poland
__________________________________________________________________

Tie line: 7148 7689 4698
External: +48 123 42 0698
Mobile: +48 723 680 278
E-mail: hany.nasr@hsbc.com
__________________________________________________________________
Protect our environment - please only print this if you have to!



-----------------------------------------
SAVE PAPER - THINK BEFORE YOU PRINT!

This E-mail is confidential.  

It may also be legally privileged. If you are not the addressee you may not copy,
forward, disclose or use any part of it. If you have received this message in error,
please delete it and all copies from your system and notify the sender immediately by
return E-mail.

Internet communications cannot be guaranteed to be timely secure, error or virus-free.
The sender does not accept liability for any errors or omissions.

RE: RE: unexpected Nutch crawl interruption

Posted by ha...@hsbc.com.
Does this mean there is no such thing as a corrupted db in any case?



Re: RE: unexpected Nutch crawl interruption

Posted by Semyon Semyonov <se...@mail.com>.
It will continue from the most recently updated crawldb.
 


RE: unexpected Nutch crawl interruption

Posted by ha...@hsbc.com.
Hello Semyon,

Does that mean that if I re-run the crawl command, it will continue from where the previous run was stopped?


Re: unexpected Nutch crawl interruption

Posted by Semyon Semyonov <se...@mail.com>.
Hi Hany,  
 
If you open the script, you will find this line:

# main loop : rounds of generate - fetch - parse - update
for ((a=1; ; a++))

followed by a number of break conditions.

Each iteration runs n independent map jobs.
If the script is interrupted, the loop simply stops.
You should either finish the interrupted round with manual nutch commands, or call the crawl script again using the crawldb from the last completed iteration.
Semyon.
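
For illustration, finishing an interrupted round by hand might look roughly like the sketch below. It is only a sketch under assumptions, not what bin/crawl itself does: it assumes the usual Nutch 1.x segment layout (crawl_generate, crawl_fetch and crawl_parse subdirectories) and that an interrupted job leaves no finished output directory behind, which you should verify on your setup. The helper names (next_step, latest_segment, resume_round), the NUTCH variable and the crawl/... paths are hypothetical. bin/crawl may also run further steps (link inversion, dedup, indexing) that are not covered here.

```shell
#!/usr/bin/env bash
# Sketch: work out which step of an interrupted round is still missing
# for a segment, then run the remaining nutch commands in order.

# Print the next step needed for the segment directory given as $1.
next_step() {
  local segment="$1"
  if [ ! -d "$segment/crawl_generate" ]; then
    echo "generate"   # no fetch list: nothing to resume in this segment
  elif [ ! -d "$segment/crawl_fetch" ]; then
    echo "fetch"
  elif [ ! -d "$segment/crawl_parse" ]; then
    echo "parse"
  else
    echo "updatedb"   # cannot tell from the segment alone if this already ran
  fi
}

# Print the newest segment under $1. Segment names are timestamps,
# so the last entry of the sorted glob expansion is the newest one.
latest_segment() {
  local newest="" dir
  for dir in "$1"/*/; do
    [ -d "$dir" ] && newest="${dir%/}"
  done
  echo "$newest"
}

# Run the remaining steps for segment $2 against crawldb $1.
# NUTCH (hypothetical) selects the launcher; it defaults to bin/nutch.
resume_round() {
  local crawldb="$1" segment="$2" nutch="${NUTCH:-bin/nutch}"
  case "$(next_step "$segment")" in
    generate)
      echo "no usable segment; start a new bin/crawl run instead" >&2
      return 1
      ;;
    fetch)
      "$nutch" fetch "$segment" &&
        "$nutch" parse "$segment" &&
        "$nutch" updatedb "$crawldb" "$segment"
      ;;
    parse)
      "$nutch" parse "$segment" &&
        "$nutch" updatedb "$crawldb" "$segment"
      ;;
    updatedb)
      "$nutch" updatedb "$crawldb" "$segment"
      ;;
  esac
}
```

A possible way to use it, again with hypothetical paths: resume_round crawl/crawldb "$(latest_segment crawl/segments)". Setting NUTCH=echo first prints the commands instead of running them, which is a cheap way to check what would happen before touching the crawldb.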
 
 
