Posted to dev@nutch.apache.org by "Mathieu Bouchard (JIRA)" <ji...@apache.org> on 2014/08/25 15:21:58 UTC
[jira] [Created] (NUTCH-1828) bin/crawl : incorrect handling of nutch errors
Mathieu Bouchard created NUTCH-1828:
---------------------------------------
Summary: bin/crawl : incorrect handling of nutch errors
Key: NUTCH-1828
URL: https://issues.apache.org/jira/browse/NUTCH-1828
Project: Nutch
Issue Type: Bug
Components: nutchNewbie
Affects Versions: 2.2.1, 1.9
Environment: Ubuntu Server 14.04, OpenJDK 7
Reporter: Mathieu Bouchard
We are using Solr with Nutch to provide a complete search engine for our website.
I created a cron job that uses Nutch to crawl and update the Solr index each night. The cron job tries to automatically correct some errors that could result in a corrupt crawldb. However, the bin/crawl command does not seem to propagate errors coming from bin/nutch correctly.
Here is an example from the bin/crawl script:
$bin/nutch inject $CRAWL_PATH/crawldb $SEEDDIR
if [ $? -ne 0 ]
then exit $?
fi
Even if the nutch inject command fails, the crawl script exits with 0 at this point. As I understand it, by the time "exit $?" runs, $? no longer holds the status of the nutch inject command. It holds the status of the shell test, which is 0 because the test succeeded.
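The behavior can be reproduced outside of Nutch with a minimal sketch. The names fail and buggy are hypothetical stand-ins (fail plays the role of a failing bin/nutch call and is not part of the crawl script); the if block is copied from the excerpt above.

```shell
#!/bin/sh
# Hypothetical stand-in for a bin/nutch command that fails with status 3.
fail() { return 3; }

# Runs in a subshell so that "exit" ends only this function, not the demo.
buggy() (
  fail
  if [ $? -ne 0 ]
  then exit $?   # $? is now the status of the [ test above, which is 0
  fi
)

buggy
echo "buggy exit status: $?"   # prints "buggy exit status: 0"
```

The original status 3 is lost because every command, including the [ test, overwrites $?.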
To correct this, the script would need to be changed to something like:
$bin/nutch inject $CRAWL_PATH/crawldb $SEEDDIR
RETCODE=$?
if [ $RETCODE -ne 0 ]
then exit $RETCODE
fi
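As a rough check of the proposed fix, the sketch below runs it against a hypothetical failing command and compares it with a shorter variant that is sometimes used for the same purpose. The names fail, fixed and compact are illustrative assumptions, not part of bin/crawl.

```shell
#!/bin/sh
# Hypothetical stand-in for a bin/nutch command that fails with status 3.
fail() { return 3; }

# Proposed fix: save the status before any other command overwrites $?.
fixed() (
  fail
  RETCODE=$?
  if [ $RETCODE -ne 0 ]
  then exit $RETCODE
  fi
)

# Shorter variant: in "cmd || exit $?", $? is expanded right after cmd
# fails, so it still holds the status of cmd.
compact() (
  fail || exit $?
)

fixed;   echo "fixed exit status: $?"     # prints "fixed exit status: 3"
compact; echo "compact exit status: $?"   # prints "compact exit status: 3"
```

Both forms hand the failing command's status back to the caller, so a wrapper such as a cron job can react to it.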