Posted to dev@nutch.apache.org by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2011/08/23 13:48:29 UTC

[jira] [Commented] (NUTCH-1087) Deprecate crawl command and replace with example script

    [ https://issues.apache.org/jira/browse/NUTCH-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089405#comment-13089405 ] 

Andrzej Bialecki  commented on NUTCH-1087:
------------------------------------------


IIRC we had this discussion in the past... It's true that we already rely on Bash to do anything useful, no matter whether it's on Windows or on a *nix-like OS. And it's true that the crawl command has been a constant source of confusion over the years. The crawl application also suffered from some subtle bugs, especially when running in local mode (e.g. the PluginRepository leaks).

But the argument about maintenance costs is IMHO moot - you have to maintain a shell script, too, so it's no different from maintaining a Java class. Where it does differ, I think, is that moving the crawl cycle logic to a shell script raises the bar for Java developers who are not familiar with Bash scripting - a robust crawl script is not easy to follow, as it needs to handle error conditions and manage input/output resources on HDFS. On the other hand, it's easier for system admins to tweak a script than to tweak Java code... so I guess it's also a question of who the audience for this functionality is.

I'm +0 for removing Crawl and replacing it with a script; IMHO it doesn't change the picture in any significant way.


> Deprecate crawl command and replace with example script
> -------------------------------------------------------
>
>                 Key: NUTCH-1087
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1087
>             Project: Nutch
>          Issue Type: Task
>    Affects Versions: 1.4
>            Reporter: Markus Jelsma
>            Priority: Minor
>             Fix For: 1.4
>
>
> * remove the crawl command
> * add basic crawl shell script
> See thread:
> http://www.mail-archive.com/dev@nutch.apache.org/msg03848.html
