Posted to dev@nutch.apache.org by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2017/09/20 08:08:01 UTC

[jira] [Commented] (NUTCH-2425) Update GettingNutchRunningWithUbuntu wiki article

    [ https://issues.apache.org/jira/browse/NUTCH-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16172884#comment-16172884 ] 

Sebastian Nagel commented on NUTCH-2425:
----------------------------------------

Hi [~krichter], thanks! Everyone is welcome to improve the documentation on the Wiki. Please create an account (https://wiki.apache.org/nutch/FrontPage?action=newaccount) and send us your username on one of the mailing lists. Thanks!

Btw., {{urls}} can be a file or directory:
* {{bin/nutch inject .../crawldb urls/seeds.txt}} injects all URLs from this file
* {{bin/nutch inject .../crawldb urls/}} injects the URLs from {{urls/seeds.txt}} and from any other files found in {{urls/}}
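
A minimal sketch of the directory variant, assuming a Nutch 1.x layout with {{bin/nutch}} in the current directory. The {{CRAWLDB}} variable and its default value are hypothetical stand-ins for the crawldb path, which is left elided above; the seed URL is the one from the wiki article.

```shell
# Create the seed directory first, then write the seed list into a file inside it.
# (Redirecting straight to `urls` fails once `urls` is a directory.)
mkdir -p urls
echo 'http://lucene.apache.org/nutch/' > urls/seeds.txt

# CRAWLDB is a hypothetical placeholder; point it at your own crawldb.
CRAWLDB="${CRAWLDB:-crawl/crawldb}"

# Inject every seed file found in urls/; skipped when bin/nutch is not present.
if [ -x bin/nutch ]; then
  bin/nutch inject "$CRAWLDB" urls/
fi
```

Passing {{urls/seeds.txt}} instead of {{urls/}} as the last argument would limit the injection to that one file.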

> Update GettingNutchRunningWithUbuntu wiki article
> -------------------------------------------------
>
>                 Key: NUTCH-2425
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2425
>             Project: Nutch
>          Issue Type: Task
>            Reporter: Karl Richter
>
> https://wiki.apache.org/nutch/GettingNutchRunningWithUbuntu contains some errors (e.g. `echo 'http://lucene.apache.org/nutch/' > urls` where `urls` is a directory) and obsolete parts (`conf/crawl-urlfilter.txt` is `conf/regex-urlfilter.txt` in 2.x) and thus does not appear to be well tested.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)