Posted to dev@nutch.apache.org by "artodeto (JIRA)" <ji...@apache.org> on 2018/01/31 11:15:00 UTC
[jira] [Created] (NUTCH-2507) NutchTutorial wiki page has a lot of
outdated command line calls when it starts with the solr interaction
artodeto created NUTCH-2507:
-------------------------------
Summary: NutchTutorial wiki page has a lot of outdated command line calls when it starts with the solr interaction
Key: NUTCH-2507
URL: https://issues.apache.org/jira/browse/NUTCH-2507
Project: Nutch
Issue Type: Bug
Components: documentation
Affects Versions: 1.14
Reporter: artodeto
h2. Section "Step-by-Step: Indexing into Apache Solr"
replace:
{code:java}
Example: bin/nutch index http://localhost:8983/solr crawl/crawldb/ -linkdb crawl/linkdb/ crawl/segments/20131108063838/ -filter -normalize -deleteGone{code}
with:
{code:java}
Example: bin/nutch index -Dsolr.server.url=http://localhost:8983/solr/nutch ${NUTCH_RUNTIME_HOME}/crawl/crawldb/ -linkdb ${NUTCH_RUNTIME_HOME}/crawl/linkdb/ ${NUTCH_RUNTIME_HOME}/crawl/segments/20131108063838/ -filter -normalize -deleteGone{code}
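The proposed index call repeats the crawl directory three times and hard-codes a segment timestamp, which makes it easy to mistype. A minimal sketch of how the command could be assembled from variables is given here as a dry run that only prints the command. The function name index_cmd, the SOLR_URL variable and the /opt/nutch/runtime/local default are assumptions for illustration and are not part of Nutch or of the tutorial.

```shell
#!/bin/sh
# Hypothetical defaults (assumptions); adjust to the local installation.
NUTCH_RUNTIME_HOME="${NUTCH_RUNTIME_HOME:-/opt/nutch/runtime/local}"
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr/nutch}"

# Print the index command for one segment (dry run, nothing is executed).
# $1: crawl directory containing crawldb/, linkdb/ and segments/
# $2: segment directory name, e.g. 20131108063838
index_cmd() {
    crawl_dir="$1"
    segment="$2"
    printf 'bin/nutch index -Dsolr.server.url=%s %s/crawldb/ -linkdb %s/linkdb/ %s/segments/%s/ -filter -normalize -deleteGone\n' \
        "$SOLR_URL" "$crawl_dir" "$crawl_dir" "$crawl_dir" "$segment"
}

index_cmd "${NUTCH_RUNTIME_HOME}/crawl" 20131108063838
```

The printed line can be reviewed before it is pasted into a shell, so a wrong crawl directory or segment name is caught before anything is sent to Solr.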
h2. Section "Step-by-Step: Deleting Duplicates"
replace:
{code:java}
Usage: bin/nutch dedup <solr url>
Example: /bin/nutch dedup http://localhost:8983/solr
{code}
with:
{code:java}
Usage: bin/nutch dedup <path to the crawldb> <solr url>
Example: /bin/nutch dedup ${NUTCH_RUNTIME_HOME}/crawl/crawldb/ http://localhost:8983/sol
{code}
h2. Section "Step-by-Step: Cleaning Solr"
replace:
{code:java}
Usage: bin/nutch clean -Dsolr.server.url=<solr index url> <crawldb>
Example: /bin/nutch clean -Dsolr.server.url=http://localhost:8983/solr/nutch crawl/crawldb/
{code}
with:
{code}
Usage: bin/nutch clean -Dsolr.server.url=<solr index url> <crawldb>
Example: /bin/nutch clean -Dsolr.server.url=http://localhost:8983/solr/nutch ${NUTCH_RUNTIME_HOME}/crawl/crawldb/
{code}
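One caveat with the ${NUTCH_RUNTIME_HOME} form: if the variable is unset, the shell expands the path to /crawl/crawldb/ without any warning. A minimal sketch of a guard follows. It checks the variable and the crawldb directory, then prints the clean command without running it. The function name clean_cmd and the SOLR_URL variable are assumptions for illustration and are not part of Nutch.

```shell
#!/bin/sh
# Print the clean command, refusing to build it from an unset
# NUTCH_RUNTIME_HOME or a missing crawldb directory (dry run only).
# SOLR_URL and the function name are illustrative assumptions.
clean_cmd() {
    if [ -z "${NUTCH_RUNTIME_HOME:-}" ]; then
        echo "NUTCH_RUNTIME_HOME is not set" >&2
        return 1
    fi
    crawldb="${NUTCH_RUNTIME_HOME}/crawl/crawldb/"
    if [ ! -d "$crawldb" ]; then
        echo "crawldb not found: $crawldb" >&2
        return 1
    fi
    printf 'bin/nutch clean -Dsolr.server.url=%s %s\n' \
        "${SOLR_URL:-http://localhost:8983/solr/nutch}" "$crawldb"
}

# Report on stderr instead of failing the whole script.
clean_cmd || echo "skipping clean" >&2
```

A note along these lines in the tutorial would tell readers to export NUTCH_RUNTIME_HOME before copying the examples.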