Posted to dev@nutch.apache.org by "FeiTian (JIRA)" <ji...@apache.org> on 2014/12/10 03:33:12 UTC
[jira] [Created] (NUTCH-1895) run() method in Crawler.java doesn't put Nutch.ARG_BATCH in argMap
FeiTian created NUTCH-1895:
------------------------------
Summary: run() method in Crawler.java doesn't put Nutch.ARG_BATCH in argMap
Key: NUTCH-1895
URL: https://issues.apache.org/jira/browse/NUTCH-1895
Project: Nutch
Issue Type: Bug
Components: crawldb, indexer
Affects Versions: 2.2.1
Environment: Win7, Solr4.10.1
Reporter: FeiTian
I am using Nutch 2.2.1 and Solr 4.10.1.
OS: Win7.
Env: MyEclipse 10.
JAVA: jdk1.7.0_71
I run the crawl with the following arguments:
urls -depth 3 -topN 10 -solr http://localhost:8080/solr/collection2
to index the crawled data into Solr, with the following Gora settings:
gora.sqlstore.jdbc.driver=com.mysql.jdbc.Driver
gora.sqlstore.jdbc.url=jdbc:mysql://192.168.0.69:3306/nutch?createDatabaseIfNotExist=true&useUnicode=true&characterEncoding=utf8&autoReconnect=true&zeroDateTimeBehavior=convertToNull
gora.sqlstore.jdbc.user=root
gora.sqlstore.jdbc.password=123456
to store the data in MySQL.
However, the crawl fails with a NullPointerException on batchId. Looking into it, I found the following.
In SolrIndexerJob.java, batchId is read from args:
  @Override
  public Map<String,Object> run(Map<String,Object> args) throws Exception {
    String solrUrl = (String)args.get(Nutch.ARG_SOLR);
    String batchId = (String)args.get(Nutch.ARG_BATCH);
    NutchIndexWriterFactory.addClassToConf(getConf(), SolrWriter.class);
    getConf().set(SolrConstants.SERVER_URL, solrUrl);
    currentJob = createIndexJob(getConf(), "solr-index", batchId);
    currentJob.waitForCompletion(true);
    ToolUtil.recordJobStatus(null, currentJob, results);
    return results;
  }
But in Crawler.java, run() never puts a batch ID (Nutch.ARG_BATCH) into argMap, so the lookup above returns null:
  @Override
  public int run(String[] args) throws Exception {
    if (args.length == 0) {
      System.out.println("Usage: Crawler (<seedDir> | -continue) [-solr <solrURL>] [-threads n] [-depth i] [-topN N] [-numTasks N]");
      return -1;
    }
    ...
    Map<String,Object> argMap = ToolUtil.toArgMap(
        Nutch.ARG_THREADS, threads,
        Nutch.ARG_DEPTH, depth,
        Nutch.ARG_TOPN, topN,
        Nutch.ARG_SOLR, solrUrl,
        Nutch.ARG_SEEDDIR, seedDir,
        Nutch.ARG_NUMTASKS, numTasks);
    run(argMap);
    return 0;
  }
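To illustrate the mismatch between the two snippets, here is a minimal, self-contained sketch that does not use Nutch at all. The class ArgMapSketch, the key strings "solr" and "batch", the simplified toArgMap helper, the requireBatchId check, and the value "example-batch-id" are all assumptions made for illustration; they are not the real Nutch constants or ToolUtil code, and this is not necessarily how the issue should be fixed in Nutch itself. It only shows that a key never placed in the map comes back as null, and that a fail-fast check would report the missing argument more clearly than a later NullPointerException.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ArgMapSketch {
    // Hypothetical stand-ins for Nutch.ARG_SOLR and Nutch.ARG_BATCH;
    // the real constant values may differ.
    static final String ARG_SOLR = "solr";
    static final String ARG_BATCH = "batch";

    // Simplified stand-in for an argument-map builder:
    // pairs up alternating keys and values.
    static Map<String, Object> toArgMap(Object... keyValues) {
        if (keyValues.length % 2 != 0) {
            throw new IllegalArgumentException("Expected key/value pairs");
        }
        Map<String, Object> map = new LinkedHashMap<>();
        for (int i = 0; i < keyValues.length; i += 2) {
            map.put((String) keyValues[i], keyValues[i + 1]);
        }
        return map;
    }

    // Reads the batch ID from the map and fails fast with a clear
    // message if the caller never supplied it.
    static String requireBatchId(Map<String, Object> args) {
        String batchId = (String) args.get(ARG_BATCH);
        if (batchId == null) {
            throw new IllegalArgumentException("Missing argument: " + ARG_BATCH);
        }
        return batchId;
    }

    public static void main(String[] unused) {
        String solrUrl = "http://localhost:8080/solr/collection2";

        // Mirrors the reported situation: no batch key is ever put in the map.
        Map<String, Object> withoutBatch = toArgMap(ARG_SOLR, solrUrl);
        System.out.println(withoutBatch.get(ARG_BATCH)); // prints "null"

        // With the key present, the lookup succeeds.
        Map<String, Object> withBatch =
            toArgMap(ARG_SOLR, solrUrl, ARG_BATCH, "example-batch-id");
        System.out.println(requireBatchId(withBatch)); // prints "example-batch-id"
    }
}
```

Calling requireBatchId(withoutBatch) in this sketch throws an IllegalArgumentException that names the missing key, which is easier to diagnose than a null propagating into job creation.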
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)