Posted to dev@nutch.apache.org by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/12/11 21:50:13 UTC
[jira] [Comment Edited] (NUTCH-1895) run() method in Crawler.java
doesnt put Nutch.ARG_BATCH in argMap
[ https://issues.apache.org/jira/browse/NUTCH-1895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243112#comment-14243112 ]
Sebastian Nagel edited comment on NUTCH-1895 at 12/11/14 8:50 PM:
------------------------------------------------------------------
Running Nutch on Windows requires Cygwin, which also provides a bash shell to run bin/crawl and bin/nutch (it works on Windows 7, really!). However, any discussion of how to run Nutch (2.x) on Windows should move to the Nutch user mailing list. Thanks!
> run() method in Crawler.java doesnt put Nutch.ARG_BATCH in argMap
> -----------------------------------------------------------------
>
> Key: NUTCH-1895
> URL: https://issues.apache.org/jira/browse/NUTCH-1895
> Project: Nutch
> Issue Type: Bug
> Components: crawldb, indexer
> Affects Versions: 2.2.1
> Environment: Win7, Solr4.10.1
> Reporter: FeiTian
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> I am using Nutch 2.2.1 and Solr 4.10.1.
> OS: Win7.
> Env: MyEclipse 10.
> JAVA: jdk1.7.0_71
> I run the crawler with these arguments:
> urls -depth 3 -topN 10 -solr http://localhost:8080/solr/collection2
> to import data into Solr,
> and I use these Gora settings:
> gora.sqlstore.jdbc.driver=com.mysql.jdbc.Driver
> gora.sqlstore.jdbc.url=jdbc:mysql://192.168.0.69:3306/nutch?createDatabaseIfNotExist=true&useUnicode=true&characterEncoding=utf8&autoReconnect=true&zeroDateTimeBehavior=convertToNull
> gora.sqlstore.jdbc.user=root
> gora.sqlstore.jdbc.password=123456
> to import data into MySQL.
> But I got a NullPointerException on batchId. Looking into it, I found the following.
> SolrIndexerJob.java reads batchId from args:
>   @Override
>   public Map<String,Object> run(Map<String,Object> args) throws Exception {
>     String solrUrl = (String)args.get(Nutch.ARG_SOLR);
>     String batchId = (String)args.get(Nutch.ARG_BATCH);
>     NutchIndexWriterFactory.addClassToConf(getConf(), SolrWriter.class);
>     getConf().set(SolrConstants.SERVER_URL, solrUrl);
>     currentJob = createIndexJob(getConf(), "solr-index", batchId);
>     currentJob.waitForCompletion(true);
>     ToolUtil.recordJobStatus(null, currentJob, results);
>     return results;
>   }
> But Crawler.java never puts a batch ID (Nutch.ARG_BATCH) into argMap:
>   @Override
>   public int run(String[] args) throws Exception {
>     if (args.length == 0) {
>       System.out.println("Usage: Crawler (<seedDir> | -continue) [-solr <solrURL>] [-threads n] [-depth i] [-topN N] [-numTasks N]");
>       return -1;
>     }
>     ...
>     Map<String,Object> argMap = ToolUtil.toArgMap(
>         Nutch.ARG_THREADS, threads,
>         Nutch.ARG_DEPTH, depth,
>         Nutch.ARG_TOPN, topN,
>         Nutch.ARG_SOLR, solrUrl,
>         Nutch.ARG_SEEDDIR, seedDir,
>         Nutch.ARG_NUMTASKS, numTasks);
>     run(argMap);
>     return 0;
>   }
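
The mechanism described in the report can be reproduced without Nutch. The sketch below is only an illustration under stated assumptions: BatchArgDemo, its toArgMap helper, the key strings "solr" and "batch", and the value "example-batch-id" are all hypothetical stand-ins, not the real ToolUtil or Nutch constants. It shows that a key which was never put into the argument map comes back as null, and that an explicit check turns the later NullPointerException into an early, readable error. Which batch ID Crawler should actually pass is not stated in the report.

```java
import java.util.HashMap;
import java.util.Map;

class BatchArgDemo {
    // Hypothetical key names; the real values of Nutch.ARG_SOLR and
    // Nutch.ARG_BATCH are not shown in the report.
    static final String ARG_SOLR = "solr";
    static final String ARG_BATCH = "batch";

    // Simplified stand-in for ToolUtil.toArgMap (assumption): turns
    // alternating key/value arguments into a map.
    static Map<String, Object> toArgMap(Object... keyValues) {
        if (keyValues.length % 2 != 0) {
            throw new IllegalArgumentException("expected key/value pairs");
        }
        Map<String, Object> map = new HashMap<>();
        for (int i = 0; i < keyValues.length; i += 2) {
            map.put(String.valueOf(keyValues[i]), keyValues[i + 1]);
        }
        return map;
    }

    // Same lookup as in SolrIndexerJob.run(), plus an explicit null check
    // so a missing batch ID is reported where it is read.
    static String requireBatchId(Map<String, Object> args) {
        String batchId = (String) args.get(ARG_BATCH);
        if (batchId == null) {
            throw new IllegalArgumentException("missing argument: " + ARG_BATCH);
        }
        return batchId;
    }

    public static void main(String[] args) {
        String solrUrl = "http://localhost:8080/solr/collection2";

        // As in Crawler.run(): the batch key is never added, so get() yields null.
        Map<String, Object> withoutBatch = toArgMap(ARG_SOLR, solrUrl);
        System.out.println("batch id without fix: " + withoutBatch.get(ARG_BATCH));

        // With the key added (placeholder value), the lookup succeeds.
        Map<String, Object> withBatch =
            toArgMap(ARG_SOLR, solrUrl, ARG_BATCH, "example-batch-id");
        System.out.println("batch id with fix: " + requireBatchId(withBatch));
    }
}
```

Whether the better fix is to add the key in Crawler.java or to guard against null in the indexer job is a design choice the report leaves open; the check in requireBatchId only makes the failure easier to diagnose.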
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)