Posted to dev@nutch.apache.org by "FeiTian (JIRA)" <ji...@apache.org> on 2014/12/10 03:33:12 UTC

[jira] [Created] (NUTCH-1895) run() method in Crawler.java doesn't put Nutch.ARG_BATCH in argMap

FeiTian created NUTCH-1895:
------------------------------

             Summary: run() method in Crawler.java doesn't put Nutch.ARG_BATCH in argMap
                 Key: NUTCH-1895
                 URL: https://issues.apache.org/jira/browse/NUTCH-1895
             Project: Nutch
          Issue Type: Bug
          Components: crawldb, indexer
    Affects Versions: 2.2.1
         Environment: Win7, Solr4.10.1
            Reporter: FeiTian


I am using Nutch 2.2.1 and Solr 4.10.1.
OS: Win7.
Env: MyEclipse 10.
JAVA: jdk1.7.0_71
I am running the crawler with the arguments:
  urls -depth 3 -topN 10 -solr http://localhost:8080/solr/collection2
to import data into Solr,
and using the following Gora settings:
  gora.sqlstore.jdbc.driver=com.mysql.jdbc.Driver
  gora.sqlstore.jdbc.url=jdbc:mysql://192.168.0.69:3306/nutch?createDatabaseIfNotExist=true&useUnicode=true&characterEncoding=utf8&autoReconnect=true&zeroDateTimeBehavior=convertToNull
  gora.sqlstore.jdbc.user=root
  gora.sqlstore.jdbc.password=123456
to import data into MySQL.

But I got a NullPointerException on batchId. Looking into it, I found the following.

In SolrIndexerJob.java, run() reads batchId from args:

  @Override
  public Map<String,Object> run(Map<String,Object> args) throws Exception {
    String solrUrl = (String)args.get(Nutch.ARG_SOLR);
    String batchId = (String)args.get(Nutch.ARG_BATCH);
    NutchIndexWriterFactory.addClassToConf(getConf(), SolrWriter.class);
    getConf().set(SolrConstants.SERVER_URL, solrUrl);

    currentJob = createIndexJob(getConf(), "solr-index", batchId);

    currentJob.waitForCompletion(true);
    ToolUtil.recordJobStatus(null, currentJob, results);
    return results;
  }

But in Crawler.java, run() never puts Nutch.ARG_BATCH into argMap:

  @Override
  public int run(String[] args) throws Exception {
    if (args.length == 0) {
      System.out.println("Usage: Crawler (<seedDir> | -continue) [-solr <solrURL>] [-threads n] [-depth i] [-topN N] [-numTasks N]");
      return -1;
    }

...

    Map<String,Object> argMap = ToolUtil.toArgMap(
        Nutch.ARG_THREADS, threads,
        Nutch.ARG_DEPTH, depth,
        Nutch.ARG_TOPN, topN,
        Nutch.ARG_SOLR, solrUrl,
        Nutch.ARG_SEEDDIR, seedDir,
        Nutch.ARG_NUMTASKS, numTasks);
    run(argMap);
    return 0;
  }
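
The failure mode described above can be reproduced outside Nutch with a small standalone sketch. Everything in it is an assumption made for illustration: the class BatchIdSketch, the key strings "solr" and "batch", the simplified toArgMap stand-in, and the requireBatchId guard are hypothetical and are not taken from the Nutch source. It only shows that looking up a key that was never put into the map returns null without any error, so the exception surfaces later, wherever batchId is first dereferenced.

```java
import java.util.HashMap;
import java.util.Map;

class BatchIdSketch {
  // Hypothetical stand-ins for Nutch.ARG_SOLR and Nutch.ARG_BATCH;
  // the real constant values may differ.
  static final String ARG_SOLR = "solr";
  static final String ARG_BATCH = "batch";

  // Simplified, hypothetical stand-in for ToolUtil.toArgMap:
  // takes alternating key/value arguments and skips null values.
  static Map<String, Object> toArgMap(Object... pairs) {
    Map<String, Object> map = new HashMap<String, Object>();
    for (int i = 0; i + 1 < pairs.length; i += 2) {
      if (pairs[i + 1] != null) {
        map.put(String.valueOf(pairs[i]), pairs[i + 1]);
      }
    }
    return map;
  }

  // Same lookup pattern as in SolrIndexerJob.run(): a missing key
  // yields null, and casting null to String does not throw.
  static String lookupBatchId(Map<String, Object> args) {
    return (String) args.get(ARG_BATCH);
  }

  // One possible guard (an assumption, not the project's fix):
  // fail early with a clear message instead of a later NullPointerException.
  static String requireBatchId(Map<String, Object> args) {
    String batchId = lookupBatchId(args);
    if (batchId == null) {
      throw new IllegalArgumentException("Missing argument: " + ARG_BATCH);
    }
    return batchId;
  }

  public static void main(String[] argv) {
    // Built the way Crawler.run() builds argMap: no batch entry at all.
    Map<String, Object> withoutBatch =
        toArgMap(ARG_SOLR, "http://localhost:8080/solr/collection2");
    System.out.println("batchId = " + lookupBatchId(withoutBatch));

    // With a batch entry present, the lookup succeeds.
    // "example-batch" is a placeholder value, not a real Nutch batch id.
    Map<String, Object> withBatch =
        toArgMap(ARG_SOLR, "http://localhost:8080/solr/collection2",
                 ARG_BATCH, "example-batch");
    System.out.println("batchId = " + requireBatchId(withBatch));
  }
}
```

Which value Crawler.run() should actually pass for Nutch.ARG_BATCH is a separate question that this sketch does not answer.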


