Posted to user@nutch.apache.org by Jessica Glover <gl...@gmail.com> on 2015/06/17 16:58:59 UTC

2.3 REST API and batchId

I'm having trouble understanding the concept of a batch and which elements
of the crawl cycle require a batchId.

I've found that I need to specify a batchId when I run a generate job, but
the fetch job finishes without one. My parse job, however, then fails
with:

ERROR impl.JobWorker - Cannot run job worker!
java.lang.NullPointerException
at org.apache.nutch.parse.ParserJob.getBatchIdFilter(ParserJob.java:268)
at org.apache.nutch.parse.ParserJob.run(ParserJob.java:256)
at org.apache.nutch.api.impl.JobWorker.run(JobWorker.java:64)
at ...

So I assume it's because I needed a batchId. But what exactly is a batch?
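For illustration only: an NPE thrown from a method like getBatchIdFilter usually means an optional job argument was read back as null and then dereferenced. The sketch below is not Nutch's actual ParserJob code. The class name, the argument key ARG_BATCH and the sentinel ALL_BATCHES are all hypothetical, chosen only to show the pattern and a null-safe alternative.

```java
import java.util.HashMap;
import java.util.Map;

public class BatchIdFilterSketch {
    // Hypothetical argument key; not necessarily the name Nutch uses.
    static final String ARG_BATCH = "batch";
    // Hypothetical sentinel meaning "process every batch".
    static final String ALL_BATCHES = "-all";

    // Reads an optional argument and calls a method on it without a null
    // check, so a missing batchId throws NullPointerException.
    static boolean unsafeIsAllBatches(Map<String, Object> args) {
        String batchId = (String) args.get(ARG_BATCH);
        return batchId.equals(ALL_BATCHES);
    }

    // Null-safe variant: call equals() on the constant, so a missing
    // batchId simply compares as false.
    static boolean safeIsAllBatches(Map<String, Object> args) {
        Object batchId = args.get(ARG_BATCH);
        return ALL_BATCHES.equals(batchId);
    }

    public static void main(String[] args) {
        Map<String, Object> jobArgs = new HashMap<>(); // no batchId supplied
        System.out.println(safeIsAllBatches(jobArgs)); // prints false
        try {
            unsafeIsAllBatches(jobArgs);
        } catch (NullPointerException e) {
            System.out.println("NPE: batchId was missing");
        }
    }
}
```

Under these assumptions, passing a batchId to the job (or having the job treat a missing one explicitly) avoids the exception.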

Thanks,
Jessica