Posted to dev@nutch.apache.org by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/03/26 19:51:16 UTC
[jira] [Commented] (NUTCH-1545) capture batchId and remove references to segments in 2.x crawl script.
[ https://issues.apache.org/jira/browse/NUTCH-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614425#comment-13614425 ]
Lewis John McGibbney commented on NUTCH-1545:
---------------------------------------------
There are two problems here.
First, we do not maintain the concept of a crawldb locally in 2.x.
Second, we generate batchIds randomly within the GeneratorJob, as follows:
{code}
batchId = (curTime / 1000) + "-" + randomSeed;
{code}
We need to capture this value within the crawl script and pass it on to the fetching, parsing, and subsequent steps.
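As a rough illustration of what capturing the value could involve, here is a minimal Java sketch. It builds an id in the same shape as the GeneratorJob line above and pulls such an id back out of a line of text. The class name BatchIdSketch, both method names, and the sample log line are hypothetical and are not taken from Nutch; the sketch also assumes the random part is a non-negative integer.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch.
final class BatchIdSketch {

    // Assumed shape of a batchId: <epoch seconds>-<non-negative integer>.
    private static final Pattern BATCH_ID = Pattern.compile("\\b(\\d+-\\d+)\\b");

    // Mirrors the expression quoted from GeneratorJob:
    // (curTime / 1000) + "-" + randomSeed
    static String newBatchId(long curTimeMillis, int randomSeed) {
        return (curTimeMillis / 1000) + "-" + randomSeed;
    }

    // Returns the first batchId-shaped token in the line, if there is one.
    static Optional<String> extractBatchId(String line) {
        Matcher m = BATCH_ID.matcher(line);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String id = newBatchId(System.currentTimeMillis(), 12345);
        // The log line format below is invented for this example.
        String line = "generated batch id: " + id;
        System.out.println(extractBatchId(line).orElse("no batchId found"));
    }
}
```

A crawl script would still need some way to obtain the line containing the id from the generate step; how that output is exposed is not settled in this comment, so the sketch only covers the formatting and matching side.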
> capture batchId and remove references to segments in 2.x crawl script.
> ----------------------------------------------------------------------
>
> Key: NUTCH-1545
> URL: https://issues.apache.org/jira/browse/NUTCH-1545
> Project: Nutch
> Issue Type: Task
> Affects Versions: 2.1
> Reporter: Lewis John McGibbney
> Priority: Minor
> Fix For: 2.2
>
> Attachments: NUTCH-1545.patch
>
>
> The concept of a segment is replaced by batchId in 2.x.
> I'm currently getting rid of segment references in 2.x.
> This issue was flagged up and is separate from NUTCH-1532, which I am working on.