Posted to dev@nutch.apache.org by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/10/21 05:38:27 UTC
[jira] [Commented] (NUTCH-2148) Review and update mapred --> mapreduce config params in crawl script
[ https://issues.apache.org/jira/browse/NUTCH-2148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966183#comment-14966183 ]
Lewis John McGibbney commented on NUTCH-2148:
---------------------------------------------
Since the crawl script for 2.3.1 needs to be sorted out anyway, we can deal with this for 2.3.1 as well.
> Review and update mapred --> mapreduce config params in crawl script
> --------------------------------------------------------------------
>
> Key: NUTCH-2148
> URL: https://issues.apache.org/jira/browse/NUTCH-2148
> Project: Nutch
> Issue Type: New Feature
> Components: bin
> Affects Versions: 1.10, 2.3.1
> Reporter: Lewis John McGibbney
> Assignee: Lewis John McGibbney
> Fix For: 1.11, 2.3.1
>
> Attachments: NUTCH-2148.patch
>
>
> Configuration parameters inside of $NUTCH_HOME/src/bin/crawl currently include
> {code}
> commonOptions="-D mapred.reduce.tasks=$numTasks -D mapred.child.java.opts=-Xmx1000m -D mapred.reduce.tasks.speculative.execution=false -D mapred.map.tasks.speculative.execution=false -D mapred.compress.map.output=true"
> {code}
> as well as
> {code}
> skipRecordsOptions="-D mapred.skip.attempts.to.start.skipping=2 -D mapred.skip.map.max.skip.records=1"
> __bin_nutch parse $commonOptions $skipRecordsOptions "$CRAWL_PATH"/segments/$SEGMENT
> {code}
> In all honesty, this should have been addressed as part of the upgrade to Hadoop 2.4.0. Whoops.
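
For illustration, the two option strings quoted above might look roughly like this once the old mapred.* keys are swapped for their mapreduce.* counterparts. This is only a sketch: the replacement names are taken from Hadoop's deprecated-properties table and are not necessarily what the attached patch uses. The numTasks value is a hypothetical placeholder, and mapred.child.java.opts is left unchanged on the assumption that it is not listed as deprecated.

```shell
#!/usr/bin/env bash
# Hypothetical placeholder; the real crawl script sets numTasks itself.
numTasks=2

# Old key                                   -> assumed replacement
# mapred.reduce.tasks                       -> mapreduce.job.reduces
# mapred.reduce.tasks.speculative.execution -> mapreduce.reduce.speculative
# mapred.map.tasks.speculative.execution    -> mapreduce.map.speculative
# mapred.compress.map.output                -> mapreduce.map.output.compress
# mapred.child.java.opts is kept as-is (assumed not deprecated).
commonOptions="-D mapreduce.job.reduces=$numTasks -D mapred.child.java.opts=-Xmx1000m -D mapreduce.reduce.speculative=false -D mapreduce.map.speculative=false -D mapreduce.map.output.compress=true"

# mapred.skip.attempts.to.start.skipping -> mapreduce.task.skip.start.attempts
# mapred.skip.map.max.skip.records       -> mapreduce.map.skip.maxrecords
skipRecordsOptions="-D mapreduce.task.skip.start.attempts=2 -D mapreduce.map.skip.maxrecords=1"

# Print the resulting option strings so they can be inspected.
echo "$commonOptions"
echo "$skipRecordsOptions"
```

The values are the same as in the original strings; only the key names differ, so behaviour should stay the same while the deprecation warnings go away.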
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)