Posted to dev@nutch.apache.org by "Markus Jelsma (Created) (JIRA)" <ji...@apache.org> on 2011/12/13 17:47:30 UTC
[jira] [Created] (NUTCH-1219) Upgrade all jobs to new MapReduce API
Upgrade all jobs to new MapReduce API
-------------------------------------
Key: NUTCH-1219
URL: https://issues.apache.org/jira/browse/NUTCH-1219
Project: Nutch
Issue Type: Task
Reporter: Markus Jelsma
Priority: Critical
Fix For: 1.5
We should upgrade to the new Hadoop API for Nutch trunk, as has already been done for the Nutchgora branch.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (NUTCH-1219) Upgrade all jobs to new MapReduce API
Posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-1219:
---------------------------------
Description:
We should upgrade to the new Hadoop API for Nutch trunk, as has already been done for the Nutchgora branch. If I'm not mistaken, we can already upgrade to the latest 0.20.5 version, which still carries the legacy API. That way we can port the jobs to the new API without immediately upgrading to 0.21 or higher and without needing a separate branch to work on.
To the committers who created/ported jobs in NutchGora, please write down your advice and experience.
was: We should upgrade to the new Hadoop API for Nutch trunk, as has already been done for the Nutchgora branch.
[jira] [Commented] (NUTCH-1219) Upgrade all jobs to new MapReduce API
Posted by "Markus Jelsma (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13169308#comment-13169308 ]
Markus Jelsma commented on NUTCH-1219:
--------------------------------------
Keep in mind that this does not work:
{code}
Configuration conf = getConf();
Job job = new Job(conf, jobName); // the Job takes its own copy of conf here
job.setJarByClass(DomainStatistics.class);
// Too late: these calls change the original conf, not the Job's copy
conf.setInt("domain.statistics.mode", mode);
conf.setBoolean("mapreduce.fileoutputcommitter.marksuccessfuljobs", false);
{code}
but this does:
{code}
Configuration conf = getConf();
// Set all properties first, so the copy made by new Job(...) includes them
conf.setInt("domain.statistics.mode", mode);
conf.setBoolean("mapreduce.fileoutputcommitter.marksuccessfuljobs", false);
Job job = new Job(conf, jobName);
job.setJarByClass(DomainStatistics.class);
{code}
This is easily overlooked when the settings are left at their defaults!
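The ordering matters because, as far as I know, the new-API Job takes a copy of the Configuration passed to its constructor, so later changes to the original object never reach the job. The sketch below models only that copy-on-construct behaviour in plain Java. MiniConf and MiniJob are hypothetical stand-ins invented for illustration, not Hadoop classes, and the three helper methods are assumptions as well.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-ins, NOT Hadoop classes: they only model copy-on-construct.
class ConfigSnapshotDemo {

    /** Hypothetical miniature of a key/value configuration. */
    static class MiniConf {
        private final Map<String, String> props = new HashMap<>();

        MiniConf() {}

        /** Copy constructor: the new object gets its own independent map. */
        MiniConf(MiniConf other) {
            props.putAll(other.props);
        }

        void setInt(String key, int value) {
            props.put(key, Integer.toString(value));
        }

        int getInt(String key, int defaultValue) {
            String v = props.get(key);
            return v == null ? defaultValue : Integer.parseInt(v);
        }
    }

    /** Hypothetical miniature of a job that snapshots its configuration. */
    static class MiniJob {
        private final MiniConf conf;

        MiniJob(MiniConf conf) {
            this.conf = new MiniConf(conf); // snapshot is taken here
        }

        MiniConf getConfiguration() {
            return conf;
        }
    }

    /** Sets the property after the job exists; returns what the job sees. */
    static int setAfterConstruction(int mode) {
        MiniConf conf = new MiniConf();
        MiniJob job = new MiniJob(conf);
        conf.setInt("domain.statistics.mode", mode); // lost: job holds a copy
        return job.getConfiguration().getInt("domain.statistics.mode", 0);
    }

    /** Sets the property before the job exists; returns what the job sees. */
    static int setBeforeConstruction(int mode) {
        MiniConf conf = new MiniConf();
        conf.setInt("domain.statistics.mode", mode); // included in the snapshot
        MiniJob job = new MiniJob(conf);
        return job.getConfiguration().getInt("domain.statistics.mode", 0);
    }

    /** Sets the property on the job's own copy after construction. */
    static int setOnJobConfiguration(int mode) {
        MiniJob job = new MiniJob(new MiniConf());
        job.getConfiguration().setInt("domain.statistics.mode", mode);
        return job.getConfiguration().getInt("domain.statistics.mode", 0);
    }

    public static void main(String[] args) {
        System.out.println("set after construction:   " + setAfterConstruction(2));
        System.out.println("set before construction:  " + setBeforeConstruction(2));
        System.out.println("set on job configuration: " + setOnJobConfiguration(2));
    }
}
```

In this model only the first variant falls back to the default value. In Hadoop itself, the job's own copy should be reachable through job.getConfiguration(), so setting properties there after constructing the Job is another option besides reordering the calls.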
[jira] [Commented] (NUTCH-1219) Upgrade all jobs to new MapReduce API
Posted by "Markus Jelsma (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13169271#comment-13169271 ]
Markus Jelsma commented on NUTCH-1219:
--------------------------------------
There's an issue here. Right now NutchJobs are created, but they extend JobConf, which doesn't exist in the new API. For now I ignore NutchJob; it carries no implementation of its own.
[jira] [Updated] (NUTCH-1219) Upgrade all jobs to new MapReduce API
Posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-1219:
---------------------------------
Description:
We should upgrade to the new Hadoop API for Nutch trunk, as has already been done for the Nutchgora branch. If I'm not mistaken, we can already upgrade to the latest 0.20.5 version, which still carries the legacy API. That way we can port the jobs to the new API without immediately upgrading to 0.21 or higher and without needing a separate branch to work on.
To the committers who created/ported jobs in NutchGora, please write down your advice and experience.
http://www.slideshare.net/sh1mmer/upgrading-to-the-new-map-reduce-api
was:
We should upgrade to the new Hadoop API for Nutch trunk, as has already been done for the Nutchgora branch. If I'm not mistaken, we can already upgrade to the latest 0.20.5 version, which still carries the legacy API. That way we can port the jobs to the new API without immediately upgrading to 0.21 or higher and without needing a separate branch to work on.
To the committers who created/ported jobs in NutchGora, please write down your advice and experience.
[jira] [Updated] (NUTCH-1219) Upgrade all jobs to new MapReduce API
Posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-1219:
---------------------------------
Fix Version/s: (was: 1.5)
1.6
20120304-push-1.6