Posted to dev@nutch.apache.org by "Markus Jelsma (Created) (JIRA)" <ji...@apache.org> on 2011/12/13 17:47:30 UTC

[jira] [Created] (NUTCH-1219) Upgrade all jobs to new MapReduce API

Upgrade all jobs to new MapReduce API
-------------------------------------

                 Key: NUTCH-1219
                 URL: https://issues.apache.org/jira/browse/NUTCH-1219
             Project: Nutch
          Issue Type: Task
            Reporter: Markus Jelsma
            Priority: Critical
             Fix For: 1.5


We should upgrade Nutch trunk to the new Hadoop API, as has already been done for the Nutchgora branch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (NUTCH-1219) Upgrade all jobs to new MapReduce API

Posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma updated NUTCH-1219:
---------------------------------

    Description: 
We should upgrade Nutch trunk to the new Hadoop API, as has already been done for the Nutchgora branch. If I'm not mistaken, we can already upgrade to the latest 0.20.5 version, which still carries the legacy API. That way we can port the jobs to the new API without immediately upgrading to 0.21 or higher, and without needing a separate branch to work on.

To the committers who created/ported jobs in NutchGora, please write down your advice and experience.

  was:We should upgrade Nutch trunk to the new Hadoop API, as has already been done for the Nutchgora branch.

    


        

[jira] [Commented] (NUTCH-1219) Upgrade all jobs to new MapReduce API

Posted by "Markus Jelsma (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13169308#comment-13169308 ] 

Markus Jelsma commented on NUTCH-1219:
--------------------------------------

Keep in mind that this does not work:

{code}
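    Configuration conf = getConf();
    // The Job constructor takes its own copy of conf at this point...
    Job job = new Job(conf, jobName);
    job.setJarByClass(DomainStatistics.class);
    // ...so these calls only change the original conf and are never seen by the job
    conf.setInt("domain.statistics.mode", mode);
    conf.setBoolean("mapreduce.fileoutputcommitter.marksuccessfuljobs", false);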
{code}

but this does:

{code}
    Configuration conf = getConf();
    // Set job properties before the Job copies the Configuration
    conf.setInt("domain.statistics.mode", mode);
    conf.setBoolean("mapreduce.fileoutputcommitter.marksuccessfuljobs", false);
    Job job = new Job(conf, jobName);
    job.setJarByClass(DomainStatistics.class);
{code}

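This is easily overlooked: the Job constructor copies the Configuration, so properties set on conf after the job has been created are silently ignored and the job simply runs with the default values.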
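A minimal, self-contained Java sketch of why the order matters. It deliberately does not depend on Hadoop: ToyConfiguration and ToyJob are hypothetical stand-ins that model a single behavior, namely that the job copies the configuration it is given when it is constructed. The third case corresponds to calling job.getConfiguration().setInt(...) on a real Hadoop Job after creating it, which changes the job's own copy.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfCopyDemo {

    /** Hypothetical stand-in for Hadoop's Configuration: a mutable key/value map. */
    static class ToyConfiguration {
        private final Map<String, String> props = new HashMap<>();

        ToyConfiguration() {}

        /** Copy constructor: the new object gets its own snapshot of the properties. */
        ToyConfiguration(ToyConfiguration other) {
            props.putAll(other.props);
        }

        void setInt(String key, int value) {
            props.put(key, Integer.toString(value));
        }

        int getInt(String key, int defaultValue) {
            String v = props.get(key);
            return v == null ? defaultValue : Integer.parseInt(v);
        }
    }

    /** Hypothetical stand-in for Hadoop's Job: it copies the configuration on construction. */
    static class ToyJob {
        private final ToyConfiguration conf;

        ToyJob(ToyConfiguration conf) {
            this.conf = new ToyConfiguration(conf); // defensive copy, like Hadoop's Job
        }

        ToyConfiguration getConfiguration() {
            return conf;
        }
    }

    static final String MODE_KEY = "domain.statistics.mode";

    public static void main(String[] args) {
        // Broken order: the property is set on the original after the job copied it.
        ToyConfiguration conf1 = new ToyConfiguration();
        ToyJob late = new ToyJob(conf1);
        conf1.setInt(MODE_KEY, 2);
        System.out.println("set after:  " + late.getConfiguration().getInt(MODE_KEY, 0)); // prints 0

        // Working order: the property is set before the job is created.
        ToyConfiguration conf2 = new ToyConfiguration();
        conf2.setInt(MODE_KEY, 2);
        ToyJob early = new ToyJob(conf2);
        System.out.println("set before: " + early.getConfiguration().getInt(MODE_KEY, 0)); // prints 2

        // Alternative: modify the job's own copy after creation.
        ToyJob viaJob = new ToyJob(new ToyConfiguration());
        viaJob.getConfiguration().setInt(MODE_KEY, 2);
        System.out.println("via job:    " + viaJob.getConfiguration().getInt(MODE_KEY, 0)); // prints 2
    }
}
```

Setting everything on the Configuration before constructing the Job, as in the working snippet above, keeps all job parameters in one place; writing through the job's own configuration is the option when the Job already exists.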
                


        

[jira] [Commented] (NUTCH-1219) Upgrade all jobs to new MapReduce API

Posted by "Markus Jelsma (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13169271#comment-13169271 ] 

Markus Jelsma commented on NUTCH-1219:
--------------------------------------

There's an issue here. Right now the jobs are created as NutchJob instances, but NutchJob extends JobConf, which doesn't exist in the new API. For now I simply ignore NutchJob; it carries no implementation of its own.
                


        

[jira] [Updated] (NUTCH-1219) Upgrade all jobs to new MapReduce API

Posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma updated NUTCH-1219:
---------------------------------

    Description: 
We should upgrade Nutch trunk to the new Hadoop API, as has already been done for the Nutchgora branch. If I'm not mistaken, we can already upgrade to the latest 0.20.5 version, which still carries the legacy API. That way we can port the jobs to the new API without immediately upgrading to 0.21 or higher, and without needing a separate branch to work on.

To the committers who created/ported jobs in NutchGora, please write down your advice and experience.

http://www.slideshare.net/sh1mmer/upgrading-to-the-new-map-reduce-api

  was:
We should upgrade Nutch trunk to the new Hadoop API, as has already been done for the Nutchgora branch. If I'm not mistaken, we can already upgrade to the latest 0.20.5 version, which still carries the legacy API. That way we can port the jobs to the new API without immediately upgrading to 0.21 or higher, and without needing a separate branch to work on.

To the committers who created/ported jobs in NutchGora, please write down your advice and experience.

    


        

[jira] [Updated] (NUTCH-1219) Upgrade all jobs to new MapReduce API

Posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma updated NUTCH-1219:
---------------------------------

    Fix Version/s:     (was: 1.5)
                   1.6

20120304-push-1.6
                
