
Posted to dev@nutch.apache.org by "Markus Jelsma (Created) (JIRA)" <ji...@apache.org> on 2011/12/14 15:35:30 UTC

[jira] [Created] (NUTCH-1225) Migrate CrawlDBScanner to MapReduce API

Migrate CrawlDBScanner to MapReduce API
---------------------------------------

                 Key: NUTCH-1225
                 URL: https://issues.apache.org/jira/browse/NUTCH-1225
             Project: Nutch
          Issue Type: Sub-task
            Reporter: Markus Jelsma




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

Re: [jira] [Created] (NUTCH-1225) Migrate CrawlDBScanner to MapReduce API

Posted by Markus Jelsma <ma...@openindex.io>.

I've already ported all our custom jobs (they use SequenceFiles) and I ported 
the DomainStatistics tool (NUTCH-1221), but jobs that use MapFileOutputFormat 
cannot be ported on 0.20.x.

It is indeed different in a consistent way, but it is tedious (as you said 
earlier). I want to work on porting, but also work on other things that still 
use the old API and use them in production. This is why I'd love to use 0.21, 
because it allows easy migration.

On Thursday 15 December 2011 13:18:35 Andrzej Bialecki wrote:
> On 15/12/2011 13:13, Markus Jelsma wrote:
> > Hmm, I don't see how I can use the old mapred MapFileOutputFormat with
> > the new Job API. job.setOutputFormatClass(MapFileOutputFormat.class)
> > expects the mapreduce.lib.output.MapFileOutputFormat class and won't
> > accept the old API.
> > 
> > setOutputFormatClass(java.lang.Class<? extends
> > org.apache.hadoop.mapreduce.OutputFormat>) in
> > org.apache.hadoop.mapreduce.Job cannot be applied to
> > (java.lang.Class<org.apache.hadoop.mapred.MapFileOutputFormat>)
> > 
> > In short, i don't know how i can migrate jobs to the new API on 0.20.x
> > without having MapFileOutputFormat present in the new API. Trying to set
> > to old mapoutputformat
> 
> Ah, no, that's not what I meant ... of course you need to change the
> code to use the new api, and the new code will look quite different :)
> my point was only that it is different in a consistent way, so after
> you've ported one or two classes the other ones are easy to convert, too...
> 
> I'm bogged with other work now, but I'll see if I can prepare an example
> later today...

-- 
Markus Jelsma - CTO - Openindex

Re: [jira] [Created] (NUTCH-1225) Migrate CrawlDBScanner to MapReduce API

Posted by Markus Jelsma <ma...@openindex.io>.

I've looked into it again. This is not going to work well if we stay on 
0.20.x. Holding on to 0.20.x means doing the migration partially now and again 
just before upgrading to 0.22+. This is a _lot_ of extra work!

I strongly prefer an intermediate upgrade to 0.21, where both APIs are 
present.

Does anyone know how I can modify Ivy to use Apache's Maven repo for the 
Hadoop dependencies? It keeps trying to load them from Maven Central, where 
the 0.21 POM is not present.

On Thursday 15 December 2011 13:13:45 Markus Jelsma wrote:
> Hmm, I don't see how I can use the old mapred MapFileOutputFormat with the
> new Job API. job.setOutputFormatClass(MapFileOutputFormat.class) expects
> the mapreduce.lib.output.MapFileOutputFormat class and won't accept the
> old API.
> 
> setOutputFormatClass(java.lang.Class<? extends
> org.apache.hadoop.mapreduce.OutputFormat>) in
> org.apache.hadoop.mapreduce.Job cannot be applied to
> (java.lang.Class<org.apache.hadoop.mapred.MapFileOutputFormat>)
> 
> In short, i don't know how i can migrate jobs to the new API on 0.20.x
> without having MapFileOutputFormat present in the new API. Trying to set
> to old mapoutputformat
> 
> On Thursday 15 December 2011 08:55:38 Andrzej Bialecki wrote:
> > On 14/12/2011 19:14, Markus Jelsma wrote:
> > > Yes, the goal is to upgrade to 0.22 or higher. The problem is that 0.22
> > > doesn't have the old mapred API so we can only upgrade to 0.22 if all
> > > jobs are ported.
> > > 
> > > I thought the entire mapred package was deprecated but it seems that
> > > class is not deprecated. It feels a bit strange though, this still
> > > means that if we port all jobs to the new API, we still have to move
> > > all imports for this class from mapred to mapreduce before we can
> > > compile with 0.22.
> > > 
> > > Ah well, it's better than nothing.
> > 
> > IMHO upgrading to 0.21 as an interim solution is not helpful, it only
> > creates more work - as you noticed yourself 0.21 is a strange animal.
> > 
> > As I mentioned before, the API changes between 0.20 and 0.22 are such
> > that in most cases rote replacement is enough.
> > 
> > Also, we can always create a branch to do this upgrade, and then merge
> > it with trunk when it's ready.

-- 
Markus Jelsma - CTO - Openindex

Re: [jira] [Created] (NUTCH-1225) Migrate CrawlDBScanner to MapReduce API

Posted by Andrzej Bialecki <ab...@getopt.org>.

On 15/12/2011 13:13, Markus Jelsma wrote:
> Hmm, I don't see how I can use the old mapred MapFileOutputFormat with the new
> Job API. job.setOutputFormatClass(MapFileOutputFormat.class) expects the
> mapreduce.lib.output.MapFileOutputFormat class and won't accept the old API.
>
> setOutputFormatClass(java.lang.Class<? extends
> org.apache.hadoop.mapreduce.OutputFormat>) in org.apache.hadoop.mapreduce.Job
> cannot be applied to
> (java.lang.Class<org.apache.hadoop.mapred.MapFileOutputFormat>)
>
> In short, i don't know how i can migrate jobs to the new API on 0.20.x without
> having MapFileOutputFormat present in the new API. Trying to set to old
> mapoutputformat

Ah, no, that's not what I meant ... of course you need to change the 
code to use the new api, and the new code will look quite different :) 
my point was only that it is different in a consistent way, so after 
you've ported one or two classes the other ones are easy to convert, too...

I'm bogged with other work now, but I'll see if I can prepare an example 
later today...

-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

Re: [jira] [Created] (NUTCH-1225) Migrate CrawlDBScanner to MapReduce API

Posted by Markus Jelsma <ma...@openindex.io>.

Hmm, I don't see how I can use the old mapred MapFileOutputFormat with the new 
Job API. job.setOutputFormatClass(MapFileOutputFormat.class) expects the 
mapreduce.lib.output.MapFileOutputFormat class and won't accept the old API.

setOutputFormatClass(java.lang.Class<? extends 
org.apache.hadoop.mapreduce.OutputFormat>) in org.apache.hadoop.mapreduce.Job 
cannot be applied to 
(java.lang.Class<org.apache.hadoop.mapred.MapFileOutputFormat>)

In short, i don't know how i can migrate jobs to the new API on 0.20.x without 
having MapFileOutputFormat present in the new API. Trying to set to old 
mapoutputformat 
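
The compile error above comes from the bounded type parameter on setOutputFormatClass. Here is a minimal sketch of that mechanism that compiles without Hadoop on the classpath. Every type in it (NewApiOutputFormat, OldApiOutputFormat, SketchJob and so on) is a hypothetical stand-in, not a Hadoop class; the only thing assumed about Hadoop is what the error message itself shows, namely that the old and new OutputFormat types are unrelated.

```java
// Sketch only: stand-in types that mimic two unrelated OutputFormat hierarchies.
// None of these names exist in Hadoop; they are assumptions for illustration.
final class OutputFormatCompatSketch {

  // Plays the role of org.apache.hadoop.mapreduce.OutputFormat (new API).
  static abstract class NewApiOutputFormat {}

  // Plays the role of org.apache.hadoop.mapred.OutputFormat (old API).
  interface OldApiOutputFormat {}

  // Old-API format: not a subtype of NewApiOutputFormat.
  static class OldMapFileOutputFormat implements OldApiOutputFormat {}

  // New-API format: a subtype of NewApiOutputFormat.
  static class NewSequenceFileOutputFormat extends NewApiOutputFormat {}

  // Plays the role of the new-API Job with its bounded setter.
  static class SketchJob {
    Class<? extends NewApiOutputFormat> outputFormat;

    void setOutputFormatClass(Class<? extends NewApiOutputFormat> cls) {
      this.outputFormat = cls;
    }
  }

  // Runtime equivalent of the compile-time bound check.
  static boolean acceptedByNewApi(Class<?> cls) {
    return NewApiOutputFormat.class.isAssignableFrom(cls);
  }

  public static void main(String[] args) {
    SketchJob job = new SketchJob();
    job.setOutputFormatClass(NewSequenceFileOutputFormat.class); // compiles
    // job.setOutputFormatClass(OldMapFileOutputFormat.class);   // rejected by javac
    System.out.println("old format accepted: "
        + acceptedByNewApi(OldMapFileOutputFormat.class));
  }
}
```

No cast or wrapper on the caller's side gets around the bound; the new-API job needs an output format written against the new base type.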

On Thursday 15 December 2011 08:55:38 Andrzej Bialecki wrote:
> On 14/12/2011 19:14, Markus Jelsma wrote:
> > Yes, the goal is to upgrade to 0.22 or higher. The problem is that 0.22
> > doesn't have the old mapred API so we can only upgrade to 0.22 if all
> > jobs are ported.
> > 
> > I thought the entire mapred package was deprecated but it seems that
> > class is not deprecated. It feels a bit strange though, this still means
> > that if we port all jobs to the new API, we still have to move all
> > imports for this class from mapred to mapreduce before we can compile
> > with 0.22.
> > 
> > Ah well, it's better than nothing.
> 
> IMHO upgrading to 0.21 as an interim solution is not helpful, it only
> creates more work - as you noticed yourself 0.21 is a strange animal.
> 
> As I mentioned before, the API changes between 0.20 and 0.22 are such
> that in most cases rote replacement is enough.
> 
> Also, we can always create a branch to do this upgrade, and then merge
> it with trunk when it's ready.

-- 
Markus Jelsma - CTO - Openindex

Re: [jira] [Created] (NUTCH-1225) Migrate CrawlDBScanner to MapReduce API

Posted by Andrzej Bialecki <ab...@getopt.org>.

On 14/12/2011 19:14, Markus Jelsma wrote:
> Yes, the goal is to upgrade to 0.22 or higher. The problem is that 0.22
> doesn't have the old mapred API so we can only upgrade to 0.22 if all
> jobs are ported.
>
> I thought the entire mapred package was deprecated but it seems that
> class is not deprecated. It feels a bit strange though, this still means
> that if we port all jobs to the new API, we still have to move all
> imports for this class from mapred to mapreduce before we can compile
> with 0.22.
>
> Ah well, it's better than nothing.

IMHO upgrading to 0.21 as an interim solution is not helpful, it only 
creates more work - as you noticed yourself 0.21 is a strange animal.

As I mentioned before, the API changes between 0.20 and 0.22 are such 
that in most cases rote replacement is enough.

Also, we can always create a branch to do this upgrade, and then merge 
it with trunk when it's ready.


Re: [jira] [Created] (NUTCH-1225) Migrate CrawlDBScanner to MapReduce API

Posted by Markus Jelsma <ma...@openindex.io>.

Yes, the goal is to upgrade to 0.22 or higher. The problem is that 0.22 doesn't 
have the old mapred API, so we can only upgrade to 0.22 if all jobs are ported.

I thought the entire mapred package was deprecated, but it seems that class is 
not deprecated. It feels a bit strange though: this still means that if we 
port all jobs to the new API, we still have to move all imports for this class 
from mapred to mapreduce before we can compile with 0.22.

Ah well, it's better than nothing.

thanks

> On 14/12/2011 18:30, Markus Jelsma wrote:
> > proper link:
> > 
> > http://hadoop.apache.org/common/docs/r0.20.205.0/api/org/apache/hadoop/mapreduce/lib/output/package-summary.html
> 
> I thought the goal was to upgrade to 0.22, where this class is present.
> In 0.20.205 org.apache.hadoop.mapred.MapFileOutputFormat still uses the
> old api, and it's not deprecated yet.

Re: [jira] [Created] (NUTCH-1225) Migrate CrawlDBScanner to MapReduce API

Posted by Andrzej Bialecki <ab...@getopt.org>.

On 14/12/2011 18:30, Markus Jelsma wrote:
> proper link:
>
> http://hadoop.apache.org/common/docs/r0.20.205.0/api/org/apache/hadoop/mapreduce/lib/output/package-summary.html

I thought the goal was to upgrade to 0.22, where this class is present. 
In 0.20.205 org.apache.hadoop.mapred.MapFileOutputFormat still uses the 
old api, and it's not deprecated yet.



Re: [jira] [Created] (NUTCH-1225) Migrate CrawlDBScanner to MapReduce API

Posted by Markus Jelsma <ma...@openindex.io>.

proper link:

http://hadoop.apache.org/common/docs/r0.20.205.0/api/org/apache/hadoop/mapreduce/lib/output/package-summary.html

> Hi,
> 
> I get class not found exceptions. When browsing the Java API docs of various
> versions I see it missing in mapreduce.lib.output until 0.21.
> 
> Missing in 0.20.X
> http://hadoop.apache.org/common/docs/r0.20.205.0/api/index.html
> 
> Back again in 0.21+
> http://hadoop.apache.org/mapreduce/docs/current/api/org/apache/hadoop/mapreduce/lib/output/package-summary.html
> 
> > On 14/12/2011 16:01, Markus Jelsma wrote:
> > > This is highly annoying, MapFileOutputFormat is not present in the
> > > MapReduce API until 0.21!
> > 
> > AFAIK that's not the case ... there is both an old api and a new api
> > implementation (the old one is deprecated). The new api is in
> > org.apache.hadoop.mapreduce.lib.output .

Re: [jira] [Created] (NUTCH-1225) Migrate CrawlDBScanner to MapReduce API

Posted by Markus Jelsma <ma...@openindex.io>.

Hi,

I get class not found exceptions. When browsing the Java API docs of various 
versions I see it missing in mapreduce.lib.output until 0.21.

Missing in 0.20.X
http://hadoop.apache.org/common/docs/r0.20.205.0/api/index.html

Back again in 0.21+
http://hadoop.apache.org/mapreduce/docs/current/api/org/apache/hadoop/mapreduce/lib/output/package-summary.html
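
One way to confirm which Hadoop version is actually on the classpath, rather than relying on the API docs, is to probe for the class by name. This is a minimal sketch; ClasspathProbe and isPresent are hypothetical helper names, not part of Hadoop or Nutch, and the only Hadoop-specific item is the fully qualified class name discussed in this thread.

```java
// Sketch only: checks whether a class can be located on the current classpath.
// ClasspathProbe and isPresent are illustrative names, not Hadoop or Nutch APIs.
final class ClasspathProbe {

  // The new-API class this thread is about.
  static final String NEW_API_MAPFILE =
      "org.apache.hadoop.mapreduce.lib.output.MapFileOutputFormat";

  /** Returns true if the named class can be loaded, without initializing it. */
  static boolean isPresent(String className) {
    try {
      Class.forName(className, false, ClasspathProbe.class.getClassLoader());
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      // Missing class, or a class whose own dependencies are missing.
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(NEW_API_MAPFILE + " present: " + isPresent(NEW_API_MAPFILE));
  }
}
```

Run with the same classpath as the failing job, this should report whether the class not found exception is down to the Hadoop jars in use or to something else, such as a stale jar in the lib directory.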



> On 14/12/2011 16:01, Markus Jelsma wrote:
> > This is highly annoying, MapFileOutputFormat is not present in the
> > MapReduce API until 0.21!
> 
> AFAIK that's not the case ... there is both an old api and a new api
> implementation (the old one is deprecated). The new api is in
> org.apache.hadoop.mapreduce.lib.output .

Re: [jira] [Created] (NUTCH-1225) Migrate CrawlDBScanner to MapReduce API

Posted by Andrzej Bialecki <ab...@getopt.org>.

On 14/12/2011 16:01, Markus Jelsma wrote:
> This is highly annoying, MapFileOutputFormat is not present in the MapReduce
> API until 0.21!

AFAIK that's not the case ... there is both an old api and a new api 
implementation (the old one is deprecated). The new api is in 
org.apache.hadoop.mapreduce.lib.output .


Re: [jira] [Created] (NUTCH-1225) Migrate CrawlDBScanner to MapReduce API

Posted by Markus Jelsma <ma...@openindex.io>.

This is highly annoying, MapFileOutputFormat is not present in the MapReduce 
API until 0.21!

Any hints? Use from old API? Something?

On Wednesday 14 December 2011 15:35:30 Markus Jelsma (Created) (JIRA) wrote:
> Migrate CrawlDBScanner to MapReduce API
> ---------------------------------------
> 
>                  Key: NUTCH-1225
>                  URL: https://issues.apache.org/jira/browse/NUTCH-1225
>              Project: Nutch
>           Issue Type: Sub-task
>             Reporter: Markus Jelsma
> 
> 
> 
> 

-- 
Markus Jelsma - CTO - Openindex

[jira] [Commented] (NUTCH-1225) Migrate CrawlDBScanner to MapReduce API

Posted by "Markus Jelsma (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/NUTCH-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170313#comment-13170313 ] 

Markus Jelsma commented on NUTCH-1225:
--------------------------------------

I removed the Hadoop deps from Ivy and manually added the Hadoop 0.21 jars to the lib directory. Next, two other deps must be added to Ivy:

{code}
<!-- need to compile webgraph -->
<dependency org="commons-cli" name="commons-cli" rev="20040117.000000"
            conf="*->default" />
<!-- avro -->
<dependency org="org.apache.avro" name="avro" rev="1.6.1"
            conf="*->default" />
{code}

                
> Migrate CrawlDBScanner to MapReduce API
> ---------------------------------------
>
>                 Key: NUTCH-1225
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1225
>             Project: Nutch
>          Issue Type: Sub-task
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.5
>
>         Attachments: NUTCH-1225-1.5-1.patch
>
>



[jira] [Updated] (NUTCH-1225) Migrate CrawlDBScanner to MapReduce API

Posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/NUTCH-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma updated NUTCH-1225:
---------------------------------

    Patch Info: Patch Available
      Assignee: Markus Jelsma
    
> Migrate CrawlDBScanner to MapReduce API
> ---------------------------------------
>
>                 Key: NUTCH-1225
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1225
>             Project: Nutch
>          Issue Type: Sub-task
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.5
>
>         Attachments: NUTCH-1225-1.5-1.patch
>
>



[jira] [Resolved] (NUTCH-1225) Migrate CrawlDBScanner to MapReduce API

Posted by "Markus Jelsma (Resolved) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/NUTCH-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma resolved NUTCH-1225.
----------------------------------

       Resolution: Won't Fix
    Fix Version/s:     (was: 1.5)
         Assignee:     (was: Markus Jelsma)

CrawlDBScanner tool is deprecated in favor of the CrawlDBReader tool.
                
> Migrate CrawlDBScanner to MapReduce API
> ---------------------------------------
>
>                 Key: NUTCH-1225
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1225
>             Project: Nutch
>          Issue Type: Sub-task
>            Reporter: Markus Jelsma
>         Attachments: NUTCH-1225-1.5-1.patch, NUTCH-1225-1.5-2.patch
>
>



[jira] [Commented] (NUTCH-1225) Migrate CrawlDBScanner to MapReduce API

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/NUTCH-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172911#comment-13172911 ] 

Hudson commented on NUTCH-1225:
-------------------------------

Integrated in Nutch-trunk #1698 (See [https://builds.apache.org/job/Nutch-trunk/1698/])
    NUTCH-1225 Migrate CrawlDBScanner to MapReduce API

markus : http://svn.apache.org/viewvc/nutch/trunk/viewvc/?view=rev&root=&revision=1220788
Files : 
* /nutch/trunk/CHANGES.txt
* /nutch/trunk/src/java/org/apache/nutch/tools/CrawlDBScanner.java

                
> Migrate CrawlDBScanner to MapReduce API
> ---------------------------------------
>
>                 Key: NUTCH-1225
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1225
>             Project: Nutch
>          Issue Type: Sub-task
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.5
>
>         Attachments: NUTCH-1225-1.5-1.patch, NUTCH-1225-1.5-2.patch
>
>



[jira] [Commented] (NUTCH-1225) Migrate CrawlDBScanner to MapReduce API

Posted by "Markus Jelsma (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/NUTCH-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176164#comment-13176164 ] 

Markus Jelsma commented on NUTCH-1225:
--------------------------------------

Old mapred version restored per rev. 1224905.
                
> Migrate CrawlDBScanner to MapReduce API
> ---------------------------------------
>
>                 Key: NUTCH-1225
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1225
>             Project: Nutch
>          Issue Type: Sub-task
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.5
>
>         Attachments: NUTCH-1225-1.5-1.patch, NUTCH-1225-1.5-2.patch
>
>



[jira] [Updated] (NUTCH-1225) Migrate CrawlDBScanner to MapReduce API

Posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/NUTCH-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma updated NUTCH-1225:
---------------------------------

    Attachment: NUTCH-1225-1.5-2.patch

New patch uses proper value iteration in reducer.

Old API:

{code}
    public void reduce(Text key, Iterator<CrawlDatum> values, Context context) throws IOException, InterruptedException {
      while (values.hasNext()) {
        CrawlDatum val = values.next();
        context.write(key, val);
      }
    }
{code}

New API:

{code}
    @Override
    public void reduce(Text key, Iterable<CrawlDatum> values, Context context)
        throws IOException, InterruptedException {
      // Pass every CrawlDatum for this key through unchanged.
      for (CrawlDatum val : values) {
        context.write(key, val);
      }
    }
{code}
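
Why the parameter type matters: assuming both snippets sit in a class that extends the new-API Reducer, whose reduce method takes an Iterable, the Iterator variant only overloads reduce instead of overriding it, so the framework never calls it. The sketch below shows that dispatch behaviour with stand-in classes. SketchReducer, IteratorReducer and IterableReducer are hypothetical and do not use Hadoop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Sketch only: stand-in classes, not Hadoop's Reducer. They mimic a framework
// that always calls reduce(key, Iterable, out).
final class ReducerDispatchSketch {

  static class SketchReducer<K, V> {
    final List<String> calls = new ArrayList<>();

    // Default behaviour: pass values through unchanged (identity).
    protected void reduce(K key, Iterable<V> values, List<V> out) {
      calls.add("base");
      for (V v : values) {
        out.add(v);
      }
    }

    // The "framework" entry point: always uses the Iterable signature.
    final List<V> run(K key, List<V> values) {
      List<V> out = new ArrayList<>();
      reduce(key, values, out);
      return out;
    }
  }

  // Iterator parameter: an overload, so run() never reaches this method.
  static class IteratorReducer extends SketchReducer<String, Integer> {
    protected void reduce(String key, Iterator<Integer> values, List<Integer> out) {
      calls.add("iterator");
      while (values.hasNext()) {
        out.add(values.next() * 10);
      }
    }
  }

  // Iterable parameter: a real override, verified by the compiler via @Override.
  static class IterableReducer extends SketchReducer<String, Integer> {
    @Override
    protected void reduce(String key, Iterable<Integer> values, List<Integer> out) {
      calls.add("iterable");
      for (Integer v : values) {
        out.add(v * 10);
      }
    }
  }

  public static void main(String[] args) {
    System.out.println(new IteratorReducer().run("k", Arrays.asList(1, 2)));
    System.out.println(new IterableReducer().run("k", Arrays.asList(1, 2)));
  }
}
```

Putting @Override on the Iterator variant would turn the mistake into a compile error, which is a cheap guard when porting many reducers by rote replacement.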
                
> Migrate CrawlDBScanner to MapReduce API
> ---------------------------------------
>
>                 Key: NUTCH-1225
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1225
>             Project: Nutch
>          Issue Type: Sub-task
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.5
>
>         Attachments: NUTCH-1225-1.5-1.patch, NUTCH-1225-1.5-2.patch
>
>



[jira] [Reopened] (NUTCH-1225) Migrate CrawlDBScanner to MapReduce API

Posted by "Markus Jelsma (Reopened) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/NUTCH-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma reopened NUTCH-1225:
----------------------------------


Reopening issue because we might need to downgrade back to 0.20.205. That means MapFileOutputFormat still cannot be used in the new API.

Makes me wonder why one would ever use the new API in 0.20.x versions when that crucial format is not available.
                
> Migrate CrawlDBScanner to MapReduce API
> ---------------------------------------
>
>                 Key: NUTCH-1225
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1225
>             Project: Nutch
>          Issue Type: Sub-task
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.5
>
>         Attachments: NUTCH-1225-1.5-1.patch, NUTCH-1225-1.5-2.patch
>
>



[jira] [Commented] (NUTCH-1225) Migrate CrawlDBScanner to MapReduce API

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/NUTCH-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172364#comment-13172364 ] 

Hudson commented on NUTCH-1225:
-------------------------------

Integrated in nutch-trunk-maven #67 (See [https://builds.apache.org/job/nutch-trunk-maven/67/])
    NUTCH-1225 Migrate CrawlDBScanner to MapReduce API

markus : http://svn.apache.org/viewvc/nutch/trunk/viewvc/?view=rev&root=&revision=1220788
Files : 
* /nutch/trunk/CHANGES.txt
* /nutch/trunk/src/java/org/apache/nutch/tools/CrawlDBScanner.java

                
> Migrate CrawlDBScanner to MapReduce API
> ---------------------------------------
>
>                 Key: NUTCH-1225
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1225
>             Project: Nutch
>          Issue Type: Sub-task
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.5
>
>         Attachments: NUTCH-1225-1.5-1.patch, NUTCH-1225-1.5-2.patch
>
>



[jira] [Commented] (NUTCH-1225) Migrate CrawlDBScanner to MapReduce API

Posted by "Markus Jelsma (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/NUTCH-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172209#comment-13172209 ] 

Markus Jelsma commented on NUTCH-1225:
--------------------------------------

I'll commit shortly if there are no objections
                
> Migrate CrawlDBScanner to MapReduce API
> ---------------------------------------
>
>                 Key: NUTCH-1225
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1225
>             Project: Nutch
>          Issue Type: Sub-task
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.5
>
>         Attachments: NUTCH-1225-1.5-1.patch, NUTCH-1225-1.5-2.patch
>
>



[jira] [Resolved] (NUTCH-1225) Migrate CrawlDBScanner to MapReduce API

Posted by "Markus Jelsma (Resolved) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/NUTCH-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma resolved NUTCH-1225.
----------------------------------

    Resolution: Fixed

Committed for 1.5 in rev. 1220788.
                
> Migrate CrawlDBScanner to MapReduce API
> ---------------------------------------
>
>                 Key: NUTCH-1225
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1225
>             Project: Nutch
>          Issue Type: Sub-task
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.5
>
>         Attachments: NUTCH-1225-1.5-1.patch, NUTCH-1225-1.5-2.patch
>
>



[jira] [Updated] (NUTCH-1225) Migrate CrawlDBScanner to MapReduce API

Posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/NUTCH-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma updated NUTCH-1225:
---------------------------------

    Attachment: NUTCH-1225-1.5-1.patch

Patch for 1.5. This is only compatible with Hadoop 0.21 or higher!
                
> Migrate CrawlDBScanner to MapReduce API
> ---------------------------------------
>
>                 Key: NUTCH-1225
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1225
>             Project: Nutch
>          Issue Type: Sub-task
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.5
>
>         Attachments: NUTCH-1225-1.5-1.patch
>
>

