
Posted to dev@nutch.apache.org by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/07/01 02:44:45 UTC

[jira] Commented: (NUTCH-634) Patch - Nutch - Hadoop 0.17.0

    [ https://issues.apache.org/jira/browse/NUTCH-634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12609415#action_12609415 ] 

Andrzej Bialecki  commented on NUTCH-634:
-----------------------------------------

I ran a test crawl using the Hadoop 0.17.1 release, after applying this patch without its OutputFormat portions and setting the property as above. The crawl completed with no problems.

If there are no further objections, I'd like to commit this patch with these changes within a day or two.

> Patch - Nutch - Hadoop 0.17.0
> -----------------------------
>
>                 Key: NUTCH-634
>                 URL: https://issues.apache.org/jira/browse/NUTCH-634
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 0.9.0
>            Reporter: Michael Gottesman
>            Assignee: Andrzej Bialecki 
>             Fix For: 0.9.0
>
>         Attachments: diff, hadoop-0.17.patch, hadoop-0.17.patch
>
>
> This is a patch so that Nutch can be used with Hadoop 0.17.0. The patch is located at http://pastie.org/212001
> The patch compiles and passes all current Nutch unit tests.
> I have tested that the crawler side of Nutch (i.e. inject, generate, fetch, parse, merge w/crawldb) definitely works, but I have not tested the Lucene indexing part. It might work, but it might not.
> *NOTE* - none of the unit tests caught the two main bugs that had to be overcome. They only surfaced during actual crawl testing. The bugs were:
> 1. Changes to the Hadoop Iterator
> 2. Addition of Serialization to MapReduce Framework
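
The issue does not explain what "Changes to the Hadoop Iterator" involved. One plausible reading, which is our assumption and not stated in the issue, is that the reduce-side values iterator started reusing a single value object on every call to next(). Under that reading, any code that keeps references to those values sees every stored element alias the last value. That kind of bug would pass small unit tests and only show up on real crawl data. The self-contained sketch below uses hypothetical names (MutableText, ReusingIterator, collectByReference, collectByCopy) rather than real Hadoop classes. It contrasts keeping references with copying each value before storing it.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class ReusedValueIteratorDemo {

    /** Hypothetical mutable value type, standing in for a Hadoop Writable. */
    static final class MutableText {
        private String value = "";
        void set(String v) { value = v; }
        String get() { return value; }
        /** Returns an independent copy, so later set() calls on the original do not affect it. */
        MutableText copy() {
            MutableText c = new MutableText();
            c.set(value);
            return c;
        }
    }

    /** Hypothetical iterator that refills ONE shared instance on every next() call (assumed reuse behavior). */
    static final class ReusingIterator implements Iterator<MutableText> {
        private final String[] data;
        private final MutableText shared = new MutableText();
        private int pos = 0;

        ReusingIterator(String... data) { this.data = data; }

        @Override
        public boolean hasNext() { return pos < data.length; }

        @Override
        public MutableText next() {
            if (!hasNext()) throw new NoSuchElementException();
            shared.set(data[pos++]); // overwrite the same object instead of allocating a new one
            return shared;
        }
    }

    /** Buggy pattern: stores references, so every element ends up pointing at the shared object. */
    static List<String> collectByReference(Iterator<MutableText> values) {
        List<MutableText> kept = new ArrayList<>();
        while (values.hasNext()) kept.add(values.next());
        List<String> out = new ArrayList<>();
        for (MutableText t : kept) out.add(t.get());
        return out;
    }

    /** Fixed pattern: copies each value before storing it. */
    static List<String> collectByCopy(Iterator<MutableText> values) {
        List<MutableText> kept = new ArrayList<>();
        while (values.hasNext()) kept.add(values.next().copy());
        List<String> out = new ArrayList<>();
        for (MutableText t : kept) out.add(t.get());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collectByReference(new ReusingIterator("a", "b", "c"))); // prints [c, c, c]
        System.out.println(collectByCopy(new ReusingIterator("a", "b", "c")));      // prints [a, b, c]
    }
}
```

In real Hadoop code, copying would typically go through the value type's own copy or serialization mechanism rather than a hand-written copy() method. The sketch only shows why code that keeps values across iterations breaks when the iterator reuses a single instance.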
