Posted to dev@nutch.apache.org by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/09/15 00:02:45 UTC

[jira] [Commented] (NUTCH-2097) Proposal for Nutch 3.x

    [ https://issues.apache.org/jira/browse/NUTCH-2097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14744364#comment-14744364 ] 

Lewis John McGibbney commented on NUTCH-2097:
---------------------------------------------

Hi Folks,
After being hooked up via [~chrismattmann], I've just spoken with [~ndouba] on Skype. This is really exciting work so I asked him to please log a Jira issue as a parent issue (which he has done) and we can begin thinking about a Nutch 3.X branch.
The core work undertaken by Nadeem so far can be summarized as follows:
 * Complete Ant + Ivy build system overhaul (not backward compatible).
 * Upgrade of all mapred --> mapreduce APIs in Nutch (not backward compatible)
 * Complete refactoring of all IO (custom NutchWritables) into separate [IO package|https://github.com/allfro/nutch/tree/mr2-mvn/nutch-runtime/src/main/java/org/apache/nutch/io]
 * Complete refactoring of all Mapper functions into separate [mapper package|https://github.com/allfro/nutch/tree/mr2-mvn/nutch-runtime/src/main/java/org/apache/nutch/mapper]
 * Complete refactoring of all Reducer functions into separate [reducer package|https://github.com/allfro/nutch/tree/mr2-mvn/nutch-runtime/src/main/java/org/apache/nutch/reducer]
 * Introduction of [lib package|https://github.com/allfro/nutch/tree/mr2-mvn/nutch-runtime/src/main/java/org/apache/nutch/lib] which contains all input and output formats.
 * Upgrade of Hadoop dependencies from 2.4.0 --> 2.7.1

The above package naming conventions of course are intended to provide synergy with Apache Hadoop.

My thoughts are as follows: The work in Nadeem's mr2-mvn branch is too wide-ranging and covers too much of the Nutch 1.11-SNAPSHOT code base (as of commit r1697466, NUTCH-2049 Upgrade Trunk to Hadoop > 2.4 stable) for us to backport it into Nutch trunk (1.11-SNAPSHOT). Nadeem and I therefore discussed and propose that we forward-port all commits made after r1697466 to Nadeem's branch and put this codebase forward as Nutch 3.X, which will lessen the burden on everyone. The burden here is having to define a patch for each tool, each issue, and each change. That would be hellish. Forward-porting as described above is the better solution.

This issue should act as a parent for defining Nutch 3.X based on Nutch 1.11-SNAPSHOT.

> Proposal for Nutch 3.x
> ----------------------
>
>                 Key: NUTCH-2097
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2097
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 1.12
>            Reporter: Nadeem Douba
>            Assignee: Lewis John McGibbney
>
> This is a parent issue which contains a proposal for Nutch 3.x. It's based on my branch (mr2-mvn at https://github.com/allfro/nutch).  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)