Posted to dev@nutch.apache.org by "Doug Cutting (JIRA)" <ji...@apache.org> on 2006/01/31 18:53:32 UTC

[jira] Created: (NUTCH-193) move NDFS and MapReduce to a separate project

move NDFS and MapReduce to a separate project
---------------------------------------------

         Key: NUTCH-193
         URL: http://issues.apache.org/jira/browse/NUTCH-193
     Project: Nutch
        Type: Task
  Components: ndfs  
    Versions: 0.8-dev    
    Reporter: Doug Cutting
 Assigned to: Doug Cutting 
     Fix For: 0.8-dev


The NDFS and MapReduce code should move from Nutch to a new Lucene sub-project named Hadoop.

My plan is to do this as follows:

1. Move all code in the following packages from Nutch to Hadoop:

org.apache.nutch.fs
org.apache.nutch.io
org.apache.nutch.ipc
org.apache.nutch.mapred
org.apache.nutch.ndfs

These packages will all be renamed to org.apache.hadoop, and Nutch code will be updated to reflect this.

2. Move selected classes from Nutch to Hadoop, as follows:

org.apache.nutch.util.NutchConf -> org.apache.hadoop.conf.Configuration
org.apache.nutch.util.NutchConfigurable -> org.apache.hadoop.Configurable 
org.apache.nutch.util.NutchConfigured -> org.apache.hadoop.Configured

org.apache.nutch.util.Progress -> org.apache.hadoop.util.Progress
org.apache.nutch.util.LogFormatter -> org.apache.hadoop.util.LogFormatter
org.apache.nutch.util.Daemon -> org.apache.hadoop.util.Daemon

3. Add a jar containing all of the above to Nutch's lib directory.

Does this plan sound reasonable?
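
The renames in steps 1 and 2 can be sketched as a small lookup helper. This is only an illustration under stated assumptions: `HadoopRenames` and `rename` are hypothetical names, the class targets are copied from the plan as written (the names finally committed may differ), keeping each sub-package name under org.apache.hadoop is an assumption, and the ndfs -> dfs mapping follows a later comment in this thread. The class `org.apache.nutch.io.UTF8` is used purely as a sample input.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: maps pre-split Nutch names to the Hadoop names in the plan. */
final class HadoopRenames {
  // Exact class moves listed in step 2 of the plan, copied as written.
  private static final Map<String, String> CLASSES = new LinkedHashMap<>();
  // Package prefixes from step 1; the target sub-package names are an assumption.
  private static final Map<String, String> PACKAGES = new LinkedHashMap<>();

  static {
    CLASSES.put("org.apache.nutch.util.NutchConf", "org.apache.hadoop.conf.Configuration");
    CLASSES.put("org.apache.nutch.util.NutchConfigurable", "org.apache.hadoop.Configurable");
    CLASSES.put("org.apache.nutch.util.NutchConfigured", "org.apache.hadoop.Configured");
    CLASSES.put("org.apache.nutch.util.Progress", "org.apache.hadoop.util.Progress");
    CLASSES.put("org.apache.nutch.util.LogFormatter", "org.apache.hadoop.util.LogFormatter");
    CLASSES.put("org.apache.nutch.util.Daemon", "org.apache.hadoop.util.Daemon");

    PACKAGES.put("org.apache.nutch.fs.", "org.apache.hadoop.fs.");
    PACKAGES.put("org.apache.nutch.io.", "org.apache.hadoop.io.");
    PACKAGES.put("org.apache.nutch.ipc.", "org.apache.hadoop.ipc.");
    PACKAGES.put("org.apache.nutch.mapred.", "org.apache.hadoop.mapred.");
    PACKAGES.put("org.apache.nutch.ndfs.", "org.apache.hadoop.dfs.");
  }

  /** Returns the post-split name, or the input unchanged if it is not affected. */
  static String rename(String className) {
    String exact = CLASSES.get(className);
    if (exact != null) {
      return exact;
    }
    for (Map.Entry<String, String> e : PACKAGES.entrySet()) {
      if (className.startsWith(e.getKey())) {
        return e.getValue() + className.substring(e.getKey().length());
      }
    }
    return className; // e.g. classes that stay in Nutch
  }

  public static void main(String[] args) {
    System.out.println(rename("org.apache.nutch.util.NutchConf"));
    System.out.println(rename("org.apache.nutch.io.UTF8"));
    System.out.println(rename("org.apache.nutch.parse.ParseText"));
  }
}
```

Exact class moves are checked before package prefixes, so a class from org.apache.nutch.util is only renamed when the plan lists it explicitly.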


-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Commented: (NUTCH-193) move NDFS and MapReduce to a separate project

Posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/NUTCH-193?page=comments#action_12364672 ] 

Andrzej Bialecki  commented on NUTCH-193:
-----------------------------------------

Ok, the sooner the better from my POV. I didn't have anything in mind that would be included in Hadoop, rather Nutch patches that I'm working on. Affected patches include some of the recent larger ones: the adaptive fetch schedule thing and crawl metadata. No big deal, but we need to know what to shoot for.


[jira] Commented: (NUTCH-193) move NDFS and MapReduce to a separate project

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/NUTCH-193?page=comments#action_12364657 ] 

Doug Cutting commented on NUTCH-193:
------------------------------------

NDFS, the Nutch Distributed Filesystem, will be renamed HDFS, the Hadoop Distributed Filesystem.  Its code will live in the package org.apache.nutch.dfs, and its fs implementation class will be named DistributedFileSystem.


[jira] Commented: (NUTCH-193) move NDFS and MapReduce to a separate project

Posted by "Mike Cafarella (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/NUTCH-193?page=comments#action_12365458 ] 

Mike Cafarella commented on NUTCH-193:
--------------------------------------


  It should be noted that the name "Nutch" also comes from one of Doug's children.
They seem to have a proud future in advertising and product naming.



Re: [jira] Resolved: (NUTCH-193) move NDFS and MapReduce to a separate project

Posted by Doug Cutting <cu...@apache.org>.
Sami Siren wrote:
> ParseText vs. org.apache.nutch.parse.ParseText
> ParseData vs. org.apache.nutch.parse.ParseData
> Content vs. org.apache.nutch.protocol.Content

I'll fix this Monday morning.

Doug

Re: [jira] Resolved: (NUTCH-193) move NDFS and MapReduce to a separate project

Posted by Sami Siren <ss...@gmail.com>.
Doug Cutting wrote:

> Doug Cutting (JIRA) wrote:
>
>>      [ http://issues.apache.org/jira/browse/NUTCH-193?page=all ]
>>      Doug Cutting resolved NUTCH-193:
>> --------------------------------
>>
>>     Resolution: Fixed
>>
>> I just committed this.  Phew!
>
>
> The major incompatibility I introduced with this was changing the 
> top-level element in config files from <nutch-conf> to <configuration>.
>
> Also, I have not yet tested whether files created before this patch 
> will be correctly handled.  The issue is that files contain class 
> names and class nicknames and both some of these names and the 
> nickname mechanism have changed.  This needs to be tested & fixed if 
> it is broken.
>
There's at least a problem with these:

ParseText vs. org.apache.nutch.parse.ParseText
ParseData vs. org.apache.nutch.parse.ParseData
Content vs. org.apache.nutch.protocol.Content

--
 Sami Siren

Re: [jira] Resolved: (NUTCH-193) move NDFS and MapReduce to a separate project

Posted by Doug Cutting <cu...@apache.org>.
Doug Cutting (JIRA) wrote:
>      [ http://issues.apache.org/jira/browse/NUTCH-193?page=all ]
>      
> Doug Cutting resolved NUTCH-193:
> --------------------------------
> 
>     Resolution: Fixed
> 
> I just committed this.  Phew!

The major incompatibility I introduced with this was changing the 
top-level element in config files from <nutch-conf> to <configuration>.

Also, I have not yet tested whether files created before this patch will 
be correctly handled.  The issue is that files contain class names and 
class nicknames and both some of these names and the nickname mechanism 
have changed.  This needs to be tested & fixed if it is broken.

Doug
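
For old config files, the root-element change described above could be applied mechanically. A minimal sketch, assuming the rest of the file structure is unchanged: `ConfRootMigrator` and `migrate` are hypothetical names, not part of Nutch or Hadoop, and the sample property in `main` is made up. It uses only the JDK's built-in XML APIs and does not address the class-name and nickname question for data files.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

/** Hypothetical helper: renames a <nutch-conf> root element to <configuration>. */
final class ConfRootMigrator {

  /** Returns the XML with its root renamed; other roots are left untouched. */
  static String migrate(String xml) {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      Element root = doc.getDocumentElement();
      if ("nutch-conf".equals(root.getTagName())) {
        // DOM Level 3 rename; child <property> elements are preserved.
        doc.renameNode(root, null, "configuration");
      }
      Transformer out = TransformerFactory.newInstance().newTransformer();
      out.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
      StringWriter writer = new StringWriter();
      out.transform(new DOMSource(doc), new StreamResult(writer));
      return writer.toString();
    } catch (Exception e) {
      // Wrap checked XML exceptions so callers need no throws clause.
      throw new IllegalStateException("could not migrate config", e);
    }
  }

  public static void main(String[] args) {
    String old = "<nutch-conf><property><name>demo.key</name>"
        + "<value>demo</value></property></nutch-conf>";
    System.out.println(migrate(old));
  }
}
```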

[jira] Resolved: (NUTCH-193) move NDFS and MapReduce to a separate project

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/NUTCH-193?page=all ]
     
Doug Cutting resolved NUTCH-193:
--------------------------------

    Resolution: Fixed

I just committed this.  Phew!


[jira] Commented: (NUTCH-193) move NDFS and MapReduce to a separate project

Posted by "John Xing (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/NUTCH-193?page=comments#action_12365051 ] 

John Xing commented on NUTCH-193:
---------------------------------

What's in the name Hadoop? Because "had oops"?


[jira] Commented: (NUTCH-193) move NDFS and MapReduce to a separate project

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/NUTCH-193?page=comments#action_12364690 ] 

Doug Cutting commented on NUTCH-193:
------------------------------------

Otis: yes, thanks, I meant org.apache.hadoop.dfs.

Andrzej: I'm awaiting Mike's commit of NUTCH-183, which should happen today.  I'll then try to make the split tomorrow.


[jira] Commented: (NUTCH-193) move NDFS and MapReduce to a separate project

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/NUTCH-193?page=comments#action_12364669 ] 

Otis Gospodnetic commented on NUTCH-193:
----------------------------------------

I assume Doug meant org.apache.hadoop.dfs, not org.apache.nutch.dfs.


[jira] Commented: (NUTCH-193) move NDFS and MapReduce to a separate project

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/NUTCH-193?page=comments#action_12365087 ] 

Doug Cutting commented on NUTCH-193:
------------------------------------

The name my kid gave a stuffed yellow elephant.  Short, relatively easy to spell and pronounce, meaningless, and not used elsewhere: those are my naming criteria.  Kids are good at generating such.  Googol is a kid's term.


[jira] Commented: (NUTCH-193) move NDFS and MapReduce to a separate project

Posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/NUTCH-193?page=comments#action_12364662 ] 

Andrzej Bialecki  commented on NUTCH-193:
-----------------------------------------

What timeframe did you have in mind? There are a few patches in the queue, which will be affected by this split.

Other than that - emphatic yes!


[jira] Commented: (NUTCH-193) move NDFS and MapReduce to a separate project

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/NUTCH-193?page=comments#action_12364665 ] 

Doug Cutting commented on NUTCH-193:
------------------------------------

Andrzej: I'd like to do this soon, this week or next.  No matter how long I wait, there will probably always be a few patches queued that will need to be updated.  But hopefully we can avoid large patches like NUTCH-169.  What other patches are you concerned about in particular?

Sami: yes, the fuse stuff would then make a great hadoop contrib package.







[jira] Commented: (NUTCH-193) move NDFS and MapReduce to a separate project

Posted by "Sami Siren (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/NUTCH-193?page=comments#action_12364663 ] 

Sami Siren commented on NUTCH-193:
----------------------------------

+1

I guess the fuse-j - ndfs work from John/me could be part of hadoop /contrib after this change?


[jira] Commented: (NUTCH-193) move NDFS and MapReduce to a separate project

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/NUTCH-193?page=comments#action_12365130 ] 

Doug Cutting commented on NUTCH-193:
------------------------------------

Okay, I've moved the code from Nutch to Hadoop.  Now I need to repair Nutch so that it still works!

One remaining problem is the need to separate nutch config files from hadoop config files.  There's now a hadoop-default.xml and hadoop-site.xml, which are separate from the similarly-named nutch files.  For now, I'll fix this by adding the following methods to Hadoop's Configuration class:

void addDefaultResource(String name);
void addFinalResource(String name);

Then add a Nutch utility class like:

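public class NutchConfiguration {
  // Create a Configuration that also loads the Nutch resources.
  public static Configuration create() {
    Configuration conf = new Configuration();
    addNutchResources(conf);
    return conf;
  }
  // Add the Nutch config files to an existing Configuration.
  public static Configuration addNutchResources(Configuration conf) {
    conf.addDefaultResource("nutch-default.xml");
    conf.addFinalResource("nutch-site.xml");
    return conf;
  }
}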

Then all of the places which currently call 'new NutchConf()' can be replaced with 'NutchConfiguration.create()'.
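
The layering idea behind addDefaultResource and addFinalResource can be illustrated with a toy stand-in. This is only a sketch under assumptions: it presumes that resources are consulted in the order they were added, that a later resource overrides an earlier one, and that final resources are applied after all default ones. `LayeredConf` is a hypothetical class that uses in-memory maps instead of XML files; it is not Hadoop's Configuration, and the property keys are made up.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Toy stand-in for a layered configuration; NOT Hadoop's Configuration class. */
final class LayeredConf {
  private final List<Map<String, String>> defaults = new ArrayList<>();
  private final List<Map<String, String>> finals = new ArrayList<>();

  // In this sketch a "resource" is an in-memory map rather than an XML file.
  void addDefaultResource(Map<String, String> resource) { defaults.add(resource); }
  void addFinalResource(Map<String, String> resource) { finals.add(resource); }

  // Assumed lookup order: all defaults in insertion order, then all finals;
  // the last resource that defines the key wins.
  String get(String key) {
    String value = null;
    for (Map<String, String> r : defaults) {
      if (r.containsKey(key)) value = r.get(key);
    }
    for (Map<String, String> r : finals) {
      if (r.containsKey(key)) value = r.get(key);
    }
    return value;
  }

  // Mirrors the shape of the proposed NutchConfiguration.create(), with made-up keys.
  static LayeredConf createNutchLike() {
    LayeredConf conf = new LayeredConf();
    // Stand-ins for hadoop-default.xml and hadoop-site.xml.
    conf.addDefaultResource(Collections.singletonMap("demo.threads", "hadoop-default"));
    conf.addFinalResource(Collections.singletonMap("demo.site.only", "hadoop-site"));
    // Stand-ins for nutch-default.xml and nutch-site.xml, added afterwards.
    conf.addDefaultResource(Collections.singletonMap("demo.name", "nutch-default"));
    conf.addFinalResource(Collections.singletonMap("demo.threads", "nutch-site"));
    return conf;
  }

  public static void main(String[] args) {
    LayeredConf conf = createNutchLike();
    System.out.println(conf.get("demo.threads"));
    System.out.println(conf.get("demo.name"));
    System.out.println(conf.get("demo.site.only"));
  }
}
```

Keeping defaults and finals in separate lists means a site file always takes precedence over any default file, regardless of the order in which the two kinds were registered.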

Longer-term we might consider a more radical re-design of the configuration API.  But first we need to get Hadoop and Nutch split.




