Posted to dev@nutch.apache.org by "Dennis Kubes (JIRA)" <ji...@apache.org> on 2009/11/22 00:39:39 UTC

[jira] Created: (NUTCH-768) Upgrade Nutch 1.0 to use Hadoop 0.20

Upgrade Nutch 1.0 to use Hadoop 0.20
------------------------------------

                 Key: NUTCH-768
                 URL: https://issues.apache.org/jira/browse/NUTCH-768
             Project: Nutch
          Issue Type: Improvement
    Affects Versions: 1.1
         Environment: All
            Reporter: Dennis Kubes
            Assignee: Dennis Kubes
             Fix For: 1.1


Upgrade Nutch 1.0 to use the Hadoop 0.20 release.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (NUTCH-768) Upgrade Nutch 1.0 to use Hadoop 0.20

Posted by "Dennis Kubes (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Kubes updated NUTCH-768:
-------------------------------

    Attachment: NUTCH-768-1-20091125.patch

I thought I was going to be able to do this without code changes.  No such luck.  

There are many, many deprecations as a result of this upgrade.  Anything that used the old Mapper and Reducer interfaces seems to have deprecated methods in it.  The NutchBean class needed to implement the two RPC*Bean interfaces to handle changes in Hadoop RPC (that could have been a leftover from the 1.0 changes, but I don't think so).  There are also numerous changes to the build scripts and the Nutch bin script to support the different Hadoop jars.

There are also many new files for the conf directory as Hadoop has split out files and has new configuration files for new capabilities.

After all changes I was able to run everything in local and pseudo-distributed mode as well as test out local and distributed searching.  Everything seems to work fine.  After we make this upgrade I would recommend going back and updating all of the tool interfaces for the most recent APIs.


[jira] Commented: (NUTCH-768) Upgrade Nutch 1.0 to use Hadoop 0.20

Posted by "Dennis Kubes (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784066#action_12784066 ] 

Dennis Kubes commented on NUTCH-768:
------------------------------------

If there are no objections, I will commit this sometime tomorrow.


[jira] Closed: (NUTCH-768) Upgrade Nutch 1.0 to use Hadoop 0.20

Posted by "Dennis Kubes (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Kubes closed NUTCH-768.
------------------------------

    Resolution: Fixed

Weird.  The hsqldb license file had the same checksum as the one pulled from Hadoop, so it must have had the Windows EOLs in the Hadoop distribution as well.  I changed it anyway.  Everything was committed with revision 885778.
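
For anyone who wants to check a file's line endings before committing, here is a minimal Python sketch. The helper names (has_crlf, to_lf, md5_hex) and the sample bytes are assumptions made up for illustration; they are not part of the Nutch or Hadoop build. It only shows that two copies carrying the same CRLF endings hash identically, and that converting to LF changes the digest.

```python
import hashlib


def has_crlf(data: bytes) -> bool:
    """Return True if the data contains at least one Windows (CRLF) line ending."""
    return b"\r\n" in data


def to_lf(data: bytes) -> bytes:
    """Convert Windows (CRLF) line endings to Unix (LF) line endings."""
    return data.replace(b"\r\n", b"\n")


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of the data as a hex string."""
    return hashlib.md5(data).hexdigest()


# Hypothetical license text with Windows line endings (not the real file).
nutch_copy = b"Copyright notice\r\nAll rights reserved.\r\n"
# A second, byte-identical copy, standing in for the one shipped elsewhere.
hadoop_copy = bytes(nutch_copy)

normalized = to_lf(nutch_copy)

print(has_crlf(nutch_copy), has_crlf(normalized))
print(md5_hex(nutch_copy) == md5_hex(hadoop_copy))  # identical bytes, identical digest
print(md5_hex(nutch_copy) == md5_hex(normalized))   # EOL conversion changes the digest
```

To check a real file, read it in binary mode (open(path, "rb")) so that Python does not translate the line endings on the way in.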


[jira] Commented: (NUTCH-768) Upgrade Nutch 1.0 to use Hadoop 0.20

Posted by "Dennis Kubes (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782172#action_12782172 ] 

Dennis Kubes commented on NUTCH-768:
------------------------------------

I have tested the upgrade with Hadoop 0.20.  To upgrade this correctly we do need to upgrade Xerces both in the main lib jars and within the lib-xml plugin.  I have upgraded to the most recent version of Xerces 2.9.x.  Having run through multiple full crawl and index cycles both on the new and old indexing frameworks, including the webgraphdb, and the solr indexing process, I didn't find any errors within the process.  If no one has any objections I will commit these changes within the next 24 hours.


[jira] Commented: (NUTCH-768) Upgrade Nutch 1.0 to use Hadoop 0.20

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792782#action_12792782 ] 

Hudson commented on NUTCH-768:
------------------------------

Integrated in Nutch-trunk #1015 (See [http://hudson.zones.apache.org/hudson/job/Nutch-trunk/1015/])
    


[jira] Commented: (NUTCH-768) Upgrade Nutch 1.0 to use Hadoop 0.20

Posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782179#action_12782179 ] 

Andrzej Bialecki  commented on NUTCH-768:
-----------------------------------------

Are there any source code changes involved? If so, please upload a patch.

Did you check this in local, distributed or pseudo-distributed mode? In the past there have been errors related to local (or distributed) mode that wouldn't occur when running in other modes.


[jira] Commented: (NUTCH-768) Upgrade Nutch 1.0 to use Hadoop 0.20

Posted by "Dennis Kubes (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790162#action_12790162 ] 

Dennis Kubes commented on NUTCH-768:
------------------------------------

The older Jetty jar file was not removed by this patch.  It will need to be removed from the Nutch lib directory if you are applying the patch rather than pulling from trunk.  There is also a second patch that updates the unit tests for the Jetty interfaces.  Neither step is needed if you are pulling from trunk, as those problems have been corrected there.
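
A quick way to spot a leftover jar after applying a patch by hand is to look for libraries that appear in the lib directory under more than one version. The Python sketch below is one way to do that. The function name find_duplicate_jars, the filename pattern, and the sample listing (including its version numbers) are all assumptions for illustration, not the actual contents of the Nutch lib directory.

```python
import re
from collections import defaultdict

# Split "name-1.2.3.jar" into a name and a version. The version is assumed to
# start at the first "-<digit>" and to run up to ".jar"; jars whose names do
# not fit this shape are simply skipped.
JAR_PATTERN = re.compile(r"^(?P<name>.+?)-(?P<version>\d[\w.]*)\.jar$")


def find_duplicate_jars(filenames):
    """Map each library name to its versions, keeping only names seen more than once."""
    versions = defaultdict(list)
    for filename in filenames:
        match = JAR_PATTERN.match(filename)
        if match:
            versions[match.group("name")].append(match.group("version"))
    # Versions are sorted as plain strings, which is enough for a listing.
    return {name: sorted(found) for name, found in versions.items() if len(found) > 1}


# Hypothetical lib directory listing; the version numbers are made up.
lib_listing = [
    "jetty-5.1.4.jar",
    "jetty-6.1.14.jar",
    "jetty-util-6.1.14.jar",
    "xercesImpl-2.9.1.jar",
]

print(find_duplicate_jars(lib_listing))
```

For a real checkout, the listing could come from os.listdir on the lib directory instead of the hard-coded sample.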


[jira] Commented: (NUTCH-768) Upgrade Nutch 1.0 to use Hadoop 0.20

Posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784206#action_12784206 ] 

Andrzej Bialecki  commented on NUTCH-768:
-----------------------------------------

+1.

Minor nit: the file lib/hsqldb-1.8.0.10.LICENSE.txt uses Windows EOL style; this should probably be corrected before commit.
