Posted to user@nutch.apache.org by Volos Stavros <st...@epfl.ch> on 2011/03/23 14:45:25 UTC

Upgrade nutch to hadoop 0.21 - Dependency missing

Hi all,

I have been using Nutch 1.1 with hadoop 0.20.2 and was able to achieve 90% utilization on a two-node cluster.
Each node has 12 cores. I am trying to reach 100% utilization, but there seems to be a scalability
bottleneck in the Listener thread, which inserts incoming queries into the callQueue from which the handler
threads pop requests. According to https://issues.apache.org/jira/browse/HADOOP-6713 this is a known
bug that is fixed in hadoop 0.21.
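
To illustrate the pattern I mean, here is a minimal sketch of one listener thread feeding a shared queue
that several handler threads drain. This is not Hadoop's ipc.Server code: the class name CallQueueSketch,
the queue capacity, and the POISON shutdown marker are all made up for illustration. With a single
producer, every request passes through one thread before any handler can pick it up.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a single-listener / multi-handler call queue.
public class CallQueueSketch {
    private static final int POISON = -1; // illustrative shutdown marker

    // Starts one listener and `handlers` handler threads, pushes `calls`
    // requests through the queue, and returns how many were handled.
    static int run(int handlers, int calls) {
        BlockingQueue<Integer> callQueue = new LinkedBlockingQueue<>(100);
        AtomicInteger handled = new AtomicInteger();

        Thread[] pool = new Thread[handlers];
        for (int i = 0; i < handlers; i++) {
            pool[i] = new Thread(() -> {
                try {
                    while (true) {
                        int call = callQueue.take();   // handlers pop requests
                        if (call == POISON) return;
                        handled.incrementAndGet();     // stand-in for real work
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            pool[i].start();
        }

        // The single listener is the only thread that enqueues requests.
        Thread listener = new Thread(() -> {
            try {
                for (int c = 0; c < calls; c++) callQueue.put(c);
                for (int i = 0; i < handlers; i++) callQueue.put(POISON);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        listener.start();

        try {
            listener.join();
            for (Thread t : pool) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return handled.get();
    }

    public static void main(String[] args) {
        System.out.println("handled " + run(4, 1000) + " calls");
    }
}
```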

So I upgraded Nutch to use hadoop 0.21 by copying the hadoop jar files into the lib directory and replacing
the libraries in the native directory. I was able to clean and build Nutch. Next, I deployed it to Tomcat and my
distributed search servers. When I send a query, Tomcat forwards it to the two nodes and receives the
results. However, the results are not shown on the web page.

The hadoop.log of the search node has the following error:

2011-03-23 14:22:29,564 INFO  ipc.Server - IPC Server handler 3 on 8889, call getSummary([Lorg.apache.nutch.searcher.HitDetails;@24c68a98, facebook) from 192.168.10.43:37498: error: java.io.IOException: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.NoClassDefFoundError: org/apache/avro/io/DatumReader
java.io.IOException: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.NoClassDefFoundError: org/apache/avro/io/DatumReader   
        at org.apache.nutch.searcher.FetchedSegments.getSummary(FetchedSegments.java:297)
        at org.apache.nutch.searcher.NutchBean.getSummary(NutchBean.java:341)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:342)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1350)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1346)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:742)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1344)

It seems that the dependency on the org.apache.avro package is missing. So I made sure that these dependencies are included
when building hadoop/common. Still, the problem persists.
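
One way to narrow this down might be to check whether the class named in the error can actually be loaded
at runtime, and from which jar. Below is a minimal sketch; the class name ClasspathCheck and the locate
helper are made up for illustration, and it only tells you something if it is run with the same classpath
that the search server process uses.

```java
import java.security.CodeSource;

// Hypothetical diagnostic: reports where a class is loaded from, if at all.
public class ClasspathCheck {

    // Returns the location the class was loaded from, or null if the class
    // cannot be found or linked on the current classpath.
    static String locate(String className) {
        try {
            // initialize=false: only load the class, do not run static initializers
            Class<?> c = Class.forName(className, false,
                                       ClasspathCheck.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return "(bootstrap or unknown location)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Defaults to the class from the stack trace above.
        String name = args.length > 0 ? args[0] : "org.apache.avro.io.DatumReader";
        String where = locate(name);
        System.out.println(where == null
                ? name + " is NOT loadable on this classpath"
                : name + " loaded from " + where);
    }
}
```

If this reports that the class is not loadable, the avro jar is probably missing from the runtime classpath
of the search node, even if it was present when hadoop/common was built.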

Any help?

Thanks,
-Stavros.