
Posted to user@nutch.apache.org by Mohamed Imran K R <mo...@gmail.com> on 2007/08/22 12:00:54 UTC

problems with nutch clustering

Hi,

We are trying to build a Nutch cluster for the natural language
processing department of our research centre. We are deploying a search
engine for evaluation on Tamil (a South Indian language). The search engine
works really well on a single machine with all the customizations they have
made, but we are facing some issues with clustering. The error I got sounds
familiar, but it is vexing. I am using Nutch 0.9 and jdk1.5.0_12. I followed
this tutorial, http://wiki.apache.org/nutch/NutchHadoopTutorial, and it
worked on a single system, but when running the same setup on a cluster I
got this error from the web interface for the slave machine:

Map output lost, rescheduling: getMapOutput(task_0001_m_000001_0,1) failed :
java.io.EOFException
	at java.io.DataInputStream.readFully(DataInputStream.java:178)
	at java.io.DataInputStream.readLong(DataInputStream.java:380)
	at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:1643)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
	at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
	at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
	at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
	at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
	at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
	at org.mortbay.http.HttpServer.service(HttpServer.java:954)
	at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
	at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
	at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
	at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
	at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
	at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)

The above is the error message printed on the slave console. After that,
the Nutch processes running on the slave just idle along (I checked with
the tasktracker and it is at 0%).
I did change a couple of lines in log4j.properties:
hadoop.log.dir=.
hadoop.log.file=hadoop.log
Other than these, I am running a default 0.9 release.
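
For readers unfamiliar with the top frames of the trace: DataInputStream.readLong()
needs 8 bytes and throws java.io.EOFException if the stream ends before it gets
them. The sketch below only reproduces that behaviour in isolation. The class and
method names (TruncatedReadDemo, canReadLong) are made up for illustration, and
it says nothing about which file or stream MapOutputServlet was actually reading
or why it came up short on this cluster.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

// Hypothetical demo class, not part of Nutch or Hadoop.
class TruncatedReadDemo {

    // Returns true if a full 8-byte long can be read from the given bytes,
    // false if the stream ends first (readLong throws EOFException).
    static boolean canReadLong(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        try {
            in.readLong();
            return true;
        } catch (EOFException e) {
            return false;
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(canReadLong(new byte[8])); // enough bytes for one long
        System.out.println(canReadLong(new byte[3])); // truncated input
        System.out.println(canReadLong(new byte[0])); // empty input
    }
}
```

In other words, the exception in the trace indicates that whatever the servlet
was reading was empty or shorter than expected, which is a symptom rather than
the root cause.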
Looking forward to your help in solving this issue and having a nutch of a
time.
BTW, is there an IRC channel for Nutch?
-- 
Regards
Mohamed Imran K R
AU-KBC Research Centre

Re: problems with nutch clustering

Posted by Mohamed Imran K R <mo...@gmail.com>.

On 8/22/07, bikram <bi...@yahoo.com> wrote:
>
> Hi,
>
> Can you post some details, like

Hello, thanks for the reply.

> OS, config files, logs, etc.?

I am running the master node on Fedora 7 and the slave node on Fedora 6. I
am using Nutch 0.9 and jdk1.5.0_12.
My hadoop-site.xml looks like this: http://pastebin.com/m5df1fe48
My hadoop-env.sh is: http://pastebin.com/m5b991d92

My log4j.properties is: http://pastebin.com/d782632d8

My client hadoop-nutch-tasktracker-client.log is:
http://pastebin.com/d3efe33a

The error message on the web-based tracker is: http://pastebin.com/d6d9ee617
Please let me know if there are any more files that I need to send.
Once again, thanks for spending your time.
-- 
Regards
Mohamed Imran K R

Re: problems with nutch clustering

Posted by bikram <bi...@yahoo.com>.

Hi,

Can you post some details, like OS, config files, logs, etc.?

Thanks,
Bikram




