Posted to user@nutch.apache.org by bhavin pandya <bv...@gmail.com> on 2009/10/06 12:48:23 UTC
mapred.ReduceTask - java.io.FileNotFoundException
Hi,
I am trying to configure Nutch and Hadoop on two nodes, but while trying
to fetch I am getting this exception. (I sometimes get the same exception
while injecting new seeds.)
2009-10-06 14:56:51,609 WARN mapred.ReduceTask -
java.io.FileNotFoundException: http://127.0.0.1:50060/mapOutput?
job=job_200910061454_0001&map=attempt_200910061454_0001_m_000000_0&reduce=3
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1345)
at java.security.AccessController.doPrivileged(Native Method)
at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1339)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:993)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1293)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1231)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1144)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1084)
Caused by: java.io.FileNotFoundException:
http://127.0.0.1:50060/mapOutput?job=job_200910061454_0001&map
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_200910061454_0001/attempt_200910061454_0001_m_000000_0/output/file.out.index in any of the configured local directories
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:381)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:138)
at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:2840)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
at org.mortbay.http.HttpServer.service(HttpServer.java:954)
at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)
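For anyone digging into the same warning: the URL in the FileNotFoundException encodes exactly which map output the reducer tried to fetch, and from which host. A small Python sketch (the parsing helper is mine, not part of Hadoop) pulls the pieces apart:

```python
from urllib.parse import urlsplit, parse_qs

# The failing fetch URL from the log above.
url = ("http://127.0.0.1:50060/mapOutput?"
       "job=job_200910061454_0001&map=attempt_200910061454_0001_m_000000_0&reduce=3")

parts = urlsplit(url)
# parse_qs returns lists; flatten to single values for readability.
params = {k: v[0] for k, v in parse_qs(parts.query).items()}

# parts.hostname is where the reducer was told to fetch from. It is the
# loopback address here, so a reducer on the *other* node asks its own
# local TaskTracker for a map output it never produced.
print(parts.hostname)     # 127.0.0.1
print(params["job"])      # job_200910061454_0001
print(params["map"])      # attempt_200910061454_0001_m_000000_0
print(params["reduce"])   # 3
```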
And then there are continuous messages in hadoop.log like:
2009-10-06 15:56:43,918 WARN mapred.ReduceTask -
attempt_200910061538_0005_r_000001_0 adding host 127.0.0.1 to penalty
box, next contact in 150 seconds
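One common way a TaskTracker ends up advertising 127.0.0.1 for its map output (so remote reducers fail with FileNotFoundException and the host lands in the penalty box) is the slave's own hostname appearing on the loopback line of /etc/hosts. This is only a guess at the cause here, but it is cheap to check; a self-contained sketch with hypothetical hosts-file content:

```python
# Hypothetical /etc/hosts content for illustration -- the crawler1 entry
# on the 127.0.0.1 line is the classic mistake that makes a TaskTracker
# register itself under the loopback address.
HOSTS = """\
127.0.0.1   localhost crawler1.mydomain.com
192.168.0.2 crawler2.mydomain.com
"""

def loopback_aliases(hosts_text):
    """Return non-localhost names mapped to a loopback (127.x) address."""
    bad = []
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # strip comments, tokenize
        if fields and fields[0].startswith("127."):
            bad.extend(n for n in fields[1:] if not n.startswith("localhost"))
    return bad

print(loopback_aliases(HOSTS))  # ['crawler1.mydomain.com']
```

On a real slave you would run the same check against the actual /etc/hosts; any name it flags should instead resolve to the machine's routable address.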
Here is my hadoop-site.xml content:
<property>
<name>fs.default.name</name>
<value>hdfs://crawler1.mydomain.com:9000/</value>
<description>
The name of the default file system. Either the literal string
"local" or a host:port for NDFS.
</description>
</property>
<property>
<name>mapred.job.tracker</name>
<value>crawler1.mydomain.com:9001</value>
<description>
The host and port that the MapReduce job tracker runs at. If
"local", then jobs are run in-process as a single map and
reduce task.
</description>
</property>
<property>
<name>mapred.map.tasks</name>
<value>2</value>
<description>
define mapred.map tasks to be number of slave hosts
</description>
</property>
<property>
<name>mapred.reduce.tasks</name>
<value>2</value>
<description>
define mapred.reduce tasks to be number of slave hosts
</description>
</property>
<property>
<name>dfs.name.dir</name>
<value>/nutch/filesystem/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/nutch/filesystem/data</value>
</property>
<property>
<name>mapred.system.dir</name>
<value>/nutch/filesystem/mapreduce/system</value>
</property>
<property>
<name>mapred.local.dir</name>
<value>/nutch/filesystem/mapreduce/local</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/tmp</value>
<description>A base for other temporary directories</description>
</property>
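The DiskErrorException above says the serving TaskTracker could not find file.out.index under any configured local directory, so it is also worth confirming that the mapred.local.dir from this config exists and is writable by the Hadoop user on every slave. A minimal, self-contained check (it uses a temp dir so the sketch runs anywhere; on a real slave you would pass "/nutch/filesystem/mapreduce/local"):

```python
import os
import tempfile

def check_local_dir(path):
    """Report whether a configured local dir exists and is writable."""
    if not os.path.isdir(path):
        return "missing"
    return "ok" if os.access(path, os.W_OK) else "not writable"

# Self-contained demo with a freshly created temp directory.
demo = tempfile.mkdtemp()
print(check_local_dir(demo))            # ok
print(check_local_dir(demo + "/nope"))  # missing
```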
netstat -antp shows a program listening on port 50060:
tcp 0 0 0.0.0.0:50090 0.0.0.0:*
LISTEN 12855/java
tcp 0 0 0.0.0.0:50060 0.0.0.0:*
LISTEN 13014/java
tcp 0 0 0.0.0.0:50030 0.0.0.0:*
LISTEN 12923/java
tcp 0 0 0.0.0.0:50010 0.0.0.0:*
LISTEN 12765/java
tcp 0 0 0.0.0.0:50075 0.0.0.0:*
LISTEN 12765/java
masters:
crawler1.mydomain.com
slaves:
crawler1.mydomain.com
crawler2.mydomain.com
It works perfectly with a single-machine configuration.
I am using Nutch 1.0. Any pointers?
Thanks.
- Bhavin
Re: mapred.ReduceTask - java.io.FileNotFoundException
Posted by bhavin pandya <bv...@gmail.com>.
Hi,
I don't know the exact cause of the exception, but it is resolved.
I had earlier made some changes to the IP settings, but the server was
still using the old configuration.
I restarted the services and the problem went away.
Thanks.
Bhavin
On Tue, Oct 6, 2009 at 4:48 PM, tittutomen <su...@gmail.com> wrote:
>
> bhavin pandya-3 wrote:
>> [original message quoted in full; snipped -- see the first post above]
>
>
>
> Please try with the following configuration change
> <property>
> <name>dfs.replication</name>
> <value>2</value>
> </property>
>
>
>
> --
> View this message in context: http://www.nabble.com/mapred.ReduceTask---java.io.FileNotFoundException-tp25766523p25766880.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>
>
--
- Bhavin
Re: mapred.ReduceTask - java.io.FileNotFoundException
Posted by tittutomen <su...@gmail.com>.
bhavin pandya-3 wrote:
> [original message quoted in full; snipped -- see the first post above]
Please try with the following configuration change:
<property>
<name>dfs.replication</name>
<value>2</value>
</property>