Posted to user@nutch.apache.org by Mohan Lal <mo...@gmail.com> on 2006/09/30 10:57:32 UTC

Problem in Distributed file system


Hi all,
          I'm using Nutch 0.8.1 and have set up distributed crawling with 3
machines: MASTER, NODE1, NODE2.

On the master I can see the file system directories using the command
bin/hadoop dfs -ls
but on NODE1 and NODE2 I cannot see the file system directories.
Can anyone please tell me how to view the DFS contents from the NODE
machines?

My configuration:

slaves file:
LOCALHOST
NODE1
NODE2

hadoop-site.xml:

<configuration>  
 <property>  
    <name>fs.default.name</name>  
    <value>localhost:9000</value>  
 </property>  

 <property>  
    <name>dfs.name.dir</name>  
    <value>/tmp/hadoop/dfs/name</value>  
 </property> 

 <property>  
    <name>dfs.data.dir</name>  
    <value>/tmp/hadoop/dfs/data</value>  
 </property> 

 <property>  
    <name>dfs.replication</name>  
    <value>2</value>  
 </property> 

 <property>
    <name>dfs.datanode.port</name>
    <value>50010</value>
    <description>The port number that the dfs datanode server uses as a
        starting point to look for a free port to listen on.
    </description>
 </property>

 <property>
   <name>dfs.info.port</name>
   <value>50070</value>
   <description>The base port number for the dfs namenode web ui.
   </description>
 </property>

 <property>
   <name>dfs.datanode.dns.nameserver</name>
   <value>192.168.0.1</value>
   <description>The host name or IP address of the name server (DNS)
      which a DataNode should use to determine the host name used by the
      NameNode for communication and display purposes.
    </description>
 </property>

<!-- map/reduce properties -->

 <property>  
    <name>mapred.job.tracker</name>  
    <value>localhost:9001</value>  
 </property>  

 <property>
   <name>mapred.job.tracker.info.port</name>
   <value>50030</value>
   <description>The port that the MapReduce job tracker info webserver
   runs at.
   </description>
 </property>

 <property>
   <name>mapred.task.tracker.output.port</name>
   <value>50040</value>
   <description>The port number that the MapReduce task tracker output
   server uses as a starting point to look for a free port to listen on.
   </description>
 </property>

 <property>
   <name>mapred.task.tracker.report.port</name>
   <value>50050</value>
   <description>The port number that the MapReduce task tracker report
   server uses as a starting point to look for a free port to listen on.
   </description>
 </property>

 <property>
   <name>tasktracker.http.port</name>
   <value>50060</value>
   <description>The default port for task trackers to use as their http
server.
   </description>
 </property>

 <property>  
    <name>mapred.local.dir</name>  
    <value>/tmp/hadoop/mapred/local</value>  
 </property>

 <property>
   <name>mapred.temp.dir</name>
   <value>/tmp/hadoop/mapred/temp</value>
   <description>A shared directory for temporary files.
   </description>
 </property>

 <property>
   <name>mapred.system.dir</name>
   <value>/tmp/hadoop/mapred/system</value>
   <description>The shared directory where MapReduce stores control files.
   </description>
 </property>

 <property>
    <name>mapred.tasktracker.dns.nameserver</name>
    <value>192.168.0.1</value>
    <description>The host name or IP address of the name server (DNS)
    which a TaskTracker should use to determine the host name used by
    the JobTracker for communication and display purposes.
    </description>
 </property>

 <property>
   <name>tasktracker.http.threads</name>
   <value>10</value>
   <description>The number of worker threads for the http server. This
   is used for map output fetching.
   </description>
 </property>

 <property>
   <name>mapred.map.tasks</name>
   <value>10</value>
   <description>The default number of map tasks per job.  Typically set
   to a prime several times greater than the number of available hosts.
   Ignored when mapred.job.tracker is "local".  
   </description>
 </property>

 <property>
   <name>mapred.reduce.tasks</name>
   <value>2</value>
   <description>The default number of reduce tasks per job.  Typically set
   to a prime close to the number of available hosts.  Ignored when
   mapred.job.tracker is "local".
   </description>
 </property>

 <property>
   <name>mapred.reduce.parallel.copies</name>
   <value>5</value>
   <description>The default number of parallel transfers run by reduce
   during the copy(shuffle) phase.
   </description>
 </property>

</configuration>

Also, my log file hadoop-root-namenode-mohanlal.qburst.local.log contains
warnings like:

2006-09-30 14:08:26,919 WARN  fs.FSNamesystem - Zero targets found,
forbidden1.size=1 forbidden2.size()=0

Please help me.

Regards
Mohan Lal


Re: Problem in Distributed file system

Posted by Sunil Kumar PK <pk...@gmail.com>.
Thanks Dennis, I was stuck on the same problem too.

On 10/1/06, nutch-dev@dragonflymc.com <nu...@dragonflymc.com> wrote:
>
> Change your config as shown below.  localhost will only work on the
> machine the namenode is running on.  The other machines must point to
> the server where the namenode and jobtracker are running.
>
> <configuration>
>  <property>
>     <name>fs.default.name</name>
>     <value>machinename.domainname.com:9000</value>
>  </property>
>
>  <property>
>     <name>mapred.job.tracker</name>
>     <value>machinename.domainname.com:9001</value>
>  </property>
>
>  <!-- keep the rest of your properties as before -->
> </configuration>
>
> Dennis

Re: Problem in Distributed file system

Posted by nu...@dragonflymc.com.
Change your config as shown below.  localhost will only work on the
machine the namenode is running on.  The other machines must point to the
server where the namenode and jobtracker are running.
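
That would also explain the warning in your namenode log: with
fs.default.name set to localhost, the datanodes on NODE1 and NODE2 register
with themselves instead of with the master, so when the namenode tries to
place the second replica (you have dfs.replication set to 2) it finds no
eligible target besides the one local datanode, hence "Zero targets found".
As a suggested check (the exact command options varied a bit across early
Hadoop releases), you can see how many datanodes the namenode knows about:

  # The namenode web UI lists the live datanodes; the port is the
  # dfs.info.port from hadoop-site.xml (50070 in your config):
  #   http://MASTER:50070/

  # Or ask for a DFS status report from the master's shell:
  bin/hadoop dfs -report

If only one datanode shows up, the slaves are not talking to the master.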

<configuration>
 <property>
    <name>fs.default.name</name>
    <value>machinename.domainname.com:9000</value>
 </property>

 <property>
    <name>mapred.job.tracker</name>
    <value>machinename.domainname.com:9001</value>
 </property>

 <!-- keep the rest of your properties (dfs.name.dir, dfs.data.dir,
      dfs.replication, ports, and so on) as they were -->
</configuration>
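
After updating hadoop-site.xml on all three machines, restart the daemons
and retry the listing from a slave. A minimal sketch, assuming the stock
start/stop scripts that ship with the Hadoop bundled in Nutch 0.8.1 and
passwordless ssh from the master to the nodes:

  # on MASTER: stop everything, then start the namenode/jobtracker here
  # and a datanode/tasktracker on every host listed in the slaves file
  bin/stop-all.sh
  bin/start-all.sh

  # on NODE1 or NODE2: this should now list the same DFS root as on MASTER
  bin/hadoop dfs -ls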


Dennis