Posted to user@nutch.apache.org by Dennis Kubes <nu...@dragonflymc.com> on 2006/03/17 00:32:06 UTC

Help Setting Up Nutch 0.8 Distributed

I am having trouble getting Nutch to work using the DFS.  I pulled Nutch 0.8
from SVN and built it just fine using Eclipse.  I was able to set it up on a
Whitebox Enterprise Linux 3 Respin 2 box (800 MHz, 512 MB RAM) and do a crawl
using the local filesystem.  I was able to set up the WAR inside of Tomcat
and search the local index.
 
I then tried to switch to using the DFS.  I was running everything as the
nutch user and have password-less login to the local machine.  I am using the
options below in my hadoop-site.xml file.  When I run start-all.sh I get some
weird output, but doing a ps -ef | grep java shows 2 java processes running.
Then when I try to do a crawl it errors out.
 
Anybody got any ideas?
 
Dennis
 
  
hadoop-site.xml
----------------------------------------------------------------------------------------------

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
 
<!-- Put site-specific property overrides in this file. -->
 
<configuration>
 
<property>
  <name>fs.default.name</name>
  <value>localhost:9000</value>
  <description>
    The name of the default file system. Either the literal string
    "local" or a host:port for NDFS.
  </description>
</property>
 
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:9001</value>
  <description>
    The host and port that the MapReduce job tracker runs at. If
    "local", then jobs are run in-process as a single map and
    reduce task.
  </description>
</property>
 
<property>
  <name>mapred.map.tasks</name>
  <value>2</value>
  <description>
    Define mapred.map.tasks to be the number of slave hosts.
  </description>
</property>
 
<property>
  <name>mapred.reduce.tasks</name>
  <value>2</value>
  <description>
    Define mapred.reduce.tasks to be the number of slave hosts.
  </description>
</property>
 
<property>
  <name>dfs.name.dir</name>
  <value>/nutch/filesystem/name</value>
</property>
 
<property>
  <name>dfs.data.dir</name>
  <value>/nutch/filesystem/data</value>
</property>
 
<property>
  <name>mapred.local.dir</name>
  <value>/nutch/filesystem/mapreduce</value>
</property>
 
</configuration>
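
Note for anyone replicating this setup: with a layout like the above, the
namenode has to be formatted before start-all.sh is run.  A minimal sketch,
assuming the stock bin/hadoop script from the same checkout and the paths
from this hadoop-site.xml:

  bin/hadoop namenode -format    # initialize dfs.name.dir (/nutch/filesystem/name)
  bin/start-all.sh               # start the namenode, datanodes, jobtracker, and tasktrackers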

 
log of startup
----------------------------------------------------------------------------------------------
localhost:9000: command-line: line 0: Bad configuration option: ConnectTimeout
devcluster02:9000: command-line: line 0: Bad configuration option: ConnectTimeout
starting namenode, logging to /nutch/search/bin/../logs/hadoop-nutch-namenode-devcluster01.visvo.com.log
: command not foundadoop: line 2:
: command not foundadoop: line 7:
: command not foundadoop: line 10:
: command not foundadoop: line 13:
: command not foundadoop: line 16:
: command not foundadoop: line 19:
: command not foundadoop: line 22:
: command not foundadoop: line 25:
: command not foundadoop: line 28:
: command not foundadoop: line 31:
starting jobtracker, logging to /nutch/search/bin/../logs/hadoop-nutch-jobtracker-devcluster01.visvo.com.log
: command not foundadoop: line 2:
: command not foundadoop: line 7:
: command not foundadoop: line 10:
: command not foundadoop: line 13:
: command not foundadoop: line 16:
: command not foundadoop: line 19:
: command not foundadoop: line 22:
: command not foundadoop: line 25:
: command not foundadoop: line 28:
: command not foundadoop: line 31:
localhost:9000: command-line: line 0: Bad configuration option: ConnectTimeout
devcluster02:9000: command-line: line 0: Bad configuration option: ConnectTimeout

ps -ef | grep java
----------------------------------------------------------------------------------------------
[nutch@devcluster01 search]$ ps -ef | grep java
nutch     9907     1  2 17:26 pts/0    00:00:02 /usr/java/jdk1.5.0_06/bin/java -Xmx1000m -classpath /nutch/search/conf:/usr/java/jdk1.5.0_06/lib/tools.jar:/nutch/search:/nutch/search/hadoop-*.jar:/nutch/search/lib/commons-lang-2.1.jar:/nutch/search/lib/commons-logging-api-1.0.4.jar:/nutch/search/lib/concurrent-1.3.4.ja
nutch     9945     1  8 17:27 pts/0    00:00:07 /usr/java/jdk1.5.0_06/bin/java -Xmx1000m -classpath /nutch/search/conf:/usr/java/jdk1.5.0_06/lib/tools.jar:/nutch/search:/nutch/search/hadoop-*.jar:/nutch/search/lib/commons-lang-2.1.jar:/nutch/search/lib/commons-logging-api-1.0.4.jar:/nutch/search/lib/concurrent-1.3.4.ja
nutch    10028  9771  0 17:28 pts/0    00:00:00 grep java

  
Errors when running crawl 
----------------------------------------------------------------------------------------------
[nutch@devcluster01 search]$ bin/nutch crawl urls -depth 3 -topN 50
060316 173158 parsing jar:file:/nutch/search/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
060316 173158 parsing file:/nutch/search/conf/nutch-default.xml
060316 173159 parsing file:/nutch/search/conf/crawl-tool.xml
060316 173159 parsing jar:file:/nutch/search/lib/hadoop-0.1-dev.jar!/mapred-default.xml
060316 173159 parsing file:/nutch/search/conf/nutch-site.xml
060316 173159 parsing file:/nutch/search/conf/hadoop-site.xml
060316 173159 Client connection to 127.0.0.1:9000: starting
060316 173159 crawl started in: crawl-20060316173159
060316 173159 rootUrlDir = urls
060316 173159 threads = 10
060316 173159 depth = 3
060316 173159 topN = 50
060316 173159 Injector: starting
060316 173159 Injector: crawlDb: crawl-20060316173159/crawldb
060316 173159 Injector: urlDir: urls
060316 173159 Injector: Converting injected urls to crawl db entries.
060316 173159 parsing jar:file:/nutch/search/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
060316 173159 parsing file:/nutch/search/conf/nutch-default.xml
060316 173159 parsing file:/nutch/search/conf/crawl-tool.xml
060316 173159 parsing jar:file:/nutch/search/lib/hadoop-0.1-dev.jar!/mapred-default.xml
060316 173159 parsing jar:file:/nutch/search/lib/hadoop-0.1-dev.jar!/mapred-default.xml
060316 173159 parsing file:/nutch/search/conf/nutch-site.xml
060316 173159 parsing file:/nutch/search/conf/hadoop-site.xml
060316 173200 Client connection to 127.0.0.1:9001: starting
060316 173200 Client connection to 127.0.0.1:9000: starting
060316 173200 parsing jar:file:/nutch/search/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
060316 173200 parsing file:/nutch/search/conf/hadoop-site.xml
Exception in thread "main" java.io.IOException: Cannot create file /tmp/hadoop/mapred/system/submit_wdapr7/job.jar on client DFSClient_1136455260
        at org.apache.hadoop.ipc.Client.call(Client.java:301)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:141)
        at org.apache.hadoop.dfs.$Proxy0.create(Unknown Source)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:587)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:556)
        at org.apache.hadoop.dfs.DFSClient.create(DFSClient.java:99)
        at org.apache.hadoop.dfs.DistributedFileSystem.createRaw(DistributedFileSystem.java:71)
        at org.apache.hadoop.fs.FSDataOutputStream$Summer.<init>(FSDataOutputStream.java:39)
        at org.apache.hadoop.fs.FSDataOutputStream.<init>(FSDataOutputStream.java:128)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:180)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:168)
        at org.apache.hadoop.dfs.DistributedFileSystem.doFromLocalFile(DistributedFileSystem.java:156)
        at org.apache.hadoop.dfs.DistributedFileSystem.copyFromLocalFile(DistributedFileSystem.java:131)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:247)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:294)
        at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)


RE: Help Setting Up Nutch 0.8 Distributed

Posted by Dennis Kubes <nu...@dragonflymc.com>.
My problem was that I was using the distributed file system but
trying to start the crawl from a local directory instead of first
uploading the crawl list to the distributed filesystem.  Once I 
uploaded it, it worked fine.
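
For reference, the upload amounts to something like the following (a sketch
assuming the url list sits in a local directory named urls and bin/hadoop is
the script from this same checkout):

  bin/hadoop dfs -put urls urls            # copy the local urls directory into DFS
  bin/hadoop dfs -ls                       # confirm that urls now exists in DFS
  bin/nutch crawl urls -depth 3 -topN 50   # the crawl now reads urls out of DFS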

Dennis 

-----Original Message-----
From: Marko Bauhardt [mailto:mb@media-style.com] 
Sent: Saturday, March 18, 2006 5:48 AM
To: nutch-user@lucene.apache.org
Subject: Re: Help Setting Up Nutch 0.8 Distributed


Am 17.03.2006 um 17:20 schrieb Dennis Kubes:

> Exception in thread "main" java.io.IOException: Job failed!
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java: 
> 310)
>         at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
>         at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)

The Injector fails.

> java.io.IOException: No input directories specified in: Configuration:
> defaults: hadoop-default.xml , mapred-default.xml


The Injector did not find the directory with the inject files.

Marko



Re: Help Setting Up Nutch 0.8 Distributed

Posted by Marko Bauhardt <mb...@media-style.com>.
Am 17.03.2006 um 17:20 schrieb Dennis Kubes:

> Exception in thread "main" java.io.IOException: Job failed!
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java: 
> 310)
>         at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
>         at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)

The Injector fails.

> java.io.IOException: No input directories specified in: Configuration:
> defaults: hadoop-default.xml , mapred-default.xml


The Injector did not find the directory with the inject files.

Marko


RE: Help Setting Up Nutch 0.8 Distributed

Posted by Dennis Kubes <nu...@dragonflymc.com>.
That was just me specifying the command line wrong.  It starts the crawl now
but then just stalls:

060317 101821 parsing file:/nutch/search/conf/hadoop-site.xml
060317 101829 Running job: job_1ko8i3
060317 101830  map 0%  reduce 0%

I am seeing this a lot in the namenode log:

060317 102009 Zero targets found, forbidden1.size=1 forbidden2.size()=0
060317 102009 Zero targets found, forbidden1.size=1 forbidden2.size()=0
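
The "Zero targets found" lines generally mean the namenode cannot find any
live datanodes to place block replicas on.  A quick sanity check, assuming
the log directory from the startup output above:

  ps -ef | grep -i datanode                    # is a DataNode process running at all?
  tail -50 logs/hadoop-nutch-datanode-*.log    # look for bind or connect errors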

-----Original Message-----
From: Dennis Kubes [mailto:nutch-dev@dragonflymc.com] 
Sent: Friday, March 17, 2006 9:55 AM
To: nutch-user@lucene.apache.org
Subject: RE: Help Setting Up Nutch 0.8 Distributed

Ok, the servers are starting now but when I try to do a crawl I am getting
an error like below.  I think that I am missing a configuration option, but
I don't know which one.  I have included my hadoop-site.xml as well.

error upon crawl:
060317 093312 Client connection to 127.0.0.1:9000: starting
060317 093312 parsing jar:file:/nutch/search/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
060317 093312 parsing file:/nutch/search/conf/hadoop-site.xml
060317 093322 Running job: job_c78m3c
060317 093323  map 100%  reduce 100%
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:310)
        at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)

job tracker log file:
060317 093322 parsing jar:file:/nutch/search/lib/hadoop-0.1-dev.jar!/mapred-default.xml
060317 093322 parsing /nutch/filesystem/mapreduce/local/job_c78m3c.xml/jobTracker
060317 093322 parsing file:/nutch/search/conf/hadoop-site.xml
060317 093322 job init failed
java.io.IOException: No input directories specified in: Configuration: defaults: hadoop-default.xml , mapred-default.xml , /nutch/filesystem/mapreduce/local/job_c78m3c.xml/jobTrackerfinal: hadoop-site.xml
        at org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.java:84)
        at org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:94)
        at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:127)
        at org.apache.hadoop.mapred.JobTracker$JobInitThread.run(JobTracker.java:208)
        at java.lang.Thread.run(Thread.java:595)
Exception in thread "Thread-21" java.lang.NullPointerException
        at org.apache.hadoop.mapred.JobInProgress.kill(JobInProgress.java:437)
        at org.apache.hadoop.mapred.JobTracker$JobInitThread.run(JobTracker.java:212)
        at java.lang.Thread.run(Thread.java:595)
060317 093325 Server connection on port 9001 from 127.0.0.1: exiting

hadoop-site.xml:
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:9001</value>
  <description>
    The host and port that the MapReduce job tracker runs at. If
    "local", then jobs are run in-process as a single map and
    reduce task.
  </description>
</property>

<property>
  <name>mapred.map.tasks</name>
  <value>2</value>
  <description>
    Define mapred.map.tasks to be the number of slave hosts.
  </description>
</property>

<property>
  <name>mapred.reduce.tasks</name>
  <value>2</value>
  <description>
    Define mapred.reduce.tasks to be the number of slave hosts.
  </description>
</property>

<property>
  <name>dfs.name.dir</name>
  <value>/nutch/filesystem/name</value>
</property>

<property>
  <name>dfs.data.dir</name>
  <value>/nutch/filesystem/data</value>
</property>

<property>
  <name>mapred.system.dir</name>
  <value>/nutch/filesystem/mapreduce/system</value>
</property>

<property>
  <name>mapred.local.dir</name>
  <value>/nutch/filesystem/mapreduce/local</value>
</property>
 

-----Original Message-----
From: Dennis Kubes [mailto:nutch-dev@dragonflymc.com] 
Sent: Friday, March 17, 2006 9:05 AM
To: nutch-user@lucene.apache.org
Subject: RE: Help Setting Up Nutch 0.8 Distributed

I got one of the issues fixed.  The output below is caused by the
hadoop-env.sh file being in DOS format and not being executable.  A dos2unix
and a chmod 700 fixed the "command not found" output.  Still working on why
the server won't start.
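
Concretely, the fix was along these lines (assuming the file lives under
conf/ as in a stock checkout):

  dos2unix conf/hadoop-env.sh    # strip the DOS carriage returns
  chmod 700 conf/hadoop-env.sh   # make the script executable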

caused by hadoop-env.sh in dos format and not being executable:

: command not found line 2:
: command not found line 7:
: command not found line 10:
: command not found line 13:
: command not found line 16:
: command not found line 20:
: command not found line 23:
: command not found line 26:
: command not found line 29:
: command not found line 32:

Dennis

-----Original Message-----
From: Doug Cutting [mailto:cutting@apache.org] 
Sent: Thursday, March 16, 2006 6:50 PM
To: nutch-user@lucene.apache.org
Subject: Re: Help Setting Up Nutch 0.8 Distributed

Dennis Kubes wrote:
> : command not foundlaves.sh: line 29:
> : command not foundlaves.sh: line 32:
> localhost: ssh: \015: Name or service not known
> devcluster02: ssh: \015: Name or service not known
> 
> And still getting this error:
> 
> 060316 175355 parsing file:/nutch/search/conf/hadoop-site.xml
> Exception in thread "main" java.io.IOException: Cannot create file /tmp/hadoop/mapred/system/submit_mmuodk/job.jar on client DFSClient_-913777457
>         at org.apache.hadoop.ipc.Client.call(Client.java:301)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:141)
>         at org.apache.hadoop.dfs.$Proxy0.create(Unknown Source)
>         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:587)
>         at org
> 
> My ssh version is:
> 
> openssh-clients-3.6.1p2-33.30.3
> openssh-server-3.6.1p2-33.30.3
> openssh-askpass-gnome-3.6.1p2-33.30.3
> openssh-3.6.1p2-33.30.3
> openssh-askpass-3.6.1p2-33.30.3
> 
> Is it something to do with my slaves file?

The \015 looks like a file has a CR where perhaps an LF is expected? 
What does 'od -c conf/slaves' print?  What happens when you try 
something like 'bin/slaves uptime'?

Doug




RE: Help Setting Up Nutch 0.8 Distributed

Posted by Dennis Kubes <nu...@dragonflymc.com>.
Ok, the servers are starting now but when I try to do a crawl I am getting
an error like below.  I think that I am missing a configuration option, but
I don't know which one.  I have included my hadoop-site.xml as well.

error upon crawl:
060317 093312 Client connection to 127.0.0.1:9000: starting
060317 093312 parsing jar:file:/nutch/search/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
060317 093312 parsing file:/nutch/search/conf/hadoop-site.xml
060317 093322 Running job: job_c78m3c
060317 093323  map 100%  reduce 100%
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:310)
        at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)

job tracker log file:
060317 093322 parsing jar:file:/nutch/search/lib/hadoop-0.1-dev.jar!/mapred-default.xml
060317 093322 parsing /nutch/filesystem/mapreduce/local/job_c78m3c.xml/jobTracker
060317 093322 parsing file:/nutch/search/conf/hadoop-site.xml
060317 093322 job init failed
java.io.IOException: No input directories specified in: Configuration: defaults: hadoop-default.xml , mapred-default.xml , /nutch/filesystem/mapreduce/local/job_c78m3c.xml/jobTrackerfinal: hadoop-site.xml
        at org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.java:84)
        at org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:94)
        at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:127)
        at org.apache.hadoop.mapred.JobTracker$JobInitThread.run(JobTracker.java:208)
        at java.lang.Thread.run(Thread.java:595)
Exception in thread "Thread-21" java.lang.NullPointerException
        at org.apache.hadoop.mapred.JobInProgress.kill(JobInProgress.java:437)
        at org.apache.hadoop.mapred.JobTracker$JobInitThread.run(JobTracker.java:212)
        at java.lang.Thread.run(Thread.java:595)
060317 093325 Server connection on port 9001 from 127.0.0.1: exiting

hadoop-site.xml:
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:9001</value>
  <description>
    The host and port that the MapReduce job tracker runs at. If
    "local", then jobs are run in-process as a single map and
    reduce task.
  </description>
</property>

<property>
  <name>mapred.map.tasks</name>
  <value>2</value>
  <description>
    Define mapred.map.tasks to be the number of slave hosts.
  </description>
</property>

<property>
  <name>mapred.reduce.tasks</name>
  <value>2</value>
  <description>
    Define mapred.reduce.tasks to be the number of slave hosts.
  </description>
</property>

<property>
  <name>dfs.name.dir</name>
  <value>/nutch/filesystem/name</value>
</property>

<property>
  <name>dfs.data.dir</name>
  <value>/nutch/filesystem/data</value>
</property>

<property>
  <name>mapred.system.dir</name>
  <value>/nutch/filesystem/mapreduce/system</value>
</property>

<property>
  <name>mapred.local.dir</name>
  <value>/nutch/filesystem/mapreduce/local</value>
</property>
 

-----Original Message-----
From: Dennis Kubes [mailto:nutch-dev@dragonflymc.com] 
Sent: Friday, March 17, 2006 9:05 AM
To: nutch-user@lucene.apache.org
Subject: RE: Help Setting Up Nutch 0.8 Distributed

I got one of the issues fixed.  The output below is caused by the
hadoop-env.sh file being in DOS format and not being executable.  A dos2unix
and a chmod 700 fixed the "command not found" output.  Still working on why
the server won't start.

caused by hadoop-env.sh in dos format and not being executable:

: command not found line 2:
: command not found line 7:
: command not found line 10:
: command not found line 13:
: command not found line 16:
: command not found line 20:
: command not found line 23:
: command not found line 26:
: command not found line 29:
: command not found line 32:

Dennis

-----Original Message-----
From: Doug Cutting [mailto:cutting@apache.org] 
Sent: Thursday, March 16, 2006 6:50 PM
To: nutch-user@lucene.apache.org
Subject: Re: Help Setting Up Nutch 0.8 Distributed

Dennis Kubes wrote:
> : command not foundlaves.sh: line 29:
> : command not foundlaves.sh: line 32:
> localhost: ssh: \015: Name or service not known
> devcluster02: ssh: \015: Name or service not known
> 
> And still getting this error:
> 
> 060316 175355 parsing file:/nutch/search/conf/hadoop-site.xml
> Exception in thread "main" java.io.IOException: Cannot create file /tmp/hadoop/mapred/system/submit_mmuodk/job.jar on client DFSClient_-913777457
>         at org.apache.hadoop.ipc.Client.call(Client.java:301)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:141)
>         at org.apache.hadoop.dfs.$Proxy0.create(Unknown Source)
>         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:587)
>         at org
> 
> My ssh version is:
> 
> openssh-clients-3.6.1p2-33.30.3
> openssh-server-3.6.1p2-33.30.3
> openssh-askpass-gnome-3.6.1p2-33.30.3
> openssh-3.6.1p2-33.30.3
> openssh-askpass-3.6.1p2-33.30.3
> 
> Is it something to do with my slaves file?

The \015 looks like a file has a CR where perhaps an LF is expected? 
What does 'od -c conf/slaves' print?  What happens when you try 
something like 'bin/slaves uptime'?

Doug



RE: Help Setting Up Nutch 0.8 Distributed

Posted by Dennis Kubes <nu...@dragonflymc.com>.
I got one of the issues fixed.  The output below is caused by the
hadoop-env.sh file being in DOS format and not being executable.  A dos2unix
and a chmod 700 fixed the "command not found" output.  Still working on why
the server won't start.

caused by hadoop-env.sh in dos format and not being executable:

: command not found line 2:
: command not found line 7:
: command not found line 10:
: command not found line 13:
: command not found line 16:
: command not found line 20:
: command not found line 23:
: command not found line 26:
: command not found line 29:
: command not found line 32:

Dennis

-----Original Message-----
From: Doug Cutting [mailto:cutting@apache.org] 
Sent: Thursday, March 16, 2006 6:50 PM
To: nutch-user@lucene.apache.org
Subject: Re: Help Setting Up Nutch 0.8 Distributed

Dennis Kubes wrote:
> : command not foundlaves.sh: line 29:
> : command not foundlaves.sh: line 32:
> localhost: ssh: \015: Name or service not known
> devcluster02: ssh: \015: Name or service not known
> 
> And still getting this error:
> 
> 060316 175355 parsing file:/nutch/search/conf/hadoop-site.xml
> Exception in thread "main" java.io.IOException: Cannot create file /tmp/hadoop/mapred/system/submit_mmuodk/job.jar on client DFSClient_-913777457
>         at org.apache.hadoop.ipc.Client.call(Client.java:301)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:141)
>         at org.apache.hadoop.dfs.$Proxy0.create(Unknown Source)
>         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:587)
>         at org
> 
> My ssh version is:
> 
> openssh-clients-3.6.1p2-33.30.3
> openssh-server-3.6.1p2-33.30.3
> openssh-askpass-gnome-3.6.1p2-33.30.3
> openssh-3.6.1p2-33.30.3
> openssh-askpass-3.6.1p2-33.30.3
> 
> Is it something to do with my slaves file?

The \015 looks like a file has a CR where perhaps an LF is expected? 
What does 'od -c conf/slaves' print?  What happens when you try 
something like 'bin/slaves uptime'?

Doug


Re: Help Setting Up Nutch 0.8 Distributed

Posted by Doug Cutting <cu...@apache.org>.
Dennis Kubes wrote:
> : command not foundlaves.sh: line 29:
> : command not foundlaves.sh: line 32:
> localhost: ssh: \015: Name or service not known
> devcluster02: ssh: \015: Name or service not known
> 
> And still getting this error:
> 
> 060316 175355 parsing file:/nutch/search/conf/hadoop-site.xml
> Exception in thread "main" java.io.IOException: Cannot create file /tmp/hadoop/mapred/system/submit_mmuodk/job.jar on client DFSClient_-913777457
>         at org.apache.hadoop.ipc.Client.call(Client.java:301)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:141)
>         at org.apache.hadoop.dfs.$Proxy0.create(Unknown Source)
>         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:587)
>         at org
> 
> My ssh version is:
> 
> openssh-clients-3.6.1p2-33.30.3
> openssh-server-3.6.1p2-33.30.3
> openssh-askpass-gnome-3.6.1p2-33.30.3
> openssh-3.6.1p2-33.30.3
> openssh-askpass-3.6.1p2-33.30.3
> 
> Is it something to do with my slaves file?

The \015 looks like a file has a CR where perhaps an LF is expected? 
What does 'od -c conf/slaves' print?  What happens when you try 
something like 'bin/slaves uptime'?
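
For illustration only (not Dennis's actual file), a slaves file saved with
DOS line endings shows the stray \r bytes explicitly in od -c output:

  $ od -c conf/slaves
  0000000   l   o   c   a   l   h   o   s   t  \r  \n   d   e   v   c   l
  0000020   u   s   t   e   r   0   2  \r  \n
  0000031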

Doug

RE: Help Setting Up Nutch 0.8 Distributed

Posted by Dennis Kubes <nu...@dragonflymc.com>.
Now I am getting this:

...
: command not foundlaves.sh: line 29:
: command not foundlaves.sh: line 32:
localhost: ssh: \015: Name or service not known
devcluster02: ssh: \015: Name or service not known

And still getting this error:

060316 175355 parsing file:/nutch/search/conf/hadoop-site.xml
Exception in thread "main" java.io.IOException: Cannot create file /tmp/hadoop/mapred/system/submit_mmuodk/job.jar on client DFSClient_-913777457
        at org.apache.hadoop.ipc.Client.call(Client.java:301)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:141)
        at org.apache.hadoop.dfs.$Proxy0.create(Unknown Source)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:587)
        at org

My ssh version is:

openssh-clients-3.6.1p2-33.30.3
openssh-server-3.6.1p2-33.30.3
openssh-askpass-gnome-3.6.1p2-33.30.3
openssh-3.6.1p2-33.30.3
openssh-askpass-3.6.1p2-33.30.3

Is it something to do with my slaves file?

-----Original Message-----
From: Doug Cutting [mailto:cutting@apache.org] 
Sent: Thursday, March 16, 2006 5:46 PM
To: nutch-user@lucene.apache.org
Subject: Re: Help Setting Up Nutch 0.8 Distributed

Dennis Kubes wrote:
> localhost:9000: command-line: line 0: Bad configuration option: ConnectTimeout
> devcluster02:9000: command-line: line 0: Bad configuration option: ConnectTimeout
[ ... ]
> localhost:9000: command-line: line 0: Bad configuration option: ConnectTimeout
> devcluster02:9000: command-line: line 0: Bad configuration option: ConnectTimeout

The launch of the datanodes and tasktrackers failed, since your version 
of ssh does not support the ConnectTimeout option.  Edit 
conf/nutch-env.sh and add an 'export HADOOP_SSH_OPTS=' line to remove 
this option.

Doug


Re: Help Setting Up Nutch 0.8 Distributed

Posted by Doug Cutting <cu...@apache.org>.
Dennis Kubes wrote:
> localhost:9000: command-line: line 0: Bad configuration option: ConnectTimeout
> devcluster02:9000: command-line: line 0: Bad configuration option: ConnectTimeout
[ ... ]
> localhost:9000: command-line: line 0: Bad configuration option: ConnectTimeout
> devcluster02:9000: command-line: line 0: Bad configuration option: ConnectTimeout

The launch of the datanodes and tasktrackers failed, since your version 
of ssh does not support the ConnectTimeout option.  Edit 
conf/nutch-env.sh and add an 'export HADOOP_SSH_OPTS=' line to remove 
this option.
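
A sketch of that edit, assuming a stock conf/nutch-env.sh; the empty value
simply clears the ssh options that the launch scripts would otherwise pass:

  # conf/nutch-env.sh
  export HADOOP_SSH_OPTS=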

Doug