Posted to user@nutch.apache.org by toabhishek16 <to...@gmail.com> on 2008/09/22 10:13:39 UTC
Error in Hadoop crawling
Hi all,
I am trying to crawl using Hadoop on a single machine. Other commands, like
hadoop namenode -format, and the examples shipped with Hadoop work fine.
But when I try to crawl using Hadoop, it fails with the error pasted
below:
[root@localhost nutch-0.9]# ./bin/nutch crawl urls/ -dir aaa
Exception in thread "main" java.net.SocketTimeoutException: timed out
waiting for rpc response
at org.apache.hadoop.ipc.Client.call(Client.java:473)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:163)
at org.apache.hadoop.dfs.$Proxy0.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:247)
at org.apache.hadoop.dfs.DFSClient.<init>(DFSClient.java:105)
at org.apache.hadoop.dfs.DistributedFileSystem$RawDistributedFileSystem.initialize(DistributedFileSystem.java:67)
at org.apache.hadoop.fs.FilterFileSystem.initialize(FilterFileSystem.java:57)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:160)
at org.apache.hadoop.fs.FileSystem.getNamed(FileSystem.java:119)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:91)
at org.apache.nutch.crawl.Crawl.main(Crawl.java:83)
Please help me solve this problem.
Thanks in advance.
--
View this message in context: http://www.nabble.com/Error-in-hadoop-crawling-tp19603532p19603532.html
Sent from the Nutch - User mailing list archive at Nabble.com.
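The SocketTimeoutException in the trace above typically means the DFS client could not get an RPC response from the namenode, which suggests the namenode process is not running or is listening on a different address than the client expects. A quick way to check reachability is a plain TCP probe; this is a minimal sketch assuming bash, with "localhost" and port 9000 as assumed defaults for a single-node setup (they are not values taken from the original post):

```shell
#!/usr/bin/env bash
# Probe a TCP host:port using bash's /dev/tcp redirection.
# Prints "open" if something is listening, "closed" otherwise.
# localhost:9000 below is an assumed default namenode address;
# adjust it to match your fs.default.name setting.
check_port() {
  local host="$1" port="$2"
  if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
    echo "open"
  else
    echo "closed"
  fi
}

check_port localhost 9000
```

If this prints "closed", the crawl cannot work regardless of its other parameters, because the DFS client has nothing to talk to.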
Re: Error in Hadoop crawling
Posted by Alexander Dick <al...@dick.at>.
Hi,
I don't know if this is the solution to your problem, but you should at
least add the depth parameter:
# ./bin/nutch crawl urls/ -dir aaa -depth 10 -threads 20
Cheers
Alex
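Independent of the depth parameter, the timeout waiting for an RPC response usually points at the DFS client being unable to reach the namenode at the address configured in conf/hadoop-site.xml. The fragment below is a minimal sketch of the relevant properties for a single-node Nutch 0.9 / Hadoop setup; the hostname and ports are assumed conventional defaults, not values from the original post:

```xml
<!-- conf/hadoop-site.xml: minimal single-node sketch.
     localhost and ports 9000/9001 are assumed defaults,
     not values taken from the original post. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>localhost:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
```

After editing the config, the daemons must actually be running (bin/start-all.sh) before bin/nutch crawl can talk to HDFS.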