Posted to user@nutch.apache.org by toabhishek16 <to...@gmail.com> on 2008/09/22 10:13:39 UTC

Error in hadoop crawling

Hi all,
I am trying to crawl using Hadoop on a single machine. Other commands, such as
hadoop namenode -format and the examples provided with Hadoop, work fine.
But when I try to crawl using Hadoop, it gives the error pasted below:

[root@localhost nutch-0.9]# ./bin/nutch crawl urls/ -dir aaa
Exception in thread "main" java.net.SocketTimeoutException: timed out waiting for rpc response
        at org.apache.hadoop.ipc.Client.call(Client.java:473)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:163)
        at org.apache.hadoop.dfs.$Proxy0.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:247)
        at org.apache.hadoop.dfs.DFSClient.<init>(DFSClient.java:105)
        at org.apache.hadoop.dfs.DistributedFileSystem$RawDistributedFileSystem.initialize(DistributedFileSystem.java:67)
        at org.apache.hadoop.fs.FilterFileSystem.initialize(FilterFileSystem.java:57)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:160)
        at org.apache.hadoop.fs.FileSystem.getNamed(FileSystem.java:119)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:91)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:83)

Please help me to solve this problem....

Thanks in advance.
-- 
View this message in context: http://www.nabble.com/Error-in-hadoop-crawling-tp19603532p19603532.html
Sent from the Nutch - User mailing list archive at Nabble.com.

Re: Error in hadoop crawling

Posted by Alexander Dick <al...@dick.at>.

Hi, 

I don't know if this is the solution for your problem, but you should add at
least the depth parameter:
# ./bin/nutch crawl urls/ -dir aaa -depth 10 -threads 20

Cheers
Alex
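
A note on the trace itself: the timeout is raised while DFSClient is being
constructed and calls getProtocolVersion, that is, while the client makes its
first RPC call to the namenode. One quick check is whether anything accepts
TCP connections at the namenode address at all. Below is a minimal Python
sketch of such a check. It assumes the address is the one configured as
fs.default.name (typically in conf/hadoop-site.xml); localhost:9000 is only a
placeholder and is not taken from the original post, and can_connect is a
hypothetical helper name.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host and applies the timeout to connect().
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, DNS failure and timeouts.
        return False


if __name__ == "__main__":
    # Placeholder address (assumption): use the host and port from fs.default.name.
    host, port = "localhost", 9000
    state = "reachable" if can_connect(host, port) else "not reachable"
    print(f"{host}:{port} is {state}")
```

A successful connection only shows that some process is listening on that
port; it does not prove the namenode is answering RPC calls. A failed
connection, however, suggests the namenode is not running or the configured
address is wrong.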

> -----Original Message-----
> From: toabhishek16 [mailto:toabhishek16@gmail.com]
> Sent: Monday, 22 September 2008 10:14
> To: nutch-user@lucene.apache.org
> Subject: Error in hadoop crawling
> 
> 
> Hi all,
> I am trying to crawl using Hadoop on a single machine. Other commands, such as
> hadoop namenode -format and the examples provided with Hadoop, work fine.
> But when I try to crawl using Hadoop, it gives the error pasted below:
> 
> [root@localhost nutch-0.9]# ./bin/nutch crawl urls/ -dir aaa
> Exception in thread "main" java.net.SocketTimeoutException: timed out waiting for rpc response
>         at org.apache.hadoop.ipc.Client.call(Client.java:473)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:163)
>         at org.apache.hadoop.dfs.$Proxy0.getProtocolVersion(Unknown Source)
>         at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:247)
>         at org.apache.hadoop.dfs.DFSClient.<init>(DFSClient.java:105)
>         at org.apache.hadoop.dfs.DistributedFileSystem$RawDistributedFileSystem.initialize(DistributedFileSystem.java:67)
>         at org.apache.hadoop.fs.FilterFileSystem.initialize(FilterFileSystem.java:57)
>         at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:160)
>         at org.apache.hadoop.fs.FileSystem.getNamed(FileSystem.java:119)
>         at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:91)
>         at org.apache.nutch.crawl.Crawl.main(Crawl.java:83)
> 
> Please help me to solve this problem....
> 
> Thanks in advance.