Posted to user@nutch.apache.org by Dima Mazmanov <nu...@proservice.ge> on 2006/03/01 10:18:28 UTC

Crawl Exception(NullPointerException)

This is what I am doing:

./hadoop namenode &
./hadoop datanode &
./hadoop jobtracker &
./hadoop tasktracker &

(I placed url.txt in the seeds directory)
./hadoop dfs -put seeds seeds
./nutch crawl seeds -dir tmpdir -depth 3
After all of this, I get the following:

060301 140839 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060301 140839 parsing file:/usr/home/duche/nutch-nightly/conf/nutch-default.xml
060301 140839 parsing file:/usr/home/duche/nutch-nightly/conf/crawl-tool.xml
060301 140839 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060301 140839 parsing file:/usr/home/duche/nutch-nightly/conf/nutch-site.xml
060301 140839 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060301 140839 Client connection to 127.0.0.1:9000: starting
060301 140839 crawl started in: crawled
060301 140839 rootUrlDir = seeds
060301 140839 threads = 10
060301 140839 depth = 3
060301 140839 Injector: starting
060301 140839 Injector: crawlDb: crawled/crawldb
060301 140839 Injector: urlDir: seeds
060301 140839 Injector: Converting injected urls to crawl db entries.
060301 140839 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060301 140839 parsing file:/usr/home/duche/nutch-nightly/conf/nutch-default.xml
060301 140839 parsing file:/usr/home/duche/nutch-nightly/conf/crawl-tool.xml
060301 140839 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060301 140839 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060301 140839 parsing file:/usr/home/duche/nutch-nightly/conf/nutch-site.xml
060301 140839 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060301 140839 Client connection to 127.0.0.1:9001: starting
060301 140839 Client connection to 127.0.0.1:9000: starting
060301 140839 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060301 140839 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060301 140844 Running job: job_oewx5k
060301 140845  map 0%  reduce 0%
java.io.IOException: java.lang.NullPointerException
        at org.apache.hadoop.mapred.JobInProgress.obtainNewMapTask(JobInProgress.java:274)
        at org.apache.hadoop.mapred.JobTracker.pollForNewTask(JobTracker.java:534)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:324)
        at org.apache.hadoop.ipc.RPC$1.call(RPC.java:208)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:200)
java.lang.reflect.UndeclaredThrowableException
        at org.apache.hadoop.mapred.$Proxy0.pollForNewTask(Unknown Source)
        at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:250)
        at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:313)
        at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:684)
Caused by: java.io.IOException: java.lang.NullPointerException
        at org.apache.hadoop.ipc.Client.call(Client.java:301)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:141)
        ... 4 more
060301 140844 Lost connection to JobTracker [localhost/127.0.0.1:9001]. Retrying...
060301 140849 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060301 140849 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060301 140849 parsing /usr/home/duche/nutch-nightly/hadoop/mapred/local/jobTracker/job_oewx5k.xml
060301 140849 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
java.io.IOException: Not a file: /user/root/seeds/seeds
        at org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:99)
        at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:125)
        at org.apache.hadoop.mapred.JobInProgress.obtainNewMapTask(JobInProgress.java:256)
        at org.apache.hadoop.mapred.JobTracker.pollForNewTask(JobTracker.java:534)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:324)
        at org.apache.hadoop.ipc.RPC$1.call(RPC.java:208)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:200)
060301 140849 Cannot create task split for job_oewx5k
060301 140849 Server handler 4 on 9001 call error: java.io.IOException: java.lang.NullPointerException
java.io.IOException: java.lang.NullPointerException
        at org.apache.hadoop.mapred.JobInProgress.obtainNewMapTask(JobInProgress.java:274)
        at org.apache.hadoop.mapred.JobTracker.pollForNewTask(JobTracker.java:534)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:324)
        at org.apache.hadoop.ipc.RPC$1.call(RPC.java:208)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:200)
java.lang.reflect.UndeclaredThrowableException
        at org.apache.hadoop.mapred.$Proxy0.pollForNewTask(Unknown Source)
        at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:250)
        at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:313)
        at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:684)
Caused by: java.io.IOException: java.lang.NullPointerException
        at org.apache.hadoop.ipc.Client.call(Client.java:301)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:141)
        ... 4 more
060301 140849 Lost connection to JobTracker [localhost/127.0.0.1:9001]. Retrying...

Any clues? Please help me.
I think I'm doing everything right.
If not, please explain what I'm doing wrong.


----- Original Message ----- 
From: "Dima Mazmanov" <di...@proservice.ge>
To: <nu...@lucene.apache.org>
Sent: Tuesday, February 28, 2006 10:15 AM
Subject: Problems with hadoop


I have a problem when executing the hadoop scripts.
./start-all.sh
gives me the following:

source: not found
Password:

What does it mean? What kind of source wasn't found, and what password must I type?
I configured ssh as described in the tutorial, but with no result.
Could you tell me what I am doing wrong?
Thanks.
Regards, Dima

