Posted to user@nutch.apache.org by Dima Mazmanov <nu...@proservice.ge> on 2006/03/01 14:58:31 UTC

Problems with crawling

Hi all!

Oh!
./hadoop namenode
./hadoop datanode
./hadoop jobtracker
./hadoop tasktracker
./hadoop dfs -put urls urls
all executed normally :)
But
./nutch crawl urls -dir tmpdir -depth 3
doesn't create the tmpdir directory.
And, IMHO, it got caught in an endless loop.
This is the output:

060301 142821 Server connection on port 9001 from 127.0.0.1: starting
060301 142821 Server connection on port 9000 from 127.0.0.1: starting
060301 142823 574 Received block blk_2591040264765485729 from /212.58.116.70
060301 142823 575 Received block blk_5499388239431783540 from /212.58.116.70
060301 142823 576 Received block blk_-7203392459151183228 from /212.58.116.70
060301 142823 577 Received block blk_-6402877820976754119 from /212.58.116.70
060301 142824 task_m_5j3zwv done; removing files.
060301 142824 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060301 142824 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060301 142824 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060301 142824 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060301 142824 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060301 142824 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060301 142824 578 Served block blk_-7203392459151183228 to /212.58.116.70
060301 142824 579 Served block blk_-6402877820976754119 to /212.58.116.70
060301 142824 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060301 142824 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060301 142824 parsing /usr/home/duche/nutch-nightly/hadoop/mapred/local/jobTracker/job_frrugw.xml
060301 142824 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060301 142825 580 Served block blk_2591040264765485729 to /212.58.116.70
060301 142825 581 Served block blk_5499388239431783540 to /212.58.116.70
060301 142827 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060301 142827 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060301 142827 parsing /usr/home/duche/nutch-nightly/hadoop/mapred/local/jobTracker/job_frrugw.xml
060301 142827 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060301 142827 Adding task 'task_m_1s82kb' to tip tip_pw0d4r, for tracker 'tracker_93814'
060301 142827 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060301 142827 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060301 142827 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060301 142828 582 Served block blk_-7203392459151183228 to /212.58.116.70
060301 142828 583 Served block blk_-6402877820976754119 to /212.58.116.70
060301 142828 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060301 142828 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060301 142828 parsing /usr/home/duche/nutch-nightly/hadoop/mapred/local/taskTracker/task_m_1s82kb/job.xml
060301 142828 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060301 142828 584 Served block blk_2591040264765485729 to /212.58.116.70
060301 142829 585 Served block blk_5499388239431783540 to /212.58.116.70
060301 142829 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060301 142829 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060301 142829 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060301 142829 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060301 142829 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060301 142829 parsing /usr/home/duche/nutch-nightly/hadoop/mapred/local/taskTracker/task_m_1s82kb/job.xml
060301 142829 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060301 142829 task_m_5otcyk done; removing files.
060301 142830 task_m_1s82kb  Child starting
060301 142830 Adding task 'task_r_b29o76' to tip tip_tw1d4r, for tracker 'tracker_93814'
060301 142830 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060301 142830 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060301 142830 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060301 142831 586 Served block blk_-7203392459151183228 to /212.58.116.70
060301 142831 587 Served block blk_-6402877820976754119 to /212.58.116.70
060301 142831 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060301 142831 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060301 142831 parsing /usr/home/duche/nutch-nightly/hadoop/mapred/local/taskTracker/task_r_b29o76/job.xml
060301 142831 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060301 142831 task_m_1s82kb  parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060301 142831 task_m_1s82kb  parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060301 142831 Server connection on port 50050 from 212.58.116.70: starting
060301 142831 task_m_1s82kb  Client connection to 0.0.0.0:50050: starting
060301 142831 task_m_1s82kb  parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060301 142831 task_m_1s82kb  parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060301 142831 task_m_1s82kb  parsing /usr/home/duche/nutch-nightly/hadoop/mapred/local/taskTracker/task_m_1s82kb/job.xml
060301 142831 task_m_1s82kb  parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060301 142831 task_m_1s82kb  parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060301 142831 task_m_1s82kb  parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060301 142831 task_m_1s82kb  parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060301 142831 Server connection on port 9000 from 127.0.0.1: starting
060301 142831 task_m_1s82kb  Client connection to 127.0.0.1:9000: starting
060301 142831 588 Served block blk_2591040264765485729 to /212.58.116.70
060301 142832 589 Served block blk_5452063422538451352 to /212.58.116.70
060301 142832 590 Served block blk_6698132808529653958 to /212.58.116.70
060301 142832 Server connection on port 50050 from 212.58.116.70: starting
060301 142832 task_m_1s82kb  Client connection to 0.0.0.0:50050: starting
060301 142832 task_m_1s82kb 1.0% /user/root/dedup-hash-1013323129/part-00000:0+126
060301 142832 Task task_m_1s82kb is done.
060301 142832 Server connection on port 9000 from 127.0.0.1: exiting
060301 142832 Server connection on port 50050 from 212.58.116.70: exiting
060301 142832 Server connection on port 50050 from 212.58.116.70: exiting
060301 142832 591 Served block blk_5499388239431783540 to /212.58.116.70
060301 142832 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060301 142832 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060301 142832 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060301 142832 task_r_b29o76 0.0% reduce > copy >
060301 142832 task_m_66f1ln done; removing files.
060301 142833 Taskid 'task_m_1s82kb' has finished successfully.
060301 142833 Task 'task_m_1s82kb' has completed.
060301 142833 task_r_b29o76 0.0% reduce > copy >
060301 142833 task_m_6vdctb done; removing files.
060301 142833 task_r_b29o76 Got 1 map output locations.
060301 142834 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060301 142834 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060301 142834 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060301 142834 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060301 142834 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060301 142834 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060301 142834 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060301 142834 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060301 142834 parsing /usr/home/duche/nutch-nightly/hadoop/mapred/local/taskTracker/task_r_b29o76/job.xml
060301 142834 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060301 142835 task_r_b29o76  Child starting
060301 142835 task_r_b29o76  parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060301 142835 task_r_b29o76  parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060301 142835 Server connection on port 50050 from 212.58.116.70: starting
060301 142835 task_r_b29o76  Client connection to 0.0.0.0:50050: starting
060301 142836 task_r_b29o76  parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060301 142836 task_r_b29o76  parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060301 142836 task_r_b29o76  parsing /usr/home/duche/nutch-nightly/hadoop/mapred/local/taskTracker/task_r_b29o76/job.xml
060301 142836 task_r_b29o76  parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060301 142836 Server connection on port 9000 from 127.0.0.1: starting
060301 142836 task_r_b29o76  Client connection to 127.0.0.1:9000: starting
060301 142836 task_r_b29o76  parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060301 142836 task_r_b29o76  parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060301 142836 task_r_b29o76  parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060301 142836 task_r_b29o76  Client connection to 0.0.0.0:50050: starting
060301 142836 Server connection on port 50050 from 212.58.116.70: starting
060301 142836 task_r_b29o76 1.0% reduce > reduce
060301 142836 Task task_r_b29o76 is done.
060301 142837 Server connection on port 9000 from 127.0.0.1: exiting
060301 142837 Server connection on port 50050 from 212.58.116.70: exiting
060301 142837 Server connection on port 50050 from 212.58.116.70: exiting
060301 142839 Taskid 'task_r_b29o76' has finished successfully.
060301 142839 Task 'task_r_b29o76' has completed.
060301 142839 task_m_1s82kb done; removing files.
060301 142840 592 Served block blk_3446777712324074116 to /212.58.116.70
060301 142840 593 Served block blk_8083213610111745558 to /212.58.116.70
060301 142841 594 Received block blk_5137022603699873805 from /212.58.116.70
060301 142841 595 Received block blk_3898271747821768899 from /212.58.116.70
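
One way to check the endless-loop suspicion is to list the tasks that the log reports as completed. This is only a sketch: it assumes the output above was saved to a file, and both the file name crawl.log and the function name completed_tasks are made up for illustration.

```shell
# List the ids of tasks that the log reports as completed, one per line.
# Usage: completed_tasks crawl.log   (crawl.log is a hypothetical file name)
completed_tasks() {
  # Lines look like: 060301 142833 Task 'task_m_1s82kb' has completed.
  # The task id is the text between the single quotes.
  grep "has completed" "$1" | cut -d"'" -f2 | sort -u
}
```

For the excerpt above this would list task_m_1s82kb and task_r_b29o76, which suggests that at least these jobs are finishing rather than looping.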

But no database directory was created... what's going on?!
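
Since the urls directory was uploaded with ./hadoop dfs -put, it may be that the crawl output was also written inside DFS rather than to the local disk. The log line mentioning /user/root/dedup-hash-1013323129 hints at that. A minimal sketch for checking, assuming ./hadoop dfs -ls is available in this build and lists the DFS home directory (the helper name dir_in_dfs and the HADOOP variable are hypothetical):

```shell
# Path to the hadoop launcher script; ./hadoop matches the commands above.
HADOOP="${HADOOP:-./hadoop}"

# Succeeds if the DFS listing mentions a path ending in the given name.
# The exact listing format is an assumption, so this only greps for the name.
# Usage: dir_in_dfs tmpdir && echo "tmpdir exists in DFS"
dir_in_dfs() {
  "$HADOOP" dfs -ls | grep -Eq "/$1([[:space:]]|\$)"
}
```

If the directory shows up there, it was never expected to appear in the local working directory.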

You wrote on 1 March 2006, 16:36:13:

> Hi,

> In that case, I don't know what's going on, sorry.  As far as I can tell,
> start-all.sh just starts those services itself.  Try starting the services
> individually using the commands in start-all.sh, e.g. "hadoop-daemons.sh
> start datanode".  See which script asks for the password.

> Jon

>> -----Original Message-----
>> From: Dima Mazmanov [mailto:nuther@proservice.ge] 
>> Sent: 01 March 2006 13:39
>> To: Jon Blower
>> Subject: Re[8]: Problems with hadoop
>> 
>> Hi,Jon.
>> 
>> But I have no problems with the hadoop script.
>> If I execute
>> ./hadoop namenode
>> ./hadoop datanode
>> ./hadoop jobtracker
>> ./hadoop tasktracker
>> manually, then everything goes fine!
>> But if I execute start-all.sh,
>> then the script asks for a password...
>> 
>> You wrote on 1 March 2006, 16:25:40:
>> 
>> > No, it's a program whose name is "source".  It is called from line 53 of
>> > bin/hadoop.  It is not a shell program.  It is normally used to run a
>> > script to set a number of environment variables, which are then
>> > available to the rest of the parent script.
>> 
>> > It is used in bin/hadoop to run the conf/hadoop-env.sh script.  If (as in
>> > my installation) all the lines in hadoop-env.sh are commented out, you
>> > can delete this line in bin/hadoop since the hadoop-env script does nothing.
>> 
>> > Jon
>> 
>> >> -----Original Message-----
>> >> From: Dima Mazmanov [mailto:nuther@proservice.ge]
>> >> Sent: 01 March 2006 13:27
>> >> To: Jon Blower
>> >> Subject: Re[6]: Problems with hadoop
>> >> 
>> >> Hi,Jon.
>> >> 
>> >> Are you talking about bash?
>> >> If so, then FreeBSD has it.
>> >> It is called sh.
>> >> You wrote on 1 March 2006, 16:15:03:
>> >> 
>> >> > It's a program that is called "source".  It is used in many systems
>> >> > (Linux, Solaris) to run a shell script from within another shell script.
>> >> 
>> >> > I'm surprised that FreeBSD doesn't have this program.  Perhaps you can
>> >> > download it.
>> >> 
>> >> > Cheers, Jon
>> >> 
>> >> >> -----Original Message-----
>> >> >> From: Dima Mazmanov [mailto:nuther@proservice.ge]
>> >> >> Sent: 01 March 2006 12:00
>> >> >> To: Jon Blower
>> >> >> Subject: Re[4]: Problems with hadoop
>> >> >> 
>> >> >> Hi,Jon.
>> >> >> 
>> >> >> What kind of source program?
>> >> >> You wrote on 1 March 2006, 14:25:29:
>> >> >> 
>> >> >> > My guess is that the "source" program is not available on your version
>> >> >> > of FreeBSD.  Try running the "source" program (with no arguments) from
>> >> >> > the command line or type "man source".  Do you see anything?  If not,
>> >> >> > you probably don't have the "source" program, which is called by the
>> >> >> > hadoop script.
>> >> >> 
>> >> >> > Jon
>> >> >> 
>> >> >> >> -----Original Message-----
>> >> >> >> From: Dima Mazmanov [mailto:nuther@proservice.ge]
>> >> >> >> Sent: 01 March 2006 11:16
>> >> >> >> To: Jon Blower
>> >> >> >> Subject: Re[2]: Problems with hadoop
>> >> >> >> 
>> >> >> >> Hi,Jon.
>> >> >> >> 
>> >> >> >> I'm running FreeBSD 6.0,
>> >> >> >> and my version of nutch is from 26 February.
>> >> >> >> You wrote on 28 February 2006, 21:22:05:
>> >> >> >> 
>> >> >> >> > Hi Dima,
>> >> >> >> 
>> >> >> >> > We probably need more information here.  What kind of system are you
>> >> >> >> > running on and which version of the software did you download?
>> >> >> >> 
>> >> >> >> > Regards, Jon
>> >> >> >> 
>> >> >> >> >> -----Original Message-----
>> >> >> >> >> From: Dima Mazmanov [mailto:dima@proservice.ge]
>> >> >> >> >> Sent: 28 February 2006 07:16
>> >> >> >> >> To: nutch-user@lucene.apache.org
>> >> >> >> >> Subject: Problems with hadoop
>> >> >> >> >> 
>> >> >> >> >> I have a problem executing the hadoop scripts. ./start-all.sh
>> >> >> >> >> gives me the following:
>> >> >> >> >> 
>> >> >> >> >> source: not found
>> >> >> >> >> Password:
>> >> >> >> >> 
>> >> >> >> >> What does it mean? What kind of source wasn't found, and what
>> >> >> >> >> password must I type?
>> >> >> >> >> I configured ssh as described in the tutorial, but with no result.
>> >> >> >> >> Could you tell me what I am doing wrong?
>> >> >> >> >> Thanks.
>> >> >> >> >> Regards, Dima
>> >> >> >> >> 
>> >> >> >> 
>> >> >> >> 
>> >> >> >> 
>> >> >> >> > __________ NOD32 1.1420 (20060227) Information __________
>> >> >> >> 
>> >> >> >> > This message was checked by NOD32 antivirus system.
>> >> >> >> > http://www.eset.com
>> >> >> >> 
>> >> >> >> 
>> >> >> >> 
>> >> >> >> 
>> >> >> >> --
>> >> >> >> Regards,
>> >> >> >>  Dima                          mailto:nuther@proservice.ge
>> >> >> >> 
>> >> >> >> 
>> >> >> >> 
>> >> >> 
>> >> >> 
>> >> >> 
>> >> >> --
>> >> >> Regards,
>> >> >>  Dima                          mailto:nuther@proservice.ge
>> >> >> 
>> >> >> 
>> >> >> 
>> >> 
>> >> 
>> >> 
>> >> 
>> >> --
>> >> Regards,
>> >>  Dima                          mailto:nuther@proservice.ge
>> >> 
>> >> 
>> >> 
>> 
>> 
>> 
>> --
>> Regards,
>>  Dima                          mailto:nuther@proservice.ge
>> 
>> 
>> 







-- 
Regards,
 Dima                          mailto:nuther@proservice.ge


---------- End of forwarded message ----------
-- 
Regards,
 Dima                          mailto:nuther@proservice.ge