Posted to user@nutch.apache.org by Dima Mazmanov <di...@proservice.ge> on 2006/03/06 09:04:19 UTC

Where is the database?!

Hi!
Here are the steps I took to crawl:
I started all the Hadoop daemons,
inserted the URL file into DFS,
then started the crawl.
Here is part of the crawl log.

060306 124851 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 124851 parsing /usr/home/duche/nutch-nightly/hadoop/mapred/local/taskTracker/task_m_568oxw/job.xml
060306 124851 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 124852 task_r_281ien 0.16666667% reduce > copy >
060306 124852  map 67%  reduce 17%
060306 124853 task_r_281ien 0.16666667% reduce > copy >
060306 124853 task_m_568oxw  Child starting
060306 124853 task_m_568oxw  parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 124853 task_m_568oxw  parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 124853 Server connection on port 50050 from 212.58.116.70: starting
060306 124853 task_m_568oxw  Client connection to 0.0.0.0:50050: starting
060306 124853 task_m_568oxw  parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 124854 task_m_568oxw  parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 124854 task_m_568oxw  parsing /usr/home/duche/nutch-nightly/hadoop/mapred/local/taskTracker/task_m_568oxw/job.xml
060306 124854 task_m_568oxw  parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 124854 task_m_568oxw  parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 124854 task_m_568oxw  parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 124854 task_m_568oxw  parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 124854 Server connection on port 9000 from 127.0.0.1: starting
060306 124854 task_m_568oxw  Client connection to 127.0.0.1:9000: starting
060306 124854 task_r_281ien 0.16666667% reduce > copy >
060306 124854 459 Served block blk_-3727406626879829125 to /212.58.116.70
060306 124854 460 Served block blk_-5496623489076405734 to /212.58.116.70
060306 124854 Server connection on port 50050 from 212.58.116.70: starting
060306 124854 task_m_568oxw  Client connection to 0.0.0.0:50050: starting
060306 124854 task_m_568oxw 0.99999994% /user/root/tmpdb/segments/20060306124638/parse_data/part-00000/data:0+61
060306 124854 Task task_m_568oxw is done.
060306 124854 Server connection on port 9000 from 127.0.0.1: exiting
060306 124854 Server connection on port 50050 from 212.58.116.70: exiting
060306 124854 Server connection on port 50050 from 212.58.116.70: exiting
060306 124855 Taskid 'task_m_568oxw' has finished successfully.
060306 124855 Task 'task_m_568oxw' has completed.
060306 124855 task_r_281ien 0.16666667% reduce > copy >
060306 124855 task_r_281ien Got 1 map output locations.
060306 124855 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 124855 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 124855 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 124855 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 124855 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 124855 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 124855 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 124855 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 124855 parsing /usr/home/duche/nutch-nightly/hadoop/mapred/local/taskTracker/task_r_281ien/job.xml
060306 124855 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 124856  map 100%  reduce 17%
060306 124857 task_r_281ien  Child starting
060306 124857 task_r_281ien  parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 124858 task_r_281ien  parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 124858 Server connection on port 50050 from 212.58.116.70: starting



060306 125101 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125101 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 125101 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125101 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125101 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 125101 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125101 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125101 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 125101 parsing /usr/home/duche/nutch-nightly/hadoop/mapred/local/taskTracker/task_r_1iryja/job.xml
060306 125101 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125102 task_m_4uq2k2 done; removing files.
060306 125102 task_r_1iryja  Child starting
060306 125103 task_r_1iryja  parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125103 task_r_1iryja  parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125103 Server connection on port 50050 from 212.58.116.70: starting
060306 125103 task_r_1iryja  Client connection to 0.0.0.0:50050: starting
060306 125106 task_r_1iryja  parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125106 task_r_1iryja  parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 125106 task_r_1iryja  parsing /usr/home/duche/nutch-nightly/hadoop/mapred/local/taskTracker/task_r_1iryja/job.xml
060306 125106 task_r_1iryja  parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125106 task_r_1iryja  parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125106 task_r_1iryja  parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 125106 task_r_1iryja  parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125106 task_r_1iryja  Client connection to 127.0.0.1:9000: starting
060306 125106 Server connection on port 9000 from 127.0.0.1: starting
060306 125106 task_r_1iryja  parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125106 task_m_60jy1g done; removing files.
060306 125106 task_r_1iryja  parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125106 568 Received block blk_937594491748799698 from /212.58.116.70
060306 125107 569 Received block blk_-292066997504083183 from /212.58.116.70
060306 125107 Server connection on port 50050 from 212.58.116.70: starting
060306 125107 task_r_1iryja  Client connection to 0.0.0.0:50050: starting
060306 125107 task_r_1iryja 1.0% reduce > reduce
060306 125108 Task task_r_1iryja is done.
060306 125108 Server connection on port 9000 from 127.0.0.1: exiting
060306 125108 Server connection on port 50050 from 212.58.116.70: exiting
060306 125108 Server connection on port 50050 from 212.58.116.70: exiting
060306 125109 Taskid 'task_r_1iryja' has finished successfully.
060306 125109 Task 'task_r_1iryja' has completed.
060306 125109 task_m_1vj7kz done; removing files.
060306 125109  map 100%  reduce 100%
060306 125109 Job complete: job_hldwxh
060306 125109 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125109 parsing file:/usr/home/duche/nutch-nightly/conf/nutch-default.xml
060306 125109 parsing file:/usr/home/duche/nutch-nightly/conf/crawl-tool.xml
060306 125109 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 125109 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 125109 parsing file:/usr/home/duche/nutch-nightly/conf/nutch-site.xml
060306 125109 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125109 Client connection to 127.0.0.1:9001: starting
060306 125109 Server connection on port 9001 from 127.0.0.1: starting
060306 125109 Client connection to 127.0.0.1:9000: starting
060306 125109 Server connection on port 9000 from 127.0.0.1: starting
060306 125112 570 Received block blk_-6705899863806264848 from /212.58.116.70
060306 125112 task_m_6cj8z8 done; removing files.
060306 125112 571 Received block blk_-5306879891270215512 from /212.58.116.70
060306 125113 572 Received block blk_-3324229811791817900 from /212.58.116.70
060306 125113 573 Received block blk_7536925015414314323 from /212.58.116.70
060306 125113 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125113 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 125113 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125113 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125113 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 125113 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125113 574 Served block blk_-3324229811791817900 to /212.58.116.70
060306 125113 575 Served block blk_7536925015414314323 to /212.58.116.70
060306 125113 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125113 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 125113 parsing /usr/home/duche/nutch-nightly/hadoop/mapred/local/jobTracker/job_kspap7.xml
060306 125113 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125114 576 Served block blk_-6705899863806264848 to /212.58.116.70
060306 125115 577 Served block blk_-5306879891270215512 to /212.58.116.70
060306 125115 Running job: job_kspap7
060306 125115 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125115 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 125115 parsing /usr/home/duche/nutch-nightly/hadoop/mapred/local/jobTracker/job_kspap7.xml
060306 125115 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125116 Adding task 'task_m_3hg2gq' to tip tip_1x3j1l, for tracker 'tracker_42329'
060306 125116 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125116 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 125116 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125116 578 Served block blk_-3324229811791817900 to /212.58.116.70
060306 125116 579 Served block blk_7536925015414314323 to /212.58.116.70
060306 125116 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125116 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 125116 parsing /usr/home/duche/nutch-nightly/hadoop/mapred/local/taskTracker/task_m_3hg2gq/job.xml
060306 125116 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125116  map 0%  reduce 0%
060306 125117 580 Served block blk_-6705899863806264848 to /212.58.116.70
060306 125117 581 Served block blk_-5306879891270215512 to /212.58.116.70
060306 125117 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125117 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 125117 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125117 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125117 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 125117 parsing /usr/home/duche/nutch-nightly/hadoop/mapred/local/taskTracker/task_m_3hg2gq/job.xml
060306 125117 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125118 Adding task 'task_r_2teqig' to tip tip_nzm5vg, for tracker 'tracker_42329'
060306 125118 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125118 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 125118 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125118 582 Served block blk_-3324229811791817900 to /212.58.116.70
060306 125118 583 Served block blk_7536925015414314323 to /212.58.116.70
060306 125118 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125118 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 125118 parsing /usr/home/duche/nutch-nightly/hadoop/mapred/local/taskTracker/task_r_2teqig/job.xml
060306 125118 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125119 task_m_3hg2gq  Child starting
060306 125119 task_m_3hg2gq  parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125119 task_m_3hg2gq  parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125119 Server connection on port 50050 from 212.58.116.70: starting
060306 125119 task_m_3hg2gq  Client connection to 0.0.0.0:50050: starting
060306 125119 584 Served block blk_-6705899863806264848 to /212.58.116.70
060306 125119 task_m_3hg2gq  parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125119 task_m_3hg2gq  parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 125119 task_m_3hg2gq  parsing /usr/home/duche/nutch-nightly/hadoop/mapred/local/taskTracker/task_m_3hg2gq/job.xml
060306 125119 task_m_3hg2gq  parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125119 task_m_3hg2gq  parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125120 task_m_3hg2gq  parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 125120 task_m_3hg2gq  parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125120 Server connection on port 9000 from 127.0.0.1: starting
060306 125120 task_m_3hg2gq  Client connection to 127.0.0.1:9000: starting
060306 125120 585 Served block blk_937594491748799698 to /212.58.116.70
060306 125120 586 Served block blk_-292066997504083183 to /212.58.116.70
060306 125120 Server connection on port 50050 from 212.58.116.70: starting
060306 125120 task_m_3hg2gq  Client connection to 0.0.0.0:50050: starting
060306 125120 task_m_3hg2gq 1.0% /user/root/dedup-hash-6473753/part-00000:0+126
060306 125120 587 Served block blk_-5306879891270215512 to /212.58.116.70
060306 125120 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125120 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 125120 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125120 task_r_2teqig 0.0% reduce > copy >
060306 125120 Task task_m_3hg2gq is done.
060306 125120 Server connection on port 9000 from 127.0.0.1: exiting
060306 125120 Server connection on port 50050 from 212.58.116.70: exiting
060306 125120 Server connection on port 50050 from 212.58.116.70: exiting
060306 125123 task_r_2teqig 0.0% reduce > copy >
060306 125123 Taskid 'task_m_3hg2gq' has finished successfully.
060306 125123 Task 'task_m_3hg2gq' has completed.
060306 125123 task_r_2teqig Got 1 map output locations.
060306 125123 task_r_2teqig 0.0% reduce > copy >
060306 125123 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125123 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 125123 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125123 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125123 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 125123 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125123 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125123 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 125123 parsing /usr/home/duche/nutch-nightly/hadoop/mapred/local/taskTracker/task_r_2teqig/job.xml
060306 125123 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125123  map 100%  reduce 0%
060306 125125 task_r_2teqig  Child starting
060306 125126 task_r_2teqig  parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125126 task_r_2teqig  parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125126 Server connection on port 50050 from 212.58.116.70: starting
060306 125126 task_r_2teqig  Client connection to 0.0.0.0:50050: starting
060306 125126 task_r_2teqig  parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125126 task_r_2teqig  parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 125126 task_r_2teqig  parsing /usr/home/duche/nutch-nightly/hadoop/mapred/local/taskTracker/task_r_2teqig/job.xml
060306 125126 task_r_2teqig  parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125126 Server connection on port 9000 from 127.0.0.1: starting
060306 125126 task_r_2teqig  Client connection to 127.0.0.1:9000: starting
060306 125126 task_r_2teqig  parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125126 task_r_2teqig  parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 125126 task_r_2teqig  parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125126 Server connection on port 50050 from 212.58.116.70: starting
060306 125126 task_r_2teqig  Client connection to 0.0.0.0:50050: starting
060306 125126 task_r_2teqig 1.0% reduce > reduce
060306 125126 Task task_r_2teqig is done.
060306 125127 Server connection on port 9000 from 127.0.0.1: exiting
060306 125127 Server connection on port 50050 from 212.58.116.70: exiting
060306 125127 Server connection on port 50050 from 212.58.116.70: exiting
060306 125129 Taskid 'task_r_2teqig' has finished successfully.
060306 125129 Task 'task_r_2teqig' has completed.
060306 125129 task_m_3hg2gq done; removing files.
060306 125129  map 100%  reduce 100%
060306 125129 Job complete: job_kspap7
060306 125130 Dedup: done
060306 125130 Adding /user/root/tmpdb/indexes/part-00000
060306 125130 588 Served block blk_-6249956399366891811 to /212.58.116.70
060306 125131 589 Served block blk_-4384461151725426132 to /212.58.116.70
060306 125131 590 Received block blk_3162385090057235567 from /212.58.116.70
060306 125131 591 Received block blk_3855280644798095426 from /212.58.116.70
060306 125131 crawl finished: tmpdb
060306 125132 Server connection on port 9000 from 127.0.0.1: exiting
060306 125132 Server connection on port 9001 from 127.0.0.1: exiting
060306 125132 Server connection on port 9000 from 127.0.0.1: exiting
060306 125132 Server connection on port 9000 from 127.0.0.1: exiting
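
The log itself records where the output went: the crawl tool reports "crawl finished: tmpdb", and the DFS paths it touches sit under /user/root/tmpdb. Below is a minimal sketch for pulling those lines out of a saved log. The helper name and the regular expressions are assumptions based only on the log format quoted here. They are not part of Nutch or Hadoop.

```python
import re

# Hypothetical helper: these patterns are assumptions based on the log lines
# quoted above, not an API of Nutch or Hadoop.
CRAWL_FINISHED = re.compile(r"crawl finished: (\S+)")
# DFS paths in this log start with /user/. Stopping at ':' drops the
# ":start+length" input-split suffix seen in lines like "data:0+61".
DFS_PATH = re.compile(r"(/user/[^\s:]+)")


def summarize_crawl_log(text):
    """Return (crawl output dir or None, sorted distinct /user/ paths in the log)."""
    output_dir = None
    paths = set()
    for line in text.splitlines():
        match = CRAWL_FINISHED.search(line)
        if match:
            output_dir = match.group(1)
        paths.update(DFS_PATH.findall(line))
    return output_dir, sorted(paths)
```

On the excerpt quoted above, every path it finds is under /user/root/tmpdb, plus /user/root/dedup-hash-6473753 from the dedup job. This suggests that the relative name tmpdb was resolved under /user/root inside DFS.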

But... where is the database of crawled sites?
./hadoop dfs -ls returns the following:

060306 125433 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125434 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125434 No FS indicated, using default:localhost:9000
060306 125434 Server connection on port 9000 from 127.0.0.1: starting
060306 125434 Client connection to 127.0.0.1:9000: starting
Found 3 items
/user/root/dfs  <dir>
/user/root/seeds        <dir>
/user/root/tmpdb        <dir>
060306 125434 Server connection on port 9000 from 127.0.0.1: exiting
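
Below is a minimal sketch that reads a listing like the one above and records which entries are marked as directories. It assumes listing rows start with an absolute path followed by `<dir>` for directories. This sample does not show how file rows look, so files are only recorded as non-directories. `parse_dfs_ls` is a hypothetical helper, not a Hadoop API.

```python
def parse_dfs_ls(text):
    """Map each path listed by `hadoop dfs -ls` to True if it is marked <dir>.

    Hypothetical helper: assumes listing rows start with an absolute path, as
    in the output above. Interleaved log lines start with a yymmdd timestamp
    and are skipped, as is the "Found N items" header.
    """
    entries = {}
    for line in text.splitlines():
        tokens = line.split()
        if tokens and tokens[0].startswith("/"):
            entries[tokens[0]] = "<dir>" in tokens[1:]
    return entries
```

By this reading, /user/root/tmpdb is a directory in the DFS namespace served at localhost:9000. That namespace is separate from the local filesystem, so a DFS path would not necessarily appear as a local folder of the same name.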

But there is no /user/root/tmpdb folder!
Anyway, if it does exist, what must I put in nutch-site.xml to point to it?

Thanks,
Regards, Dima.