Posted to user@nutch.apache.org by Emmanuel <jo...@gmail.com> on 2007/07/29 10:52:13 UTC
Map output
Has anybody set mapred.compress.map.output to true?
Have you seen any improvement with this property?
I'm wondering whether crawling is faster with it.
Thanks for your feedback.
Re: error merger index
Posted by Enzo Michelangeli <en...@gmail.com>.
----- Original Message -----
From: "Le Quoc Anh" <qu...@gmail.com>
Sent: Sunday, July 29, 2007 5:14 PM
> Hi everyone,
>
> When I recrawl, I have to delete the indexes and index files and re-create
> the index. If I index only the segments that I have just fetched and merge
> them with the existing index, I get the error "index/merge-output exists".
> Can anyone help me?
>
> Thanks a lot,
>
> Quoc Anh
Which command lines are you using? This works for me (in a non-distributed
environment), where the old index is in crawl/indexes and the one just
created is in crawl/indexes_new:
errexit() {
    # Save the failing command's status first: $(date) below can overwrite $?
    status=$?
    echo "## $(date): *** LAST COMMAND RETURNED NONZERO STATUS: $status ***"
    exit 1
}
[...]
echo "## $(date): Starting to merge new indexes into old..."
MERGE_DATE=$(date +%Y%m%d%H%M)
bin/nutch merge crawl/indexes_merged/${MERGE_DATE} \
    crawl/indexes crawl/indexes_new || errexit
# in nutch-site, hadoop.tmp.dir points to crawl/tmp
rm -rf crawl/tmp/*
# create the index.done flag
touch crawl/indexes_merged/${MERGE_DATE}/index.done
# delete indexes_new and replace indexes with indexes_merged
rm -rf crawl/indexes_new
mv crawl/indexes crawl/indexes_old
mv crawl/indexes_merged crawl/indexes
rm -rf crawl/indexes_old
echo "## $(date): Re-indexing completed, making webapp aware of that..."
touch -c /usr/share/tomcat5/webapps/nutch*/WEB-INF/web.xml
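
The directory swap at the end of the script above could also be wrapped in a small helper that rolls back if the second rename fails, so the live index is never left missing. This is only a sketch under assumptions: swap_index is a hypothetical function name, not part of Nutch, and the demo runs in a throwaway directory created with mktemp rather than in a real crawl directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): replace the live index directory
# with a freshly merged one, keeping the old copy until the swap succeeds.
swap_index() {
    live=$1        # e.g. crawl/indexes
    merged=$2      # e.g. crawl/indexes_merged
    old="${live}_old"

    # Refuse to run if the merged index is missing or a stale backup exists.
    [ -d "$merged" ] || { echo "missing: $merged" >&2; return 1; }
    [ ! -e "$old" ] || { echo "stale backup: $old" >&2; return 1; }

    mv "$live" "$old" || return 1
    if mv "$merged" "$live"; then
        rm -rf "$old"
    else
        # Roll back so the live index is put back in place.
        mv "$old" "$live"
        return 1
    fi
}

# Demo in a scratch directory that mimics the layout used in the script.
demo=$(mktemp -d)
mkdir -p "$demo/indexes/part-old" "$demo/indexes_merged/200707291800"
touch "$demo/indexes_merged/200707291800/index.done"
swap_index "$demo/indexes" "$demo/indexes_merged" && ls "$demo/indexes"
rm -rf "$demo"
```

In the script above, the equivalent call would be swap_index crawl/indexes crawl/indexes_merged || errexit, in place of the two mv lines and the final rm -rf.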
Enzo
error merger index
Posted by Le Quoc Anh <qu...@gmail.com>.
Hi everyone,
When I recrawl, I have to delete the indexes and index files and re-create
the index. If I index only the segments that I have just fetched and merge
them with the existing index, I get the error "index/merge-output exists".
Can anyone help me?
Thanks a lot,
Quoc Anh