You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@nutch.apache.org by Andy Xue <an...@gmail.com> on 2012/03/21 09:00:00 UTC
Too much logging
Hi:
When I was using Nutch 1.4 and Hadoop 1.01 to do distributed crawling, the
log4j logging is way too verbose. Which entries in the *log4j.properties* file
should I modify in order to display only WARN and above messages? I have
tried to replaces "ALL", "DEBUG", and "INFO" to "WARN" in both hadoop and
nutch *log4j.properties* files, but it still doesn't work at all. I did
build a new job file and copy the *log4j.properties* files to each slave
node.
I appreciate your help.
Below is an example. Bold lines are the ones that I want to remove:
============================================================================================
12/03/21 18:42:39 INFO crawl.Injector: Injector: starting at 2012-03-21
18:42:39
12/03/21 18:42:39 INFO crawl.Injector: Injector: crawlDb:
/crawl_database/crawldb
12/03/21 18:42:39 INFO crawl.Injector: Injector: urlDir: /urls
12/03/21 18:42:39 INFO crawl.Injector: Injector: Converting injected urls
to crawl db entries.
*12/03/21 18:42:44 INFO mapred.FileInputFormat: Total input paths to
process : 1*
*12/03/21 18:42:44 INFO mapred.JobClient: Running job: job_201203211827_0011
*
*12/03/21 18:42:45 INFO mapred.JobClient: map 0% reduce 0%*
*12/03/21 18:42:58 INFO mapred.JobClient: map 10% reduce 0%*
*12/03/21 18:43:01 INFO mapred.JobClient: map 15% reduce 0%*
*12/03/21 18:43:04 INFO mapred.JobClient: map 20% reduce 0%*
*12/03/21 18:43:07 INFO mapred.JobClient: map 25% reduce 2%*
*12/03/21 18:43:10 INFO mapred.JobClient: map 35% reduce 5%*
*12/03/21 18:43:13 INFO mapred.JobClient: map 41% reduce 6%*
*12/03/21 18:43:16 INFO mapred.JobClient: map 51% reduce 7%*
*12/03/21 18:43:19 INFO mapred.JobClient: map 56% reduce 10%*
*12/03/21 18:43:22 INFO mapred.JobClient: map 66% reduce 13%*
*12/03/21 18:43:25 INFO mapred.JobClient: map 71% reduce 17%*
*12/03/21 18:43:28 INFO mapred.JobClient: map 82% reduce 19%*
*12/03/21 18:43:31 INFO mapred.JobClient: map 87% reduce 21%*
*12/03/21 18:43:34 INFO mapred.JobClient: map 97% reduce 23%*
*12/03/21 18:43:37 INFO mapred.JobClient: map 100% reduce 26%*
*12/03/21 18:43:40 INFO mapred.JobClient: map 100% reduce 29%*
*12/03/21 18:43:43 INFO mapred.JobClient: map 100% reduce 41%*
*12/03/21 18:43:46 INFO mapred.JobClient: map 100% reduce 65%*
*12/03/21 18:43:49 INFO mapred.JobClient: map 100% reduce 100%*
*12/03/21 18:43:54 INFO mapred.JobClient: Job complete:
job_201203211827_0011*
*12/03/21 18:43:54 INFO mapred.JobClient: Counters: 31*
*12/03/21 18:43:54 INFO mapred.JobClient: Job Counters *
*12/03/21 18:43:54 INFO mapred.JobClient: Launched reduce tasks=6*
*12/03/21 18:43:54 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=201894*
*12/03/21 18:43:54 INFO mapred.JobClient: Total time spent by all
reduces waiting after reserving slots (ms)=0*
*12/03/21 18:43:54 INFO mapred.JobClient: Total time spent by all maps
waiting after reserving slots (ms)=0*
*12/03/21 18:43:54 INFO mapred.JobClient: Rack-local map tasks=14*
*12/03/21 18:43:54 INFO mapred.JobClient: Launched map tasks=39*
*12/03/21 18:43:54 INFO mapred.JobClient: Data-local map tasks=25*
*12/03/21 18:43:54 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=274583*
*12/03/21 18:43:54 INFO mapred.JobClient: File Input Format Counters *
*12/03/21 18:43:54 INFO mapred.JobClient: Bytes Read=2339*
*12/03/21 18:43:54 INFO mapred.JobClient: File Output Format Counters *
*12/03/21 18:43:54 INFO mapred.JobClient: Bytes Written=817*
*12/03/21 18:43:54 INFO mapred.JobClient: FileSystemCounters*
*12/03/21 18:43:54 INFO mapred.JobClient: FILE_BYTES_READ=307*
*12/03/21 18:43:54 INFO mapred.JobClient: HDFS_BYTES_READ=5615*
*12/03/21 18:43:54 INFO mapred.JobClient: FILE_BYTES_WRITTEN=1357045*
*12/03/21 18:43:54 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=817*
*12/03/21 18:43:54 INFO mapred.JobClient: Map-Reduce Framework*
*12/03/21 18:43:54 INFO mapred.JobClient: Map output materialized
bytes=1675*
*12/03/21 18:43:54 INFO mapred.JobClient: Map input records=5*
*12/03/21 18:43:54 INFO mapred.JobClient: Reduce shuffle bytes=1651*
*12/03/21 18:43:54 INFO mapred.JobClient: Spilled Records=10*
*12/03/21 18:43:54 INFO mapred.JobClient: Map output bytes=261*
*12/03/21 18:43:54 INFO mapred.JobClient: Total committed heap usage
(bytes)=6684082176*
*12/03/21 18:43:54 INFO mapred.JobClient: CPU time spent (ms)=26660*
*12/03/21 18:43:54 INFO mapred.JobClient: Map input bytes=116*
*12/03/21 18:43:54 INFO mapred.JobClient: SPLIT_RAW_BYTES=3276*
*12/03/21 18:43:54 INFO mapred.JobClient: Combine input records=0*
*12/03/21 18:43:54 INFO mapred.JobClient: Reduce input records=5*
*12/03/21 18:43:54 INFO mapred.JobClient: Reduce input groups=5*
*12/03/21 18:43:54 INFO mapred.JobClient: Combine output records=0*
*12/03/21 18:43:54 INFO mapred.JobClient: Physical memory (bytes)
snapshot=8386580480*
*12/03/21 18:43:54 INFO mapred.JobClient: Reduce output records=5*
*12/03/21 18:43:54 INFO mapred.JobClient: Virtual memory (bytes)
snapshot=21784285184*
*12/03/21 18:43:54 INFO mapred.JobClient: Map output records=5*
============================================================================================
Regards
Andy
Re: Too much logging
Posted by Markus Jelsma <ma...@openindex.io>.
Nutch uses Log4J so you could try to configure different log levels for Hadoop
classes. But i think it's a bad practice as the JobClient logs important
information when it finishes.
On Wednesday 21 March 2012 09:00:00 Andy Xue wrote:
> Hi:
>
> When I was using Nutch 1.4 and Hadoop 1.01 to do distributed crawling, the
> log4j logging is way too verbose. Which entries in the *log4j.properties*
> file should I modify in order to display only WARN and above messages? I
> have tried to replaces "ALL", "DEBUG", and "INFO" to "WARN" in both hadoop
> and nutch *log4j.properties* files, but it still doesn't work at all. I
> did build a new job file and copy the *log4j.properties* files to each
> slave node.
>
> I appreciate your help.
>
> Below is an example. Bold lines are the ones that I want to remove:
> ===========================================================================
> ================= 12/03/21 18:42:39 INFO crawl.Injector: Injector: starting
> at 2012-03-21 18:42:39
> 12/03/21 18:42:39 INFO crawl.Injector: Injector: crawlDb:
> /crawl_database/crawldb
> 12/03/21 18:42:39 INFO crawl.Injector: Injector: urlDir: /urls
> 12/03/21 18:42:39 INFO crawl.Injector: Injector: Converting injected urls
> to crawl db entries.
> *12/03/21 18:42:44 INFO mapred.FileInputFormat: Total input paths to
> process : 1*
> *12/03/21 18:42:44 INFO mapred.JobClient: Running job:
> job_201203211827_0011 *
> *12/03/21 18:42:45 INFO mapred.JobClient: map 0% reduce 0%*
> *12/03/21 18:42:58 INFO mapred.JobClient: map 10% reduce 0%*
> *12/03/21 18:43:01 INFO mapred.JobClient: map 15% reduce 0%*
> *12/03/21 18:43:04 INFO mapred.JobClient: map 20% reduce 0%*
> *12/03/21 18:43:07 INFO mapred.JobClient: map 25% reduce 2%*
> *12/03/21 18:43:10 INFO mapred.JobClient: map 35% reduce 5%*
> *12/03/21 18:43:13 INFO mapred.JobClient: map 41% reduce 6%*
> *12/03/21 18:43:16 INFO mapred.JobClient: map 51% reduce 7%*
> *12/03/21 18:43:19 INFO mapred.JobClient: map 56% reduce 10%*
> *12/03/21 18:43:22 INFO mapred.JobClient: map 66% reduce 13%*
> *12/03/21 18:43:25 INFO mapred.JobClient: map 71% reduce 17%*
> *12/03/21 18:43:28 INFO mapred.JobClient: map 82% reduce 19%*
> *12/03/21 18:43:31 INFO mapred.JobClient: map 87% reduce 21%*
> *12/03/21 18:43:34 INFO mapred.JobClient: map 97% reduce 23%*
> *12/03/21 18:43:37 INFO mapred.JobClient: map 100% reduce 26%*
> *12/03/21 18:43:40 INFO mapred.JobClient: map 100% reduce 29%*
> *12/03/21 18:43:43 INFO mapred.JobClient: map 100% reduce 41%*
> *12/03/21 18:43:46 INFO mapred.JobClient: map 100% reduce 65%*
> *12/03/21 18:43:49 INFO mapred.JobClient: map 100% reduce 100%*
> *12/03/21 18:43:54 INFO mapred.JobClient: Job complete:
> job_201203211827_0011*
> *12/03/21 18:43:54 INFO mapred.JobClient: Counters: 31*
> *12/03/21 18:43:54 INFO mapred.JobClient: Job Counters *
> *12/03/21 18:43:54 INFO mapred.JobClient: Launched reduce tasks=6*
> *12/03/21 18:43:54 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=201894*
> *12/03/21 18:43:54 INFO mapred.JobClient: Total time spent by all
> reduces waiting after reserving slots (ms)=0*
> *12/03/21 18:43:54 INFO mapred.JobClient: Total time spent by all maps
> waiting after reserving slots (ms)=0*
> *12/03/21 18:43:54 INFO mapred.JobClient: Rack-local map tasks=14*
> *12/03/21 18:43:54 INFO mapred.JobClient: Launched map tasks=39*
> *12/03/21 18:43:54 INFO mapred.JobClient: Data-local map tasks=25*
> *12/03/21 18:43:54 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=274583*
> *12/03/21 18:43:54 INFO mapred.JobClient: File Input Format Counters *
> *12/03/21 18:43:54 INFO mapred.JobClient: Bytes Read=2339*
> *12/03/21 18:43:54 INFO mapred.JobClient: File Output Format Counters *
> *12/03/21 18:43:54 INFO mapred.JobClient: Bytes Written=817*
> *12/03/21 18:43:54 INFO mapred.JobClient: FileSystemCounters*
> *12/03/21 18:43:54 INFO mapred.JobClient: FILE_BYTES_READ=307*
> *12/03/21 18:43:54 INFO mapred.JobClient: HDFS_BYTES_READ=5615*
> *12/03/21 18:43:54 INFO mapred.JobClient: FILE_BYTES_WRITTEN=1357045*
> *12/03/21 18:43:54 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=817*
> *12/03/21 18:43:54 INFO mapred.JobClient: Map-Reduce Framework*
> *12/03/21 18:43:54 INFO mapred.JobClient: Map output materialized
> bytes=1675*
> *12/03/21 18:43:54 INFO mapred.JobClient: Map input records=5*
> *12/03/21 18:43:54 INFO mapred.JobClient: Reduce shuffle bytes=1651*
> *12/03/21 18:43:54 INFO mapred.JobClient: Spilled Records=10*
> *12/03/21 18:43:54 INFO mapred.JobClient: Map output bytes=261*
> *12/03/21 18:43:54 INFO mapred.JobClient: Total committed heap usage
> (bytes)=6684082176*
> *12/03/21 18:43:54 INFO mapred.JobClient: CPU time spent (ms)=26660*
> *12/03/21 18:43:54 INFO mapred.JobClient: Map input bytes=116*
> *12/03/21 18:43:54 INFO mapred.JobClient: SPLIT_RAW_BYTES=3276*
> *12/03/21 18:43:54 INFO mapred.JobClient: Combine input records=0*
> *12/03/21 18:43:54 INFO mapred.JobClient: Reduce input records=5*
> *12/03/21 18:43:54 INFO mapred.JobClient: Reduce input groups=5*
> *12/03/21 18:43:54 INFO mapred.JobClient: Combine output records=0*
> *12/03/21 18:43:54 INFO mapred.JobClient: Physical memory (bytes)
> snapshot=8386580480*
> *12/03/21 18:43:54 INFO mapred.JobClient: Reduce output records=5*
> *12/03/21 18:43:54 INFO mapred.JobClient: Virtual memory (bytes)
> snapshot=21784285184*
> *12/03/21 18:43:54 INFO mapred.JobClient: Map output records=5*
> ===========================================================================
> =================
>
> Regards
> Andy
--
Markus Jelsma - CTO - Openindex