You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@nutch.apache.org by Andy Xue <an...@gmail.com> on 2012/03/21 09:00:00 UTC

Too much logging

Hi:

When I was using Nutch 1.4 and Hadoop 1.01 to do distributed crawling, the
log4j logging is way too verbose. Which entries in the *log4j.properties* file
should I modify in order to display only WARN and above messages? I have
tried to replaces "ALL", "DEBUG", and "INFO" to "WARN" in both hadoop and
nutch *log4j.properties* files, but it still doesn't work at all. I did
build a new job file and copy the *log4j.properties* files to each slave
node.

I appreciate your help.

Below is an example. Bold lines are the ones that I want to remove:
============================================================================================
12/03/21 18:42:39 INFO crawl.Injector: Injector: starting at 2012-03-21
18:42:39
12/03/21 18:42:39 INFO crawl.Injector: Injector: crawlDb:
/crawl_database/crawldb
12/03/21 18:42:39 INFO crawl.Injector: Injector: urlDir: /urls
12/03/21 18:42:39 INFO crawl.Injector: Injector: Converting injected urls
to crawl db entries.
*12/03/21 18:42:44 INFO mapred.FileInputFormat: Total input paths to
process : 1*
*12/03/21 18:42:44 INFO mapred.JobClient: Running job: job_201203211827_0011
*
*12/03/21 18:42:45 INFO mapred.JobClient:  map 0% reduce 0%*
*12/03/21 18:42:58 INFO mapred.JobClient:  map 10% reduce 0%*
*12/03/21 18:43:01 INFO mapred.JobClient:  map 15% reduce 0%*
*12/03/21 18:43:04 INFO mapred.JobClient:  map 20% reduce 0%*
*12/03/21 18:43:07 INFO mapred.JobClient:  map 25% reduce 2%*
*12/03/21 18:43:10 INFO mapred.JobClient:  map 35% reduce 5%*
*12/03/21 18:43:13 INFO mapred.JobClient:  map 41% reduce 6%*
*12/03/21 18:43:16 INFO mapred.JobClient:  map 51% reduce 7%*
*12/03/21 18:43:19 INFO mapred.JobClient:  map 56% reduce 10%*
*12/03/21 18:43:22 INFO mapred.JobClient:  map 66% reduce 13%*
*12/03/21 18:43:25 INFO mapred.JobClient:  map 71% reduce 17%*
*12/03/21 18:43:28 INFO mapred.JobClient:  map 82% reduce 19%*
*12/03/21 18:43:31 INFO mapred.JobClient:  map 87% reduce 21%*
*12/03/21 18:43:34 INFO mapred.JobClient:  map 97% reduce 23%*
*12/03/21 18:43:37 INFO mapred.JobClient:  map 100% reduce 26%*
*12/03/21 18:43:40 INFO mapred.JobClient:  map 100% reduce 29%*
*12/03/21 18:43:43 INFO mapred.JobClient:  map 100% reduce 41%*
*12/03/21 18:43:46 INFO mapred.JobClient:  map 100% reduce 65%*
*12/03/21 18:43:49 INFO mapred.JobClient:  map 100% reduce 100%*
*12/03/21 18:43:54 INFO mapred.JobClient: Job complete:
job_201203211827_0011*
*12/03/21 18:43:54 INFO mapred.JobClient: Counters: 31*
*12/03/21 18:43:54 INFO mapred.JobClient:   Job Counters *
*12/03/21 18:43:54 INFO mapred.JobClient:     Launched reduce tasks=6*
*12/03/21 18:43:54 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=201894*
*12/03/21 18:43:54 INFO mapred.JobClient:     Total time spent by all
reduces waiting after reserving slots (ms)=0*
*12/03/21 18:43:54 INFO mapred.JobClient:     Total time spent by all maps
waiting after reserving slots (ms)=0*
*12/03/21 18:43:54 INFO mapred.JobClient:     Rack-local map tasks=14*
*12/03/21 18:43:54 INFO mapred.JobClient:     Launched map tasks=39*
*12/03/21 18:43:54 INFO mapred.JobClient:     Data-local map tasks=25*
*12/03/21 18:43:54 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=274583*
*12/03/21 18:43:54 INFO mapred.JobClient:   File Input Format Counters *
*12/03/21 18:43:54 INFO mapred.JobClient:     Bytes Read=2339*
*12/03/21 18:43:54 INFO mapred.JobClient:   File Output Format Counters *
*12/03/21 18:43:54 INFO mapred.JobClient:     Bytes Written=817*
*12/03/21 18:43:54 INFO mapred.JobClient:   FileSystemCounters*
*12/03/21 18:43:54 INFO mapred.JobClient:     FILE_BYTES_READ=307*
*12/03/21 18:43:54 INFO mapred.JobClient:     HDFS_BYTES_READ=5615*
*12/03/21 18:43:54 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1357045*
*12/03/21 18:43:54 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=817*
*12/03/21 18:43:54 INFO mapred.JobClient:   Map-Reduce Framework*
*12/03/21 18:43:54 INFO mapred.JobClient:     Map output materialized
bytes=1675*
*12/03/21 18:43:54 INFO mapred.JobClient:     Map input records=5*
*12/03/21 18:43:54 INFO mapred.JobClient:     Reduce shuffle bytes=1651*
*12/03/21 18:43:54 INFO mapred.JobClient:     Spilled Records=10*
*12/03/21 18:43:54 INFO mapred.JobClient:     Map output bytes=261*
*12/03/21 18:43:54 INFO mapred.JobClient:     Total committed heap usage
(bytes)=6684082176*
*12/03/21 18:43:54 INFO mapred.JobClient:     CPU time spent (ms)=26660*
*12/03/21 18:43:54 INFO mapred.JobClient:     Map input bytes=116*
*12/03/21 18:43:54 INFO mapred.JobClient:     SPLIT_RAW_BYTES=3276*
*12/03/21 18:43:54 INFO mapred.JobClient:     Combine input records=0*
*12/03/21 18:43:54 INFO mapred.JobClient:     Reduce input records=5*
*12/03/21 18:43:54 INFO mapred.JobClient:     Reduce input groups=5*
*12/03/21 18:43:54 INFO mapred.JobClient:     Combine output records=0*
*12/03/21 18:43:54 INFO mapred.JobClient:     Physical memory (bytes)
snapshot=8386580480*
*12/03/21 18:43:54 INFO mapred.JobClient:     Reduce output records=5*
*12/03/21 18:43:54 INFO mapred.JobClient:     Virtual memory (bytes)
snapshot=21784285184*
*12/03/21 18:43:54 INFO mapred.JobClient:     Map output records=5*
============================================================================================

Regards
Andy

Re: Too much logging

Posted by Markus Jelsma <ma...@openindex.io>.
Nutch uses Log4J so you could try to configure different log levels for Hadoop 
classes. But i think it's a bad practice as the JobClient logs important 
information when it finishes.

On Wednesday 21 March 2012 09:00:00 Andy Xue wrote:
> Hi:
> 
> When I was using Nutch 1.4 and Hadoop 1.01 to do distributed crawling, the
> log4j logging is way too verbose. Which entries in the *log4j.properties*
> file should I modify in order to display only WARN and above messages? I
> have tried to replaces "ALL", "DEBUG", and "INFO" to "WARN" in both hadoop
> and nutch *log4j.properties* files, but it still doesn't work at all. I
> did build a new job file and copy the *log4j.properties* files to each
> slave node.
> 
> I appreciate your help.
> 
> Below is an example. Bold lines are the ones that I want to remove:
> ===========================================================================
> ================= 12/03/21 18:42:39 INFO crawl.Injector: Injector: starting
> at 2012-03-21 18:42:39
> 12/03/21 18:42:39 INFO crawl.Injector: Injector: crawlDb:
> /crawl_database/crawldb
> 12/03/21 18:42:39 INFO crawl.Injector: Injector: urlDir: /urls
> 12/03/21 18:42:39 INFO crawl.Injector: Injector: Converting injected urls
> to crawl db entries.
> *12/03/21 18:42:44 INFO mapred.FileInputFormat: Total input paths to
> process : 1*
> *12/03/21 18:42:44 INFO mapred.JobClient: Running job:
> job_201203211827_0011 *
> *12/03/21 18:42:45 INFO mapred.JobClient:  map 0% reduce 0%*
> *12/03/21 18:42:58 INFO mapred.JobClient:  map 10% reduce 0%*
> *12/03/21 18:43:01 INFO mapred.JobClient:  map 15% reduce 0%*
> *12/03/21 18:43:04 INFO mapred.JobClient:  map 20% reduce 0%*
> *12/03/21 18:43:07 INFO mapred.JobClient:  map 25% reduce 2%*
> *12/03/21 18:43:10 INFO mapred.JobClient:  map 35% reduce 5%*
> *12/03/21 18:43:13 INFO mapred.JobClient:  map 41% reduce 6%*
> *12/03/21 18:43:16 INFO mapred.JobClient:  map 51% reduce 7%*
> *12/03/21 18:43:19 INFO mapred.JobClient:  map 56% reduce 10%*
> *12/03/21 18:43:22 INFO mapred.JobClient:  map 66% reduce 13%*
> *12/03/21 18:43:25 INFO mapred.JobClient:  map 71% reduce 17%*
> *12/03/21 18:43:28 INFO mapred.JobClient:  map 82% reduce 19%*
> *12/03/21 18:43:31 INFO mapred.JobClient:  map 87% reduce 21%*
> *12/03/21 18:43:34 INFO mapred.JobClient:  map 97% reduce 23%*
> *12/03/21 18:43:37 INFO mapred.JobClient:  map 100% reduce 26%*
> *12/03/21 18:43:40 INFO mapred.JobClient:  map 100% reduce 29%*
> *12/03/21 18:43:43 INFO mapred.JobClient:  map 100% reduce 41%*
> *12/03/21 18:43:46 INFO mapred.JobClient:  map 100% reduce 65%*
> *12/03/21 18:43:49 INFO mapred.JobClient:  map 100% reduce 100%*
> *12/03/21 18:43:54 INFO mapred.JobClient: Job complete:
> job_201203211827_0011*
> *12/03/21 18:43:54 INFO mapred.JobClient: Counters: 31*
> *12/03/21 18:43:54 INFO mapred.JobClient:   Job Counters *
> *12/03/21 18:43:54 INFO mapred.JobClient:     Launched reduce tasks=6*
> *12/03/21 18:43:54 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=201894*
> *12/03/21 18:43:54 INFO mapred.JobClient:     Total time spent by all
> reduces waiting after reserving slots (ms)=0*
> *12/03/21 18:43:54 INFO mapred.JobClient:     Total time spent by all maps
> waiting after reserving slots (ms)=0*
> *12/03/21 18:43:54 INFO mapred.JobClient:     Rack-local map tasks=14*
> *12/03/21 18:43:54 INFO mapred.JobClient:     Launched map tasks=39*
> *12/03/21 18:43:54 INFO mapred.JobClient:     Data-local map tasks=25*
> *12/03/21 18:43:54 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=274583*
> *12/03/21 18:43:54 INFO mapred.JobClient:   File Input Format Counters *
> *12/03/21 18:43:54 INFO mapred.JobClient:     Bytes Read=2339*
> *12/03/21 18:43:54 INFO mapred.JobClient:   File Output Format Counters *
> *12/03/21 18:43:54 INFO mapred.JobClient:     Bytes Written=817*
> *12/03/21 18:43:54 INFO mapred.JobClient:   FileSystemCounters*
> *12/03/21 18:43:54 INFO mapred.JobClient:     FILE_BYTES_READ=307*
> *12/03/21 18:43:54 INFO mapred.JobClient:     HDFS_BYTES_READ=5615*
> *12/03/21 18:43:54 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1357045*
> *12/03/21 18:43:54 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=817*
> *12/03/21 18:43:54 INFO mapred.JobClient:   Map-Reduce Framework*
> *12/03/21 18:43:54 INFO mapred.JobClient:     Map output materialized
> bytes=1675*
> *12/03/21 18:43:54 INFO mapred.JobClient:     Map input records=5*
> *12/03/21 18:43:54 INFO mapred.JobClient:     Reduce shuffle bytes=1651*
> *12/03/21 18:43:54 INFO mapred.JobClient:     Spilled Records=10*
> *12/03/21 18:43:54 INFO mapred.JobClient:     Map output bytes=261*
> *12/03/21 18:43:54 INFO mapred.JobClient:     Total committed heap usage
> (bytes)=6684082176*
> *12/03/21 18:43:54 INFO mapred.JobClient:     CPU time spent (ms)=26660*
> *12/03/21 18:43:54 INFO mapred.JobClient:     Map input bytes=116*
> *12/03/21 18:43:54 INFO mapred.JobClient:     SPLIT_RAW_BYTES=3276*
> *12/03/21 18:43:54 INFO mapred.JobClient:     Combine input records=0*
> *12/03/21 18:43:54 INFO mapred.JobClient:     Reduce input records=5*
> *12/03/21 18:43:54 INFO mapred.JobClient:     Reduce input groups=5*
> *12/03/21 18:43:54 INFO mapred.JobClient:     Combine output records=0*
> *12/03/21 18:43:54 INFO mapred.JobClient:     Physical memory (bytes)
> snapshot=8386580480*
> *12/03/21 18:43:54 INFO mapred.JobClient:     Reduce output records=5*
> *12/03/21 18:43:54 INFO mapred.JobClient:     Virtual memory (bytes)
> snapshot=21784285184*
> *12/03/21 18:43:54 INFO mapred.JobClient:     Map output records=5*
> ===========================================================================
> =================
> 
> Regards
> Andy

-- 
Markus Jelsma - CTO - Openindex