Posted to user@nutch.apache.org by Emmanuel JOKE <jo...@gmail.com> on 2007/06/28 14:54:36 UTC

Crawl error with hadoop

Hi Guys,

I have a cluster of 2 machines (Linux, Java 1.6).
I started a crawl on a short list of websites, using the command:
bin/nutch crawl urls/site1 -dir crawld -depth 10 -topN 100000 -threads 30

The crawl failed at the 6th depth with the following error:

CrawlDb update: starting
CrawlDb update: db: crawld/crawldb
CrawlDb update: segments: [crawld/segments/20070627053531]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: true
CrawlDb update: URL filtering: true
CrawlDb update: Merging segment data into db.
task_0035_m_000005_0: log4j:ERROR setFile(null,true) call failed.
task_0035_m_000005_0: java.io.FileNotFoundException: /data/sengine/search/bin/../logs (Is a directory)
task_0035_m_000005_0:   at java.io.FileOutputStream.openAppend(Native Method)
task_0035_m_000005_0:   at java.io.FileOutputStream.<init>(FileOutputStream.java:177)
task_0035_m_000005_0:   at java.io.FileOutputStream.<init>(FileOutputStream.java:102)
task_0035_m_000005_0:   at org.apache.log4j.FileAppender.setFile(FileAppender.java:289)
task_0035_m_000005_0:   at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:163)
task_0035_m_000005_0:   at org.apache.log4j.DailyRollingFileAppender.activateOptions(DailyRollingFileAppender.java:215)
task_0035_m_000005_0:   at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:256)
task_0035_m_000005_0:   at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:132)
task_0035_m_000005_0:   at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:96)
task_0035_m_000005_0:   at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:654)
task_0035_m_000005_0:   at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:612)
task_0035_m_000005_0:   at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:509)
task_0035_m_000005_0:   at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:415)
task_0035_m_000005_0:   at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:441)
task_0035_m_000005_0:   at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:468)
task_0035_m_000005_0:   at org.apache.log4j.LogManager.<clinit>(LogManager.java:122)
task_0035_m_000005_0:   at org.apache.log4j.Logger.getLogger(Logger.java:104)
task_0035_m_000005_0:   at org.apache.commons.logging.impl.Log4JLogger.getLogger(Log4JLogger.java:229)
task_0035_m_000005_0:   at org.apache.commons.logging.impl.Log4JLogger.<init>(Log4JLogger.java:65)
task_0035_m_000005_0:   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
task_0035_m_000005_0:   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
task_0035_m_000005_0:   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
task_0035_m_000005_0:   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
task_0035_m_000005_0:   at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:529)
task_0035_m_000005_0:   at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:235)
task_0035_m_000005_0:   at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:370)
task_0035_m_000005_0:   at org.apache.hadoop.mapred.TaskTracker.<clinit>(TaskTracker.java:82)
task_0035_m_000005_0:   at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1423)
task_0035_m_000005_0: log4j:ERROR Either File or DatePattern options are not set for appender [DRFA].
task_0035_m_000004_0: log4j:ERROR setFile(null,true) call failed.


I've noticed that every time Nutch fails on my cluster, it reports this
log4j exception, and I don't understand why. Any ideas?
Also, the crawl itself stopped, and I don't know why either. How can I get
more details about what went wrong?
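(For anyone puzzling over the same trace: the "(Is a directory)" part just means
log4j tried to open its configured log file in append mode, but the path resolved
to a directory rather than a file. A minimal sketch of that failure, with a
hypothetical scratch path standing in for the real logs directory, assuming a
Linux JVM where the kernel error text is "Is a directory":)

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public class IsADirectoryDemo {
    // Mimics the call log4j's FileAppender.setFile() makes: open the
    // configured log path in append mode. Returns the failure message,
    // or null if the path opened fine.
    static String tryAppend(File path) {
        try (FileOutputStream out = new FileOutputStream(path, true)) {
            return null;
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the misconfigured log path: the logs
        // directory itself instead of a file inside it.
        File logsDir = new File(System.getProperty("java.io.tmpdir"), "logs-demo");
        logsDir.mkdirs();
        System.out.println(tryAppend(logsDir));
    }
}
```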

cheers
E

Re: Crawl error with hadoop

Posted by Mathijs Homminga <ma...@knowlogy.nl>.
Hi Emmanuel,

I think it has something to do with the fact that the system
properties ${hadoop.log.dir} and ${hadoop.log.file} are not propagated to
the child JVMs which the TaskTracker spawns.
The log4j.properties file which ships with *Hadoop* works around this by
putting the following lines at the top of the file:

# Define some default values that can be overridden by system properties
hadoop.log.dir=.
hadoop.log.file=hadoop.log

I don't know why the Nutch version of log4j.properties doesn't include
this. Anyway, adding these lines solved it for me.
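(For reference, here is a sketch of how those defaults fit together with the
DRFA appender named in the error. The layout is illustrative, not the canonical
Nutch config, but it shows why the defaults matter: with hadoop.log.file unset,
the File option resolves to the logs directory itself, which produces exactly
the "Is a directory" and "File or DatePattern" errors above.)

```properties
# Define some default values that can be overridden by system properties
hadoop.log.dir=.
hadoop.log.file=hadoop.log

log4j.rootLogger=INFO,DRFA

# Daily Rolling File Appender; File is built from the two properties above
log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
log4j.appender.DRFA.File=${hadoop.log.dir}/${hadoop.log.file}
log4j.appender.DRFA.DatePattern=.yyyy-MM-dd
log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout
log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
```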

Mathijs

Emmanuel JOKE wrote:
> [original message quoted in full; snipped]

-- 
Knowlogy
Helperpark 290 C
9723 ZA Groningen

mathijs.homminga@knowlogy.nl
+31 (0)6 15312977
http://www.knowlogy.nl