Posted to user@nutch.apache.org by Emmanuel JOKE <jo...@gmail.com> on 2007/07/03 16:31:58 UTC

Re: Crawl error with hadoop

Thanks Mathijs, it was exactly what I needed.

We should create a JIRA issue so this fix can be included in the next Nutch release.

> Hi Emmanuel,
>
> I think it has something to do with the fact that the environment
> variables ${hadoop.log.dir} and ${hadoop.log.file} are not propagated to
> the child JVMs which are spawned by the TaskTracker.
> The log4j.properties file which comes with *Hadoop* solves this by
> putting the following lines at the top of the log4j.properties file:
>
> # Define some default values that can be overridden by system properties
> hadoop.log.dir=.
> hadoop.log.file=hadoop.log
>
> I don't know why the Nutch version of the log4j.properties file doesn't
> include these lines.
> Anyway, adding them solved it for me.
>
> Mathijs
>
> Emmanuel JOKE wrote:
>> Hi Guys,
>>
>> I have a cluster of 2 machines: Linux, Java 1.6.
>> I started a crawl on a list of only a few websites, using the command:
>> bin/nutch crawl urls/site1 -dir crawld -depth 10 -topN 100000 -threads
>> 30
>>
>> I got an error at the 6th depth.
>>
>> CrawlDb update: starting
>> CrawlDb update: db: crawld/crawldb
>> CrawlDb update: segments: [crawld/segments/20070627053531]
>> CrawlDb update: additions allowed: true
>> CrawlDb update: URL normalizing: true
>> CrawlDb update: URL filtering: true
>> CrawlDb update: Merging segment data into db.
>> task_0035_m_000005_0: log4j:ERROR setFile(null,true) call failed.
>> task_0035_m_000005_0: java.io.FileNotFoundException:
>> /data/sengine/search/bin/../logs (Is a directory)
>> task_0035_m_000005_0:   at java.io.FileOutputStream.openAppend(Native Method)
>> task_0035_m_000005_0:   at java.io.FileOutputStream.<init>(FileOutputStream.java:177)
>> task_0035_m_000005_0:   at java.io.FileOutputStream.<init>(FileOutputStream.java:102)
>> task_0035_m_000005_0:   at org.apache.log4j.FileAppender.setFile(FileAppender.java:289)
>> task_0035_m_000005_0:   at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:163)
>> task_0035_m_000005_0:   at org.apache.log4j.DailyRollingFileAppender.activateOptions(DailyRollingFileAppender.java:215)
>> task_0035_m_000005_0:   at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:256)
>> task_0035_m_000005_0:   at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:132)
>> task_0035_m_000005_0:   at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:96)
>> task_0035_m_000005_0:   at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:654)
>> task_0035_m_000005_0:   at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:612)
>> task_0035_m_000005_0:   at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:509)
>> task_0035_m_000005_0:   at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:415)
>> task_0035_m_000005_0:   at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:441)
>> task_0035_m_000005_0:   at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:468)
>> task_0035_m_000005_0:   at org.apache.log4j.LogManager.<clinit>(LogManager.java:122)
>> task_0035_m_000005_0:   at org.apache.log4j.Logger.getLogger(Logger.java:104)
>> task_0035_m_000005_0:   at org.apache.commons.logging.impl.Log4JLogger.getLogger(Log4JLogger.java:229)
>> task_0035_m_000005_0:   at org.apache.commons.logging.impl.Log4JLogger.<init>(Log4JLogger.java:65)
>> task_0035_m_000005_0:   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>> task_0035_m_000005_0:   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>> task_0035_m_000005_0:   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>> task_0035_m_000005_0:   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>> task_0035_m_000005_0:   at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:529)
>> task_0035_m_000005_0:   at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:235)
>> task_0035_m_000005_0:   at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:370)
>> task_0035_m_000005_0:   at org.apache.hadoop.mapred.TaskTracker.<clinit>(TaskTracker.java:82)
>> task_0035_m_000005_0:   at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1423)
>> task_0035_m_000005_0: log4j:ERROR Either File or DatePattern options are
>> not set for appender [DRFA].
>> task_0035_m_000004_0: log4j:ERROR setFile(null,true) call failed.
>>
>>
>> I notice that every time I get a Nutch error while using my cluster, it
>> gives me this log4j exception. I don't understand why. Any idea?
>> Besides, my crawl stopped and I don't know why either. How can I get more
>> details?
>>
>> cheers
>> E
>>
>>
>
> --
> Knowlogy
> Helperpark 290 C
> 9723 ZA Groningen
>
> mathijs.homminga@knowlogy.nl
> +31 (0)6 15312977
> http://www.knowlogy.nl
>
>
>
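
For reference, the fix Mathijs describes amounts to giving log4j safe fallback
values for the two system properties that the TaskTracker normally passes to
its child JVMs. Below is a minimal sketch of how the relevant part of
conf/log4j.properties could look; the appender name DRFA and the
DailyRollingFileAppender class are taken from the error output above, but the
exact layout of your own file may differ:

```
# Defaults used when ${hadoop.log.dir}/${hadoop.log.file} are not passed
# in as -D system properties (e.g. in a spawned child JVM):
hadoop.log.dir=.
hadoop.log.file=hadoop.log

# With the defaults in place, the appender always resolves to a concrete
# file instead of failing with "setFile(null,true) call failed":
log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
log4j.appender.DRFA.File=${hadoop.log.dir}/${hadoop.log.file}
log4j.appender.DRFA.DatePattern=.yyyy-MM-dd
```

A child JVM started without -Dhadoop.log.dir=... then simply writes hadoop.log
into its working directory, rather than trying to append to the logs directory
itself and aborting log4j initialization.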

multiple sites run

Posted by Tsengtan A Shuy <tt...@sbcglobal.net>.
I followed the RunNutchInEclipse wiki article to run a crawl of 1002 websites.
I got all five folders, but their size is smaller than what I get from a crawl
of only my own website.

What went wrong with this 1002-website run?

How do you run Java 1.4 and 1.5 at the same time in the Eclipse environment?

Adam Shuy, President
ePacific Web Design & Hosting
Professional Web/Software developer
TEL: 408-272-6946
www.epacificweb.com