Posted to dev@nutch.apache.org by "Andrew Groh (JIRA)" <ji...@apache.org> on 2007/01/26 15:13:49 UTC
[jira] Updated: (NUTCH-436) Incorrect handling of relative paths when the embedded URL path is empty
[ https://issues.apache.org/jira/browse/NUTCH-436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Groh updated NUTCH-436:
------------------------------
Description:
If you have a base URL of the form:
http://a/b/c/d;p?q#f
Embedded URL: ?y
Correct Absolute URL: http://a/b/c/d;p?y
Nutch Generated URL: http://a/b/c/?y
Embedded URL: ;x
Correct Absolute URL: http://a/b/c/d;x
Nutch Generated URL: http://a/b/c/;x
See section 4, steps 5-7 of RFC 1808 for the definition of the correct resolution steps, and section 5.1 for examples:
http://www.ietf.org/rfc/rfc1808.txt
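The "?y" case can be checked against a modern standards-compliant resolver. Note one caveat: RFC 3986 (which superseded RFC 1808) folded the ";params" component into the path, so for ";x" it yields http://a/b/c/;x rather than RFC 1808's http://a/b/c/d;x. A minimal sketch using Python's urllib.parse, purely for illustration (this is not Nutch's resolver):

```python
from urllib.parse import urljoin

base = "http://a/b/c/d;p?q#f"

# RFC 3986 (section 5.4.1) agrees with RFC 1808 on the query-only case:
print(urljoin(base, "?y"))   # http://a/b/c/d;p?y  (Nutch produced http://a/b/c/?y)

# RFC 3986 treats ";" as ordinary path data, so ";x" replaces the last segment:
print(urljoin(base, ";x"))   # http://a/b/c/;x  (RFC 1808 kept the segment: http://a/b/c/d;x)
```

So the "?y" result is wrong under both RFCs, while the ";x" result depends on which RFC is taken as authoritative.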
was:
If you have a base URL of the form:
http://a/b/c/d;p?q#f
Embedded URL   Correct Absolute URL   Nutch Generated URL
?y             http://a/b/c/d;p?y     http://a/b/c/?y
;x             http://a/b/c/d;x       http://a/b/c/;x
See section 4, steps 5-7 of RFC 1808 for the definition of the correct resolution steps, and section 5.1 for examples:
http://www.ietf.org/rfc/rfc1808.txt
> Incorrect handling of relative paths when the embedded URL path is empty
> ------------------------------------------------------------------------
>
> Key: NUTCH-436
> URL: https://issues.apache.org/jira/browse/NUTCH-436
> Project: Nutch
> Issue Type: Bug
> Components: fetcher
> Reporter: Andrew Groh
> Priority: Critical
>
> If you have a base URL of the form:
> http://a/b/c/d;p?q#f
> Embedded URL: ?y
> Correct Absolute URL: http://a/b/c/d;p?y
> Nutch Generated URL: http://a/b/c/?y
> Embedded URL: ;x
> Correct Absolute URL: http://a/b/c/d;x
> Nutch Generated URL: http://a/b/c/;x
> See section 4, steps 5-7 of RFC 1808 for the definition of the correct resolution steps, and section 5.1 for examples:
> http://www.ietf.org/rfc/rfc1808.txt
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
Re: java.io.FileNotFoundException: / (Is a directory)
Posted by Dennis Kubes <nu...@dragonflymc.com>.
That is the hadoop.log.dir value not being set. Log4j is trying to
use the DRFA appender to write to a file and can't find the log directory.
Dennis
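For context, Hadoop's stock log4j.properties of that era configured the DRFA appender roughly as below (a sketch; the exact property names in your conf/log4j.properties may differ by version). If hadoop.log.dir is never set, the substituted file path collapses to something like "/", which is exactly the FileNotFoundException seen in the trace:

```properties
# Sketch of the relevant conf/log4j.properties entries (names may vary by version).
# hadoop.log.dir / hadoop.log.file are normally injected via -Dhadoop.log.dir=...;
# if they are missing, ${hadoop.log.dir}/${hadoop.log.file} resolves to an invalid
# path such as "/", producing "java.io.FileNotFoundException: / (Is a directory)".
hadoop.log.dir=.
hadoop.log.file=hadoop.log

log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
log4j.appender.DRFA.File=${hadoop.log.dir}/${hadoop.log.file}
log4j.appender.DRFA.DatePattern=.yyyy-MM-dd
log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout
log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
```

This also explains the second log4j error in the trace: with no usable File value, the appender complains that "Either File or DatePattern options are not set for appender [DRFA]".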
Gal Nitzan wrote:
>
> Just installed latest from trunk.
>
> I run mergesegs and I get the following error in all task log files (I use
> the default log4j.properties):
>
> log4j:ERROR setFile(null,true) call failed.
> java.io.FileNotFoundException: / (Is a directory)
>         at java.io.FileOutputStream.openAppend(Native Method)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:177)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:102)
>         at org.apache.log4j.FileAppender.setFile(FileAppender.java:289)
>         at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:163)
>         at org.apache.log4j.DailyRollingFileAppender.activateOptions(DailyRollingFileAppender.java:215)
>         at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:256)
>         at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:132)
>         at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:96)
>         at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:654)
>         at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:612)
>         at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:509)
>         at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:415)
>         at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:441)
>         at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:468)
>         at org.apache.log4j.LogManager.<clinit>(LogManager.java:122)
>         at org.apache.log4j.Logger.getLogger(Logger.java:104)
>         at org.apache.commons.logging.impl.Log4JLogger.getLogger(Log4JLogger.java:229)
>         at org.apache.commons.logging.impl.Log4JLogger.<init>(Log4JLogger.java:65)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>         at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:529)
>         at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:235)
>         at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:370)
>         at org.apache.hadoop.mapred.TaskTracker.<clinit>(TaskTracker.java:59)
>         at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1346)
> log4j:ERROR Either File or DatePattern options are not set for appender [DRFA].
java.io.FileNotFoundException: / (Is a directory)
Posted by Gal Nitzan <gn...@usa.net>.
Just installed latest from trunk.
I run mergesegs and I get the following error in all task log files (I use
the default log4j.properties):
log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: / (Is a directory)
        at java.io.FileOutputStream.openAppend(Native Method)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:177)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:102)
        at org.apache.log4j.FileAppender.setFile(FileAppender.java:289)
        at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:163)
        at org.apache.log4j.DailyRollingFileAppender.activateOptions(DailyRollingFileAppender.java:215)
        at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:256)
        at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:132)
        at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:96)
        at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:654)
        at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:612)
        at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:509)
        at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:415)
        at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:441)
        at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:468)
        at org.apache.log4j.LogManager.<clinit>(LogManager.java:122)
        at org.apache.log4j.Logger.getLogger(Logger.java:104)
        at org.apache.commons.logging.impl.Log4JLogger.getLogger(Log4JLogger.java:229)
        at org.apache.commons.logging.impl.Log4JLogger.<init>(Log4JLogger.java:65)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:529)
        at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:235)
        at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:370)
        at org.apache.hadoop.mapred.TaskTracker.<clinit>(TaskTracker.java:59)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1346)
log4j:ERROR Either File or DatePattern options are not set for appender [DRFA].
Re: record version mismatch occured
Posted by Sami Siren <ss...@gmail.com>.
Gal Nitzan wrote:
> Thanks Sami,
>
> By redo do you mean re-parse or re-fetch + re-parse?
generate -> fetch -> parse
--
Sami Siren
RE: record version mismatch occured
Posted by Gal Nitzan <gn...@usa.net>.
Thanks Sami,
By redo do you mean re-parse or re-fetch + re-parse?
-----Original Message-----
From: Sami Siren [mailto:ssiren@gmail.com]
Sent: Friday, January 26, 2007 10:49 PM
To: nutch-dev@lucene.apache.org
Subject: Re: record version mismatch occured
Gal Nitzan wrote:
> Got it. I used latest trunk for a few hours and it seems that it changed the
> version of CrawlDatum to ver. 5 :(
The earlier one left too early; one (or more) of your segments has data
written with the newer version. If you haven't updated the crawldb then you
just need to redo that segment (or those segments).
--
Sami Siren
Re: record version mismatch occured
Posted by Sami Siren <ss...@gmail.com>.
Gal Nitzan wrote:
> Got it. I used latest trunk for a few hours and it seems that it changed the
> version of CrawlDatum to ver. 5 :(
The earlier one left too early; one (or more) of your segments has data
written with the newer version. If you haven't updated the crawldb then you
just need to redo that segment (or those segments).
--
Sami Siren
Re: record version mismatch occured
Posted by Sami Siren <ss...@gmail.com>.
Gal Nitzan wrote:
> Got it. I used latest trunk for a few hours and it seems that it changed the
> version of CrawlDatum to ver. 5 :(
Yes, the version is updated on write.
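The failing check lives in CrawlDatum.readFields, which compares a version number at the front of each serialized record against what the running code expects. A hypothetical Python sketch of the idea (not Nutch's actual wire format; the field layout and names here are invented for illustration):

```python
import io
import struct

CURRENT_VERSION = 5  # what the newer trunk writes

def write_datum(out: io.BytesIO, payload: bytes, version: int = CURRENT_VERSION) -> None:
    # Each record starts with a one-byte format version, then a length-prefixed body.
    out.write(struct.pack(">B", version))
    out.write(struct.pack(">I", len(payload)))
    out.write(payload)

def read_datum(inp: io.BytesIO, expected_version: int) -> bytes:
    (version,) = struct.unpack(">B", inp.read(1))
    if version != expected_version:
        # Mirrors the error in the trace below ("occured" is the original spelling):
        raise IOError(
            f"A record version mismatch occured. Expecting v{expected_version}, found v{version}"
        )
    (length,) = struct.unpack(">I", inp.read(4))
    return inp.read(length)
```

A v4 reader hitting a segment written by the newer (v5) code fails at exactly this check, which is why segments written with mixed versions have to be regenerated before mergesegs will succeed.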
RE: record version mismatch occured
Posted by Gal Nitzan <gn...@usa.net>.
Got it. I used latest trunk for a few hours and it seems that it changed the
version of CrawlDatum to ver. 5 :(
-----Original Message-----
From: Gal Nitzan [mailto:gnitzan@usa.net]
Sent: Friday, January 26, 2007 4:57 PM
To: nutch-dev@lucene.apache.org
Subject: record version mismatch occured
Trying to mergesegs I get the following, any idea?
A record version mismatch occured. Expecting v4, found v5
        at org.apache.nutch.crawl.CrawlDatum.readFields(CrawlDatum.java:147)
        at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1175)
        at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1258)
        at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:69)
        at org.apache.nutch.segment.SegmentMerger$ObjectInputFormat$1.next(SegmentMerger.java:139)
        at org.apache.hadoop.mapred.MapTask$3.next(MapTask.java:201)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:44)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:213)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1211)
record version mismatch occured
Posted by Gal Nitzan <gn...@usa.net>.
Trying to mergesegs I get the following, any idea?
A record version mismatch occured. Expecting v4, found v5
        at org.apache.nutch.crawl.CrawlDatum.readFields(CrawlDatum.java:147)
        at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1175)
        at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1258)
        at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:69)
        at org.apache.nutch.segment.SegmentMerger$ObjectInputFormat$1.next(SegmentMerger.java:139)
        at org.apache.hadoop.mapred.MapTask$3.next(MapTask.java:201)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:44)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:213)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1211)