Posted to dev@nutch.apache.org by "Andrew Groh (JIRA)" <ji...@apache.org> on 2007/01/26 15:13:49 UTC

[jira] Updated: (NUTCH-436) Incorrect handling of relative paths when the embedded URL path is empty

     [ https://issues.apache.org/jira/browse/NUTCH-436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Groh updated NUTCH-436:
------------------------------

    Description: 
If you have a base URL of the form:
http://a/b/c/d;p?q#f

Embedded URL: ?y
Correct Absolute URL: http://a/b/c/d;p?y 
Nutch Generated URL: http://a/b/c/?y

Embedded URL: ;x
Correct Absolute URL: http://a/b/c/d;x 
Nutch Generated URL: http://a/b/c/;x


See section 4, steps 5-7 of RFC 1808 for the definition of the correct resolution steps, and section 5.1 for examples:

http://www.ietf.org/rfc/rfc1808.txt
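
As a quick illustration of the behaviour RFC 1808 asks for in these two cases, here is a minimal, hypothetical Java sketch. The class and the resolveRelative() helper are purely illustrative (this is not the Nutch API) and cover only the two empty-path references from this report:

    // Hypothetical sketch of the RFC 1808 behaviour described above.
    public class Rfc1808Sketch {

        // Handles only the two empty-path cases from the report:
        //  - a reference that is just "?query" keeps the base path and its params
        //  - a reference that is just ";params" keeps the base path but replaces
        //    the params of its last segment
        static String resolveRelative(String base, String ref) {
            String noFragment = base.split("#", 2)[0];        // http://a/b/c/d;p?q
            String noQuery = noFragment.split("\\?", 2)[0];   // http://a/b/c/d;p
            if (ref.startsWith("?")) {
                return noQuery + ref;                         // keep path and params
            }
            if (ref.startsWith(";")) {
                String noParams = noQuery.split(";", 2)[0];   // http://a/b/c/d
                return noParams + ref;                        // replace params
            }
            throw new UnsupportedOperationException("only ?query and ;params are sketched here");
        }

        public static void main(String[] args) {
            String base = "http://a/b/c/d;p?q#f";
            System.out.println(resolveRelative(base, "?y")); // expected: http://a/b/c/d;p?y
            System.out.println(resolveRelative(base, ";x")); // expected: http://a/b/c/d;x
        }
    }

Running main() should print http://a/b/c/d;p?y and http://a/b/c/d;x, i.e. the "Correct Absolute URL" values above.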




  was:
If you have a base URL of the form:
http://a/b/c/d;p?q#f

Embedded URL    Correct Absolute URL    Nutch Generated URL
?y              http://a/b/c/d;p?y      http://a/b/c/?y
;x              http://a/b/c/d;x        http://a/b/c/;x


See section 4, steps 5-7 of RFC 1808 for the definition of the correct resolution steps, and section 5.1 for examples:

http://www.ietf.org/rfc/rfc1808.txt





> Incorrect handling of relative paths when the embedded URL path is empty
> ------------------------------------------------------------------------
>
>                 Key: NUTCH-436
>                 URL: https://issues.apache.org/jira/browse/NUTCH-436
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>            Reporter: Andrew Groh
>            Priority: Critical
>
> If you have a base URL of the form:
> http://a/b/c/d;p?q#f
> Embedded URL: ?y
> Correct Absolute URL: http://a/b/c/d;p?y 
> Nutch Generated URL: http://a/b/c/?y
> Embedded URL: ;x
> Correct Absolute URL: http://a/b/c/d;x 
> Nutch Generated URL: http://a/b/c/;x
> See section 4, steps 5-7 of RFC 1808 for the definition of the correct resolution steps, and section 5.1 for examples:
> http://www.ietf.org/rfc/rfc1808.txt

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Re: java.io.FileNotFoundException: / (Is a directory)

Posted by Dennis Kubes <nu...@dragonflymc.com>.
That is a problem with the hadoop.log.dir value not being set.  It is trying to 
use the DRFA appender to write to a file and can't find the log directory.

Dennis
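
For reference, the stock Hadoop/Nutch conf/log4j.properties defines the DRFA appender roughly like this (exact lines may differ between releases):

    log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
    # File is built from two system properties; if hadoop.log.dir and
    # hadoop.log.file are unset, it collapses to just "/", which is exactly
    # the FileNotFoundException quoted below.
    log4j.appender.DRFA.File=${hadoop.log.dir}/${hadoop.log.file}
    log4j.appender.DRFA.DatePattern=.yyyy-MM-dd

Making sure the task JVMs are started with something like -Dhadoop.log.dir=/path/to/logs -Dhadoop.log.file=hadoop.log (the launcher scripts normally derive these from HADOOP_LOG_DIR) should make the error go away.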

Gal Nitzan wrote:
> 
> Just installed latest from trunk.
> 
> I ran mergesegs and I get the following error in all task log files (I am
> using the default log4j.properties):
> 
> log4j:ERROR setFile(null,true) call failed.
> java.io.FileNotFoundException: / (Is a directory)
>         at java.io.FileOutputStream.openAppend(Native Method)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:177)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:102)
>         at org.apache.log4j.FileAppender.setFile(FileAppender.java:289)
>         at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:163)
>         at org.apache.log4j.DailyRollingFileAppender.activateOptions(DailyRollingFileAppender.java:215)
>         at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:256)
>         at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:132)
>         at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:96)
>         at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:654)
>         at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:612)
>         at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:509)
>         at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:415)
>         at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:441)
>         at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:468)
>         at org.apache.log4j.LogManager.<clinit>(LogManager.java:122)
>         at org.apache.log4j.Logger.getLogger(Logger.java:104)
>         at org.apache.commons.logging.impl.Log4JLogger.getLogger(Log4JLogger.java:229)
>         at org.apache.commons.logging.impl.Log4JLogger.<init>(Log4JLogger.java:65)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>         at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:529)
>         at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:235)
>         at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:370)
>         at org.apache.hadoop.mapred.TaskTracker.<clinit>(TaskTracker.java:59)
>         at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1346)
> log4j:ERROR Either File or DatePattern options are not set for appender [DRFA].
> 
> 

java.io.FileNotFoundException: / (Is a directory)

Posted by Gal Nitzan <gn...@usa.net>.

Just installed latest from trunk.

I ran mergesegs and I get the following error in all task log files (I am
using the default log4j.properties):

log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: / (Is a directory)
        at java.io.FileOutputStream.openAppend(Native Method)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:177)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:102)
        at org.apache.log4j.FileAppender.setFile(FileAppender.java:289)
        at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:163)
        at org.apache.log4j.DailyRollingFileAppender.activateOptions(DailyRollingFileAppender.java:215)
        at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:256)
        at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:132)
        at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:96)
        at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:654)
        at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:612)
        at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:509)
        at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:415)
        at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:441)
        at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:468)
        at org.apache.log4j.LogManager.<clinit>(LogManager.java:122)
        at org.apache.log4j.Logger.getLogger(Logger.java:104)
        at org.apache.commons.logging.impl.Log4JLogger.getLogger(Log4JLogger.java:229)
        at org.apache.commons.logging.impl.Log4JLogger.<init>(Log4JLogger.java:65)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:529)
        at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:235)
        at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:370)
        at org.apache.hadoop.mapred.TaskTracker.<clinit>(TaskTracker.java:59)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1346)
log4j:ERROR Either File or DatePattern options are not set for appender [DRFA].



Re: record version mismatch occured

Posted by Sami Siren <ss...@gmail.com>.
Gal Nitzan wrote:
> Thanks Sami,
> 
> By redo, do you mean re-parse or re-fetch + re-parse?

generate -> fetch -> parse

--
 Sami Siren
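
For a single bad segment that would look roughly like the commands below. The crawl/crawldb and crawl/segments paths are placeholders for your own layout; check the usage printed by bin/nutch for your exact version:

    # Generate, fetch and parse a fresh segment to replace the bad one.
    bin/nutch generate crawl/crawldb crawl/segments
    bin/nutch fetch crawl/segments/<new-segment-dir>
    bin/nutch parse crawl/segments/<new-segment-dir>

If your fetcher is configured to parse while fetching, the separate parse step is not needed.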


RE: record version mismatch occured

Posted by Gal Nitzan <gn...@usa.net>.
Thanks Sami,

By redo, do you mean re-parse or re-fetch + re-parse?

-----Original Message-----
From: Sami Siren [mailto:ssiren@gmail.com] 
Sent: Friday, January 26, 2007 10:49 PM
To: nutch-dev@lucene.apache.org
Subject: Re: record version mismatch occured

Gal Nitzan wrote:
> Got it. I used latest trunk for a few hours and it seems that it changed the
> version of Crawldatum to ver. 5 :(

My earlier reply went out too early. One (or more) of your segments has data
written with the newer version. If you haven't updated the crawldb, then you just
need to redo that segment (or those segments).

--
 Sami Siren




Re: record version mismatch occured

Posted by Sami Siren <ss...@gmail.com>.
Gal Nitzan wrote:
> Got it. I used latest trunk for a few hours and it seems that it changed the
> version of Crawldatum to ver. 5 :(

My earlier reply went out too early. One (or more) of your segments has data
written with the newer version. If you haven't updated the crawldb, then you just
need to redo that segment (or those segments).

--
 Sami Siren


Re: record version mismatch occured

Posted by Sami Siren <ss...@gmail.com>.
Gal Nitzan wrote:
> Got it. I used latest trunk for a few hours and it seems that it changed the
> version of Crawldatum to ver. 5 :(

Yes, the version is updated on write.
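
In other words, every CrawlDatum record starts with a version byte, and a reader refuses anything newer than it knows about. A minimal sketch of that pattern (illustrative only, not the actual CrawlDatum source):

    // Minimal sketch of the versioned-record check behind the
    // "record version mismatch" error; names and numbers are illustrative.
    import java.io.DataInput;
    import java.io.IOException;

    class VersionedRecordSketch {
        static final byte CUR_VERSION = 4;      // the version this build can read

        public void readFields(DataInput in) throws IOException {
            byte version = in.readByte();       // first byte of the record is its version
            if (version > CUR_VERSION) {        // data written by newer code, e.g. v5
                throw new IOException("A record version mismatch occured."
                        + " Expecting v" + CUR_VERSION + ", found v" + version);
            }
            // ... read the remaining fields according to 'version' ...
        }
    }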

RE: record version mismatch occured

Posted by Gal Nitzan <gn...@usa.net>.
Got it. I used latest trunk for a few hours and it seems that it changed the
version of Crawldatum to ver. 5 :(



-----Original Message-----
From: Gal Nitzan [mailto:gnitzan@usa.net] 
Sent: Friday, January 26, 2007 4:57 PM
To: nutch-dev@lucene.apache.org
Subject: record version mismatch occured

Trying to run mergesegs, I get the following; any ideas?


A record version mismatch occured. Expecting v4, found v5
	at org.apache.nutch.crawl.CrawlDatum.readFields(CrawlDatum.java:147)
	at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1175)
	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1258)
	at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:69)
	at org.apache.nutch.segment.SegmentMerger$ObjectInputFormat$1.next(SegmentMerger.java:139)
	at org.apache.hadoop.mapred.MapTask$3.next(MapTask.java:201)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:44)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:213)
	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1211)





record version mismatch occured

Posted by Gal Nitzan <gn...@usa.net>.
Trying to run mergesegs, I get the following; any ideas?


A record version mismatch occured. Expecting v4, found v5
	at org.apache.nutch.crawl.CrawlDatum.readFields(CrawlDatum.java:147)
	at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1175)
	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1258)
	at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:69)
	at org.apache.nutch.segment.SegmentMerger$ObjectInputFormat$1.next(SegmentMerger.java:139)
	at org.apache.hadoop.mapred.MapTask$3.next(MapTask.java:201)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:44)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:213)
	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1211)