Posted to dev@nutch.apache.org by "Piotr Kosiorowski (JIRA)" <ji...@apache.org> on 2005/12/22 19:37:30 UTC

[jira] Commented: (NUTCH-148) org.apache.nutch.tools.CrawlTool throws error while doing deleteduplicates

    [ http://issues.apache.org/jira/browse/NUTCH-148?page=comments#action_12361128 ] 

Piotr Kosiorowski commented on NUTCH-148:
-----------------------------------------

Do you have Cygwin installed?
Is 'df' working in your Cygwin installation?
Do you run the crawl from a Cygwin shell?

Nutch requires Cygwin on Windows.
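A quick way to check both points from a Cygwin shell (a diagnostic sketch, not part of Nutch itself; Windows `error=2` from CreateProcess generally means the executable could not be found on the PATH):

```shell
# Verify that 'df' is resolvable - if this fails, Cygwin's bin
# directory (e.g. C:\cygwin\bin) is likely missing from PATH.
command -v df || echo "df not found: add Cygwin's bin directory to PATH"

# Verify that 'df' actually runs against the current directory,
# which is what Nutch invokes internally (df -k <dir>).
df -k .
```

If both commands succeed from the same shell you launch the crawl from, the CreateProcess error should not occur.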

> org.apache.nutch.tools.CrawlTool throws error while doing deleteduplicates
> --------------------------------------------------------------------------
>
>          Key: NUTCH-148
>          URL: http://issues.apache.org/jira/browse/NUTCH-148
>      Project: Nutch
>         Type: Bug
>   Components: indexer
>     Versions: 0.8-dev
>  Environment: Windows XP Home
>     Reporter: raghavendra prabhu

>
> I get the following error while running org.apache.nutch.tools.CrawlTool
> The error actually is in deleteduplicates 
> 051223 001121 Reading url hashes...
> 051223 001121 Sorting url hashes...
> 051223 001121 Deleting url duplicates...
> 051223 001121 Error moving bad file 
> G:\apache-tomcat-5.5.12\webapps\crux\WEB-INF
> \classes\ddup-workingdir\ddup-20051223001121: java.io.IOException: 
> CreateProcess
> : df -k  G:\apache-tomcat-5.5.12\webapps\crux\WEB-INF\classes\ddup-workingdir\ddup-20051223001121 error=2
> It throws the error here in NFSDataInputStream.java
> The exception is org.apache.nutch.fs.ChecksumException: Checksum 
> error: G:\apach
> e-tomcat-5.5.12\webapps\crux\WEB-INF\classes\ddup-workingdir\ddup-20051223001121 at 0

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira