Posted to dev@nutch.apache.org by "raghavendra prabhu (JIRA)" <ji...@apache.org> on 2005/12/22 10:49:30 UTC
[jira] Created: (NUTCH-148) org.apache.nutch.tools.CrawlTool throws error while doing deleteduplicates
org.apache.nutch.tools.CrawlTool throws error while doing deleteduplicates
--------------------------------------------------------------------------
Key: NUTCH-148
URL: http://issues.apache.org/jira/browse/NUTCH-148
Project: Nutch
Type: Bug
Components: indexer
Versions: 0.8-dev
Environment: Windows XP Home
Reporter: raghavendra prabhu
I get the following error while running org.apache.nutch.tools.CrawlTool.
The error actually occurs in the deleteduplicates step.
051223 001121 Reading url hashes...
051223 001121 Sorting url hashes...
051223 001121 Deleting url duplicates...
051223 001121 Error moving bad file
G:\apache-tomcat-5.5.12\webapps\crux\WEB-INF
\classes\ddup-workingdir\ddup-20051223001121: java.io.IOException:
CreateProcess
: df -k G:\apache-tomcat-5.5.12\webapps\crux\WEB-INF\classes\ddup-workingdir\ddup-20051223001121 error=2
It throws the error here in NFSDataInputStream.java
The exception is org.apache.nutch.fs.ChecksumException: Checksum
error: G:\apach
e-tomcat-5.5.12\webapps\crux\WEB-INF\classes\ddup-workingdir\ddup-20051223001121 at 0
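The first IOException above is not a disk problem: Java could not launch the external "df" command at all. A minimal sketch of that failure mode (this is illustrative code, not Nutch source; the class and command names below are stand-ins):

```java
import java.io.IOException;

// Sketch: NDFS-era Nutch shells out to "df -k <path>" to check free disk
// space. If the binary is not on the PATH -- as on Windows without Cygwin --
// Runtime.exec throws an IOException whose message contains
// "CreateProcess ... error=2" (Windows error 2 = file not found).
public class DfCheck {
    // Returns true if the named command can be launched at all.
    static boolean commandAvailable(String cmd) {
        try {
            Process p = Runtime.getRuntime().exec(new String[] { cmd });
            p.destroy(); // we only care whether the launch succeeded
            return true;
        } catch (IOException e) {
            return false; // e.g. "CreateProcess: ... error=2" on Windows
        }
    }

    public static void main(String[] args) {
        // "no-such-df-binary" is a deliberate stand-in for a missing "df"
        System.out.println(commandAvailable("no-such-df-binary"));
    }
}
```

On a system where "df" is present (Unix, or Windows with Cygwin on the PATH), the launch succeeds and the error does not occur.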
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
http://www.atlassian.com/software/jira
[jira] Commented: (NUTCH-148) org.apache.nutch.tools.CrawlTool throws error while doing deleteduplicates
Posted by "Stefan Groschupf (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/NUTCH-148?page=comments#action_12361199 ]
Stefan Groschupf commented on NUTCH-148:
----------------------------------------
Nutch requires Cygwin or a Unix operating system for both 0.7 and 0.8.
[jira] Commented: (NUTCH-148) org.apache.nutch.tools.CrawlTool throws error while doing deleteduplicates
Posted by "Piotr Kosiorowski (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/NUTCH-148?page=comments#action_12361128 ]
Piotr Kosiorowski commented on NUTCH-148:
-----------------------------------------
Do you have Cygwin installed?
Is 'df' working in your cygwin installation?
Do you run crawl from cygwin shell?
Nutch requires cygwin on Windows.
[jira] Closed: (NUTCH-148) org.apache.nutch.tools.CrawlTool throws error while doing deleteduplicates
Posted by "Piotr Kosiorowski (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/NUTCH-148?page=all ]
Piotr Kosiorowski closed NUTCH-148:
-----------------------------------
Resolution: Invalid
[jira] Commented: (NUTCH-148) org.apache.nutch.tools.CrawlTool throws error while doing deleteduplicates
Posted by "Piotr Kosiorowski (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/NUTCH-148?page=comments#action_12361206 ]
Piotr Kosiorowski commented on NUTCH-148:
-----------------------------------------
The 'df' command is required for NDFS operation, so if you were not using NDFS and the nutch shell scripts in 0.7.1, you were able to run it on Windows without Cygwin. Now the majority of tools use NDFS, so Cygwin is required on Windows. I would assume the other bug is also Cygwin-related - please test it with Cygwin and report whether that fixed the issue.
In the future, in case of doubt, it is better to ask on the nutch-user mailing list rather than create a JIRA issue first. I will close both your issues now, assuming they are Cygwin-related. If you find that it still does not work with Cygwin, please reopen.
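To make the 'df' dependency concrete: a rough sketch of the kind of parsing such a free-space check implies (hypothetical code, not the actual Nutch implementation; the column layout is the usual "df -k" format):

```java
// Illustrative sketch only: extracting the "Available" column (in KB)
// from "df -k <path>" output. Assumed column layout:
// Filesystem  1K-blocks  Used  Available  Use%  Mounted on
public class DfParse {
    static long availableKb(String dfOutput) {
        String[] lines = dfOutput.trim().split("\\r?\\n");
        // the last line holds the data row; "Available" is the 4th field
        String[] fields = lines[lines.length - 1].trim().split("\\s+");
        return Long.parseLong(fields[3]);
    }
}
```

This is why the tools fail outright on Windows without Cygwin: there is no "df" to produce this output in the first place.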
[jira] Commented: (NUTCH-148) org.apache.nutch.tools.CrawlTool throws error while doing deleteduplicates
Posted by "raghavendra prabhu (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/NUTCH-148?page=comments#action_12361197 ]
raghavendra prabhu commented on NUTCH-148:
------------------------------------------
Does nutch-0.8-dev require Cygwin?
Until now I had been using nutch-0.7.1.
I have also raised another bug that org.apache.nutch.crawl.Crawl runs in a loop.
Is that also because of Cygwin?
Can you please confirm?
Doubts:
1) Does nutch-0.8-dev have a dependency on Cygwin?
2) Was this dependency there in nutch-0.7?
Thanks for responding soon.