Posted to commits@nutch.apache.org by le...@apache.org on 2015/12/10 04:02:25 UTC
svn commit: r1719004 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/tools/FileDumper.java
Author: lewismc
Date: Thu Dec 10 03:02:25 2015
New Revision: 1719004
URL: http://svn.apache.org/viewvc?rev=1719004&view=rev
Log:
NUTCH-2180 FileDumper skips Corrupt Segments this closes #85
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/src/java/org/apache/nutch/tools/FileDumper.java
Modified: nutch/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/nutch/trunk/CHANGES.txt?rev=1719004&r1=1719003&r2=1719004&view=diff
==============================================================================
--- nutch/trunk/CHANGES.txt (original)
+++ nutch/trunk/CHANGES.txt Thu Dec 10 03:02:25 2015
@@ -1,5 +1,7 @@
Nutch Change Log
+* NUTCH-2180 FileDumper skips Corrupt Segments (Harshavardhan Manjunatha via lewismc)
+
* NUTCH-2042 parse-html increase chunk size used to detect charset (snagel)
* NUTCH-2172 index-more: document format of contenttype-mapping.txt (Nicola Tonellotto, snagel)
Modified: nutch/trunk/src/java/org/apache/nutch/tools/FileDumper.java
URL: http://svn.apache.org/viewvc/nutch/trunk/src/java/org/apache/nutch/tools/FileDumper.java?rev=1719004&r1=1719003&r2=1719004&view=diff
==============================================================================
--- nutch/trunk/src/java/org/apache/nutch/tools/FileDumper.java (original)
+++ nutch/trunk/src/java/org/apache/nutch/tools/FileDumper.java Thu Dec 10 03:02:25 2015
@@ -172,6 +172,11 @@ public class FileDumper {
}
});
+ if (partDirs == null) {
+ LOG.warn("Skipping Corrupt Segment: [{}]", segment.getAbsolutePath());
+ continue;
+ }
+
for (File partDir : partDirs) {
try {
String segmentPath = partDir + "/data";
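
For context on the hunk above: java.io.File.listFiles returns null, not an empty array, when the path is not a directory or an I/O error occurs, so iterating over the result without a check throws a NullPointerException. The following is a minimal standalone sketch of the same guard, not the FileDumper code itself. The class name SegmentScan, the method collectPartDirs, and the directory-only filter are assumptions made for illustration, and System.err stands in for the SLF4J logger used in the patch.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; not part of Nutch.
class SegmentScan {

    // Collects the sub-directories of every readable segment and skips
    // any segment whose directory listing cannot be obtained.
    static List<File> collectPartDirs(File[] segments) {
        List<File> result = new ArrayList<>();
        for (File segment : segments) {
            // listFiles yields null if 'segment' is not a directory
            // or if an I/O error occurs while reading it.
            File[] partDirs = segment.listFiles(File::isDirectory);
            if (partDirs == null) {
                System.err.println("Skipping Corrupt Segment: ["
                        + segment.getAbsolutePath() + "]");
                continue;
            }
            for (File partDir : partDirs) {
                result.add(partDir);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Treat each command-line argument as a segment directory.
        File[] segments = new File[args.length];
        for (int i = 0; i < args.length; i++) {
            segments[i] = new File(args[i]);
        }
        for (File partDir : collectPartDirs(segments)) {
            System.out.println(partDir + "/data");
        }
    }
}
```

Skipping with a warning, as the patch does, lets a dump over many segments finish even when one segment directory is missing or unreadable.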