Posted to commits@nutch.apache.org by le...@apache.org on 2015/12/10 04:02:25 UTC

svn commit: r1719004 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/tools/FileDumper.java

Author: lewismc
Date: Thu Dec 10 03:02:25 2015
New Revision: 1719004

URL: http://svn.apache.org/viewvc?rev=1719004&view=rev
Log:
NUTCH-2180 FileDumper skips Corrupt Segments this closes #85

Modified:
    nutch/trunk/CHANGES.txt
    nutch/trunk/src/java/org/apache/nutch/tools/FileDumper.java

Modified: nutch/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/nutch/trunk/CHANGES.txt?rev=1719004&r1=1719003&r2=1719004&view=diff
==============================================================================
--- nutch/trunk/CHANGES.txt (original)
+++ nutch/trunk/CHANGES.txt Thu Dec 10 03:02:25 2015
@@ -1,5 +1,7 @@
 Nutch Change Log
 
+* NUTCH-2180 FileDumper skips Corrupt Segments (Harshavardhan Manjunatha via lewismc)
+
 * NUTCH-2042 parse-html increase chunk size used to detect charset (snagel)
 
 * NUTCH-2172 index-more: document format of contenttype-mapping.txt (Nicola Tonellotto, snagel)

Modified: nutch/trunk/src/java/org/apache/nutch/tools/FileDumper.java
URL: http://svn.apache.org/viewvc/nutch/trunk/src/java/org/apache/nutch/tools/FileDumper.java?rev=1719004&r1=1719003&r2=1719004&view=diff
==============================================================================
--- nutch/trunk/src/java/org/apache/nutch/tools/FileDumper.java (original)
+++ nutch/trunk/src/java/org/apache/nutch/tools/FileDumper.java Thu Dec 10 03:02:25 2015
@@ -172,6 +172,11 @@ public class FileDumper {
         }
       });
 
+      if (partDirs == null) {
+        LOG.warn("Skipping Corrupt Segment: [{}]", segment.getAbsolutePath());
+        continue;
+      }
+
       for (File partDir : partDirs) {
         try {
           String segmentPath = partDir + "/data";
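The hunk above guards against `partDirs` being null before it is iterated. `java.io.File.listFiles(FileFilter)` returns null, not an empty array, when the path is not a directory or an I/O error occurs. Without the guard, the `for (File partDir : partDirs)` loop would throw a `NullPointerException` and abort the whole dump instead of skipping one bad segment. The sketch below shows the same pattern in isolation. It is a minimal example, not the FileDumper code. The class name `SkipCorruptSegments`, the method `collectDataPaths`, the "readable sub-directory" filter, and the `System.err` logging (in place of SLF4J's `LOG.warn`) are all hypothetical stand-ins.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the null-check pattern from NUTCH-2180;
 * not the actual FileDumper implementation.
 */
public class SkipCorruptSegments {

  /**
   * Returns "<partDir>/data" for every part directory found under each
   * segment. File.listFiles() returns null when the path is not a directory
   * or an I/O error occurs, so such segments are reported and skipped.
   */
  static List<String> collectDataPaths(File[] segments) {
    List<String> dataPaths = new ArrayList<>();
    for (File segment : segments) {
      File[] partDirs = segment.listFiles(new FileFilter() {
        @Override
        public boolean accept(File file) {
          // Assumed filter: keep readable sub-directories only.
          return file.canRead() && file.isDirectory();
        }
      });

      if (partDirs == null) {
        // Without this guard the for-each loop below would throw a
        // NullPointerException and stop processing all remaining segments.
        System.err.println("Skipping Corrupt Segment: ["
            + segment.getAbsolutePath() + "]");
        continue;
      }

      for (File partDir : partDirs) {
        dataPaths.add(partDir + "/data");
      }
    }
    return dataPaths;
  }

  public static void main(String[] args) {
    // Throwaway layout: one valid segment and one that does not exist.
    File root = new File(System.getProperty("java.io.tmpdir"),
        "segments-" + System.nanoTime());
    File good = new File(root, "segment-ok");
    new File(good, "part-00000").mkdirs();
    File missing = new File(root, "segment-missing");

    List<String> paths = collectDataPaths(new File[] { missing, good });
    System.out.println(paths.size() + " data path(s) found");
  }
}
```

Running `main` logs the skipped segment on stderr and still collects the data path from the valid segment. The dump degrades gracefully instead of failing on the first unreadable segment.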