Posted to commits@nutch.apache.org by ab...@apache.org on 2007/09/18 21:07:40 UTC

svn commit: r577018 - in /lucene/nutch/trunk: CHANGES.txt src/java/org/apache/nutch/crawl/Generator.java

Author: ab
Date: Tue Sep 18 12:07:39 2007
New Revision: 577018

URL: http://svn.apache.org/viewvc?rev=577018&view=rev
Log:
NUTCH-554 - Generator throws IOException on invalid urls.

Modified:
    lucene/nutch/trunk/CHANGES.txt
    lucene/nutch/trunk/src/java/org/apache/nutch/crawl/Generator.java

Modified: lucene/nutch/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/lucene/nutch/trunk/CHANGES.txt?rev=577018&r1=577017&r2=577018&view=diff
==============================================================================
--- lucene/nutch/trunk/CHANGES.txt (original)
+++ lucene/nutch/trunk/CHANGES.txt Tue Sep 18 12:07:39 2007
@@ -133,6 +133,9 @@
 
 45. NUTCH-546 - file URL are filtered out by the crawler. (dogacan)
 
+46. NUTCH-554 - Generator throws IOException on invalid urls.
+    (Brian Whitman via ab)
+
 Release 0.9 - 2007-04-02
 
  1. Changed log4j confiquration to log to stdout on commandline

Modified: lucene/nutch/trunk/src/java/org/apache/nutch/crawl/Generator.java
URL: http://svn.apache.org/viewvc/lucene/nutch/trunk/src/java/org/apache/nutch/crawl/Generator.java?rev=577018&r1=577017&r2=577018&view=diff
==============================================================================
--- lucene/nutch/trunk/src/java/org/apache/nutch/crawl/Generator.java (original)
+++ lucene/nutch/trunk/src/java/org/apache/nutch/crawl/Generator.java Tue Sep 18 12:07:39 2007
@@ -184,7 +184,13 @@
         Text url = entry.url;
 
         if (maxPerHost > 0) {                     // are we counting hosts?
-          URL u = new URL(url.toString());
+          URL u = null;
+          try {
+            u = new URL(url.toString());
+          } catch (MalformedURLException e) {
+            LOG.info("Bad protocol in url: " + url.toString());
+            continue;
+          }
           String host = u.getHost();
           if (host == null) {
             // unknown host, skip
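
The pattern applied in the hunk above (catch MalformedURLException per entry, log it, and move on to the next entry instead of letting the exception abort the whole run) can be sketched in isolation. This is only an illustration under stated assumptions: HostExtractor and validHosts are hypothetical names that do not exist in Nutch, and the sketch writes to System.err where Generator uses its LOG.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Nutch: collects the host of every
// parseable URL and skips the ones java.net.URL rejects.
class HostExtractor {

  static List<String> validHosts(List<String> urls) {
    List<String> hosts = new ArrayList<>();
    for (String s : urls) {
      URL u;
      try {
        u = new URL(s);
      } catch (MalformedURLException e) {
        // Same idea as the patch: report the bad entry and continue,
        // rather than propagating the exception to the caller.
        System.err.println("Bad protocol in url: " + s);
        continue;
      }
      String host = u.getHost();
      if (host == null || host.isEmpty()) {
        continue; // unknown host, skip
      }
      hosts.add(host);
    }
    return hosts;
  }

  public static void main(String[] args) {
    // "htp" has no protocol handler and "no-scheme" has no protocol at all,
    // so both are rejected by the URL constructor and skipped.
    List<String> input =
        Arrays.asList("http://example.com/a", "htp://bad.example/", "no-scheme");
    System.out.println(validHosts(input));
  }
}
```

MalformedURLException is a subclass of IOException, which is consistent with the log message describing the earlier failure as an IOException.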