Posted to commits@nutch.apache.org by sn...@apache.org on 2017/12/05 11:10:06 UTC

[nutch] branch master updated (d8754b7 -> 9931acc)

This is an automated email from the ASF dual-hosted git repository.

snagel pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/nutch.git.


    from d8754b7  NUTCH-2468 should filter out invalid URLs by default - enable plugin urlfilter-validate by default
     new b4d00e3  This suggested change seems to work. MalformedURLExceptions no longer occur.
     new 5b3cf0e  NUTCH-2451 protocol-ftp to resolve relative URL when following redirects - return empty protocol output instead of throwing exception if relative redirect URL fails to resolve - format source code - complete LOG message
     new 9931acc  Merge branch 'NUTCH-2451' - cherry-picked e159ad4 from HiranChaudhuri:NUTCH-2451 - closes #241

The 3 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../src/java/org/apache/nutch/protocol/ftp/Ftp.java         | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

-- 
To stop receiving notification emails like this one, please contact
"commits@nutch.apache.org" <co...@nutch.apache.org>.

[nutch] 01/03: This suggested change seems to work. MalformedURLExceptions no longer occur.

snagel pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/nutch.git

commit b4d00e3714d02d45c7ec309182736cf22c28c77c
Author: Hiran Chaudhuri <hi...@mail.de>
AuthorDate: Fri Nov 10 00:16:18 2017 +0100

    This suggested change seems to work. MalformedURLExceptions no longer occur.
---
 .../src/java/org/apache/nutch/protocol/ftp/Ftp.java         | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/src/plugin/protocol-ftp/src/java/org/apache/nutch/protocol/ftp/Ftp.java b/src/plugin/protocol-ftp/src/java/org/apache/nutch/protocol/ftp/Ftp.java
index 84aa823..ae73941 100644
--- a/src/plugin/protocol-ftp/src/java/org/apache/nutch/protocol/ftp/Ftp.java
+++ b/src/plugin/protocol-ftp/src/java/org/apache/nutch/protocol/ftp/Ftp.java
@@ -36,6 +36,7 @@ import org.apache.nutch.protocol.ProtocolStatus;
 import crawlercommons.robots.BaseRobotRules;
 
 import java.lang.invoke.MethodHandles;
+import java.net.MalformedURLException;
 import java.net.URL;
 import java.util.List;
 import java.io.IOException;
@@ -142,7 +143,16 @@ public class Ftp implements Protocol {
         } else if (code >= 300 && code < 400) { // handle redirect
           if (redirects == MAX_REDIRECTS)
             throw new FtpException("Too many redirects: " + url);
-          u = new URL(response.getHeader("Location"));
+          
+          String loc = response.getHeader("Location");
+          try {
+            u = new URL(u, loc);
+          }
+          catch(MalformedURLException mue) {
+            LOG.error("Could not create redirectURL for {} with {}", url, loc);
+            throw mue;
+          }
+          
           redirects++;
           if (LOG.isTraceEnabled()) {
             LOG.trace("redirect to " + u);
@@ -152,6 +162,7 @@ public class Ftp implements Protocol {
         }
       }
     } catch (Exception e) {
+      LOG.error("Could not get protocol output for {}", url, e);
       return new ProtocolOutput(null, new ProtocolStatus(e));
     }
   }
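The core of this commit is replacing the one-argument `java.net.URL` constructor with the two-argument `URL(URL context, String spec)` form, which resolves a relative `Location` value against the URL that issued the redirect. The minimal sketch below contrasts the two on made-up inputs; the class and method names (`RelativeRedirectDemo`, `resolveStandalone`, `resolveAgainst`) and the `example.org`/`example.net` hosts are illustrative assumptions, not Nutch code.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical demo class: contrasts standalone parsing with resolution
// against the current URL, as in the Ftp.java change above.
public class RelativeRedirectDemo {

    /** Before the fix: parse the Location value on its own. */
    public static String resolveStandalone(String location) {
        try {
            return new URL(location).toString();
        } catch (MalformedURLException e) {
            return null; // a relative value has no protocol, so parsing fails
        }
    }

    /** After the fix: resolve the Location value against the current URL. */
    public static String resolveAgainst(String current, String location) {
        try {
            URL base = new URL(current);
            return new URL(base, location).toString();
        } catch (MalformedURLException e) {
            return null; // still fails for unparseable values, e.g. an unknown scheme
        }
    }

    public static void main(String[] args) {
        String current = "ftp://ftp.example.org/pub/dir/file.txt";
        System.out.println(resolveStandalone("other.txt"));                  // null
        System.out.println(resolveAgainst(current, "other.txt"));            // ftp://ftp.example.org/pub/dir/other.txt
        System.out.println(resolveAgainst(current, "/incoming/new.txt"));    // ftp://ftp.example.org/incoming/new.txt
        System.out.println(resolveAgainst(current,
                "ftp://mirror.example.net/pub/file.txt"));                   // ftp://mirror.example.net/pub/file.txt
        System.out.println(resolveAgainst(current, "foo://bar"));            // null
    }
}
```

A relative value such as `other.txt` makes the one-argument form throw `MalformedURLException` ("no protocol"), while the two-argument form resolves it. Absolute values pass through unchanged, so the switch is safe for servers that already send full URLs. Values the JDK cannot parse at all (here an unknown scheme) still throw, which is the case the follow-up commit handles.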

[nutch] 02/03: NUTCH-2451 protocol-ftp to resolve relative URL when following redirects - return empty protocol output instead of throwing exception if relative redirect URL fails to resolve - format source code - complete LOG message

snagel pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/nutch.git

commit 5b3cf0e2028aed576d080be70fc9028796616b94
Author: Sebastian Nagel <sn...@apache.org>
AuthorDate: Tue Dec 5 11:49:44 2017 +0100

    NUTCH-2451 protocol-ftp to resolve relative URL when following redirects
    - return empty protocol output instead of throwing exception if
      relative redirect URL fails to resolve
    - format source code
    - complete LOG message
---
 .../protocol-ftp/src/java/org/apache/nutch/protocol/ftp/Ftp.java  | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/plugin/protocol-ftp/src/java/org/apache/nutch/protocol/ftp/Ftp.java b/src/plugin/protocol-ftp/src/java/org/apache/nutch/protocol/ftp/Ftp.java
index ae73941..eeba776 100644
--- a/src/plugin/protocol-ftp/src/java/org/apache/nutch/protocol/ftp/Ftp.java
+++ b/src/plugin/protocol-ftp/src/java/org/apache/nutch/protocol/ftp/Ftp.java
@@ -147,10 +147,9 @@ public class Ftp implements Protocol {
           String loc = response.getHeader("Location");
           try {
             u = new URL(u, loc);
-          }
-          catch(MalformedURLException mue) {
+          } catch (MalformedURLException mue) {
             LOG.error("Could not create redirectURL for {} with {}", url, loc);
-            throw mue;
+            return new ProtocolOutput(null, new ProtocolStatus(mue));
           }
           
           redirects++;
@@ -162,7 +161,8 @@ public class Ftp implements Protocol {
         }
       }
     } catch (Exception e) {
-      LOG.error("Could not get protocol output for {}", url, e);
+      LOG.error("Could not get protocol output for {}: {}", url,
+          e.getMessage());
       return new ProtocolOutput(null, new ProtocolStatus(e));
     }
   }
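This commit changes the failure path: when a `Location` value cannot be resolved, the method now returns an empty `ProtocolOutput` carrying a `ProtocolStatus` built from the exception instead of rethrowing it. The sketch below mimics that control flow in plain Java under stated assumptions: `Outcome` is a hypothetical stand-in for `ProtocolOutput`/`ProtocolStatus`, `Reply` and the map-backed fake server stand in for the FTP client, and `MAX_REDIRECTS = 5` is an arbitrary value. None of these are Nutch APIs.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a redirect loop that reports failures as results.
public class RedirectLoopSketch {

    /** Hypothetical stand-in for ProtocolOutput + ProtocolStatus. */
    public static final class Outcome {
        public final URL finalUrl; // null when the fetch failed
        public final String error; // null on success
        Outcome(URL finalUrl, String error) {
            this.finalUrl = finalUrl;
            this.error = error;
        }
    }

    /** Hypothetical server reply: status code plus optional Location header. */
    public static final class Reply {
        final int code;
        final String location;
        Reply(int code, String location) {
            this.code = code;
            this.location = location;
        }
    }

    static final int MAX_REDIRECTS = 5; // assumed limit, not Nutch's value

    /** Follows redirects, resolving each Location against the current URL. */
    public static Outcome follow(String start, Map<String, Reply> server) {
        try {
            URL u = new URL(start);
            int redirects = 0;
            while (true) {
                Reply reply = server.get(u.toString());
                if (reply == null) {
                    return new Outcome(null, "no reply for " + u);
                }
                if (reply.code == 200) {
                    return new Outcome(u, null);
                }
                if (reply.code < 300 || reply.code >= 400) {
                    return new Outcome(null, "status " + reply.code + " for " + u);
                }
                // Simplification: Nutch throws FtpException here and catches it later.
                if (redirects == MAX_REDIRECTS) {
                    return new Outcome(null, "Too many redirects: " + start);
                }
                if (reply.location == null) {
                    return new Outcome(null, "redirect without Location for " + u);
                }
                try {
                    u = new URL(u, reply.location); // handles relative and absolute values
                } catch (MalformedURLException mue) {
                    // Report the failure as a result instead of rethrowing.
                    return new Outcome(null, "bad redirect " + reply.location + ": " + mue.getMessage());
                }
                redirects++;
            }
        } catch (MalformedURLException e) {
            return new Outcome(null, e.getMessage()); // the start URL itself was invalid
        }
    }

    /** A tiny fake server keyed by URL string (illustrative data only). */
    public static Map<String, Reply> demoServer() {
        Map<String, Reply> m = new HashMap<>();
        m.put("ftp://ftp.example.org/pub/old.txt", new Reply(302, "new.txt"));
        m.put("ftp://ftp.example.org/pub/new.txt", new Reply(200, null));
        m.put("ftp://ftp.example.org/pub/broken.txt", new Reply(302, "foo://x"));
        m.put("ftp://ftp.example.org/pub/loop.txt", new Reply(302, "loop.txt"));
        return m;
    }

    public static void main(String[] args) {
        Map<String, Reply> server = demoServer();
        for (String s : new String[] {"old.txt", "broken.txt", "loop.txt"}) {
            Outcome o = follow("ftp://ftp.example.org/pub/" + s, server);
            System.out.println(s + " -> " + (o.error == null ? o.finalUrl : "error: " + o.error));
        }
    }
}
```

Returning a result object keeps the caller's contract uniform: every path ends in an `Outcome`, so a bad redirect is recorded as a failed fetch rather than surfacing as an exception.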

[nutch] 03/03: Merge branch 'NUTCH-2451' - cherry-picked e159ad4 from HiranChaudhuri:NUTCH-2451 - closes #241

snagel pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/nutch.git

commit 9931acc489dd24db6e7e4993a694b26170a1be31
Merge: d8754b7 5b3cf0e
Author: Sebastian Nagel <sn...@apache.org>
AuthorDate: Tue Dec 5 12:07:56 2017 +0100

    Merge branch 'NUTCH-2451'
    - cherry-picked e159ad4 from HiranChaudhuri:NUTCH-2451
    - closes #241

 .../src/java/org/apache/nutch/protocol/ftp/Ftp.java         | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)
