Posted to user@manifoldcf.apache.org by Butz Joachim <Jo...@hcsolutions.at> on 2017/03/23 14:46:56 UTC

rdf:RDF not detected, "not valid feed" in DEBUG log

Hi,

I am using ManifoldCF 2.6.

The RSS connector does not crawl the feed http://rss.orf.at/news.xml.
The following line appears in manifoldcf.log:
org.apache.manifoldcf.crawler.connectors.rss.RSSConnector$OuterContextClass DEBUG 2017-03-23 14:29:54,718 (Worker thread '1') - RSS: RSS document 'http://rss.orf.at/news.xml' does not have rss, feed, or rdf:RDF tag - not valid feed

I tried the following change in RSSConnector (on the release-2.6-branch branch), and with it the feed is crawled.
This may be a bug in RSSConnector.

Kind Regards,
Joachim

--- a/connectors/rss/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/rss/RSSConnector.java
+++ b/connectors/rss/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/rss/RSSConnector.java
@@ -3311,7 +3311,7 @@ public class RSSConnector extends org.apache.manifoldcf.crawler.connectors.BaseR
           Logging.connectors.debug("RSS: Parsed bottom-level XML for RSS document '"+documentIdentifier+"'");
         return new RSSContextClass(theStream,namespace,localName,qName,atts,documentIdentifier,activities,filter);
       }
-      else if (localName.equals("RDF"))
+      else if (localName.toUpperCase().equals("RDF"))
       {
         // RDF/Atom feed detected
         outerTagCount++;
@@ -3345,7 +3345,7 @@ public class RSSConnector extends org.apache.manifoldcf.crawler.connectors.BaseR
       {
         rescanTimeSet = ((RSSContextClass)context).process();
       }
-      else if (tagName.equals("RDF"))
+      else if (tagName.toUpperCase().equals("RDF"))
       {
         rescanTimeSet = ((RDFContextClass)context).process();
       }
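
The idea behind the patch, matching the outermost tag name without regard to case, can also be sketched in isolation. The class below is only an illustration under stated assumptions: FeedRootTag, Kind and classify are hypothetical names and are not part of ManifoldCF. Unlike the patch, it also folds case for the rss and feed tags, and it lowercases with Locale.ROOT so the comparison does not depend on the JVM's default locale.

```java
import java.util.Locale;

// Hypothetical helper, not part of ManifoldCF: classifies a feed's outermost tag.
final class FeedRootTag {
    enum Kind { RSS, ATOM, RDF, UNKNOWN }

    static Kind classify(String localName, String qName) {
        // Fall back to the qualified name when the parser supplies no local name.
        String name = (localName == null || localName.isEmpty()) ? qName : localName;
        if (name == null) {
            return Kind.UNKNOWN;
        }
        // Drop a namespace prefix such as "rdf:" if one is present.
        int colon = name.indexOf(':');
        if (colon >= 0) {
            name = name.substring(colon + 1);
        }
        // Locale.ROOT keeps the case folding independent of the default locale.
        switch (name.toLowerCase(Locale.ROOT)) {
            case "rss":  return Kind.RSS;
            case "feed": return Kind.ATOM;
            case "rdf":  return Kind.RDF;
            default:     return Kind.UNKNOWN;
        }
    }

    public static void main(String[] args) {
        System.out.println(classify("RDF", "rdf:RDF"));   // RDF
        System.out.println(classify("rdf", "rdf:rdf"));   // RDF
        System.out.println(classify("html", "html"));     // UNKNOWN
    }
}
```

For a single comparison inside the existing else-if chain, "RDF".equalsIgnoreCase(localName) would behave like the toUpperCase() call in the diff while also tolerating a null localName.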

_______________________________________________

Dipl.-Ing. Joachim Butz
Software Developer

HC SOLUTIONS GesmbH
A - 4030 Linz, Dauphinestraße 5
Phone:   +43 (0)732 / 9394 0
Mobile:
Fax:     +43 (0)732 / 9394 800
E-mail:  Joachim.Butz@hcsolutions.at
Web:     http://www.hcsolutions.at/
         http://www.tomo-base.at/

Commercial register number: FN 115314 F
Commercial register court: Landesgericht Linz
Legal form: GesmbH
VAT ID no.: ATU 36898407
_______________________________________________