Posted to user@manifoldcf.apache.org by Butz Joachim <Jo...@hcsolutions.at> on 2017/03/23 14:46:56 UTC
rdf:RDF not detected, "not valid feed" in DEBUG log
Hi,
I am using ManifoldCF 2.6.
The RSS connector does not crawl the feed http://rss.orf.at/news.xml.
In manifoldcf.log the following line appears:
org.apache.manifoldcf.crawler.connectors.rss.RSSConnector$OuterContextClass DEBUG 2017-03-23 14:29:54,718 (Worker thread '1') - RSS: RSS document 'http://rss.orf.at/news.xml' does not have rss, feed, or rdf:RDF tag - not valid feed
I tried the following change in RSSConnector (on branch release-2.6-branch), and the feed is now crawled.
This may be a bug in RSSConnector.
Kind Regards,
Joachim
--- a/connectors/rss/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/rss/RSSConnector.java
+++ b/connectors/rss/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/rss/RSSConnector.java
@@ -3311,7 +3311,7 @@ public class RSSConnector extends org.apache.manifoldcf.crawler.connectors.BaseR
Logging.connectors.debug("RSS: Parsed bottom-level XML for RSS document '"+documentIdentifier+"'");
return new RSSContextClass(theStream,namespace,localName,qName,atts,documentIdentifier,activities,filter);
}
- else if (localName.equals("RDF"))
+ else if (localName.toUpperCase().equals("RDF"))
{
// RDF/Atom feed detected
outerTagCount++;
@@ -3345,7 +3345,7 @@ public class RSSConnector extends org.apache.manifoldcf.crawler.connectors.BaseR
{
rescanTimeSet = ((RSSContextClass)context).process();
}
- else if (tagName.equals("RDF"))
+ else if (tagName.toUpperCase().equals("RDF"))
{
rescanTimeSet = ((RDFContextClass)context).process();
}
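
For illustration only, the case-insensitive comparison used in the patch above can be sketched in isolation. This is a minimal sketch under stated assumptions, not ManifoldCF code: the class FeedTagMatcher and its methods are hypothetical names. It shows the same check written with String.equalsIgnoreCase, and the toUpperCase variant with an explicit Locale.ROOT so the result does not depend on the JVM's default locale.

```java
import java.util.Locale;

// Hypothetical helper, NOT part of ManifoldCF: it only illustrates
// case-insensitive matching of a feed's outer tag name.
final class FeedTagMatcher {
    private FeedTagMatcher() {}

    /** Returns true if the given tag name is "RDF" in any letter case. */
    static boolean isRdfTag(String localName) {
        // Calling equalsIgnoreCase on the constant makes a null argument
        // return false instead of throwing a NullPointerException.
        return "RDF".equalsIgnoreCase(localName);
    }

    /** Same check in the style of the patch, with an explicit locale. */
    static boolean isRdfTagUpperCase(String localName) {
        return localName != null
            && localName.toUpperCase(Locale.ROOT).equals("RDF");
    }

    public static void main(String[] args) {
        // Print the result of the check for a few sample tag names.
        for (String name : new String[] {"RDF", "rdf", "Rdf", "rss"}) {
            System.out.println(name + " -> " + isRdfTag(name));
        }
    }
}
```

Both variants accept "RDF", "rdf" and "Rdf" and reject other tag names such as "rss". Whether the connector should accept tag names in any case, or whether the parser should report the name differently, is a question for the maintainers.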