Posted to dev@manifoldcf.apache.org by "DK (Jira)" <ji...@apache.org> on 2022/01/24 21:05:00 UTC

[jira] [Commented] (CONNECTORS-1695) Sitemap xml not detected in version 2.17 webconnector

    [ https://issues.apache.org/jira/browse/CONNECTORS-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17481393#comment-17481393 ] 

DK commented on CONNECTORS-1695:
--------------------------------

The server returns a valid sitemap XML with MIME type text/xml. According to another defect, text/xml is in 'interestingMimeType' and should be supported. I have also excluded it in the Solr output connector. However, I just get an error in the job history indicating that text/xml is restricted, and the web connector still tries to process sitemap.xml as one full XML file.

I'd appreciate any pointers or help fixing it.

> Sitemap xml not detected in version 2.17 webconnector
> -----------------------------------------------------
>
>                 Key: CONNECTORS-1695
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1695
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Web connector
>    Affects Versions: ManifoldCF 2.17
>            Reporter: DK
>            Priority: Major
>
> I am trying to index a sitemap XML, but the web connector indexes the whole XML file into Solr.
> Please fix in version 2.17.
> If any special configuration is needed, please describe it here and add it to the documentation to make it clear.
>  
> Sitemap.xml:
> <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
> <sitemap>
> <loc>https://<url>/sitemap_1.xml</loc>
> <lastmod>2022-01-21T16:04:45Z</lastmod>
> </sitemap>
> </sitemapindex>
>  
> sitemap_1.xml:
> <urlset>
> <url>
> <loc>https://<docurl></loc>
> <lastmod>2018-10-31T11:25:27Z</lastmod>
> </url>
> </urlset>
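For reference, the behaviour the reporter expects (a sitemap is used as a source of links to crawl, not indexed as a document) can be sketched outside ManifoldCF. This is a minimal Python sketch under stated assumptions, not the web connector's actual code: `parse_sitemap` and `local_name` are hypothetical helpers, and the example.com URLs are hypothetical stand-ins for the elided `<url>` and `<docurl>` placeholders. It matches tags by local name because the sitemap index above declares the sitemaps.org namespace while sitemap_1.xml does not.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def local_name(tag: str) -> str:
    """Drop a namespace prefix such as '{http://www.sitemaps.org/schemas/sitemap/0.9}'."""
    return tag.rsplit("}", 1)[-1]


def parse_sitemap(xml_text: str) -> Tuple[str, List[str]]:
    """Hypothetical helper: return (kind, locs) for a sitemap document.

    kind is 'sitemapindex' or 'urlset'. For a sitemap index, locs are the
    child sitemap URLs to fetch next; for a urlset, locs are the document
    URLs to crawl. Anything else raises ValueError.
    """
    root = ET.fromstring(xml_text)
    kind = local_name(root.tag)
    if kind not in ("sitemapindex", "urlset"):
        raise ValueError(f"not a sitemap: root element is <{kind}>")
    # <sitemap> entries live under <sitemapindex>, <url> entries under <urlset>.
    entry_name = "sitemap" if kind == "sitemapindex" else "url"
    locs: List[str] = []
    for entry in root:
        if local_name(entry.tag) != entry_name:
            continue
        for field in entry:
            if local_name(field.tag) == "loc" and field.text:
                locs.append(field.text.strip())
    return kind, locs


if __name__ == "__main__":
    # Same shape as the Sitemap.xml quoted above, with a hypothetical host.
    index = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>https://example.com/sitemap_1.xml</loc>
<lastmod>2022-01-21T16:04:45Z</lastmod>
</sitemap>
</sitemapindex>"""
    print(parse_sitemap(index))
    # ('sitemapindex', ['https://example.com/sitemap_1.xml'])
```

In this sketch, a crawler would fetch each URL returned for a `sitemapindex` and parse it again, and queue each URL returned for a `urlset` as a document, so neither sitemap file itself ends up in the index.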



--
This message was sent by Atlassian Jira
(v8.20.1#820001)