Posted to user@manifoldcf.apache.org by Rene Nederhand <re...@nederhand.net> on 2014/09/29 22:54:46 UTC

Transformation Connectors with RSS-feeds

Hi All,

I am experimenting with the metadata and Tika transformation connectors. These
connectors work fine when indexing individual documents, but not when using
RSS as the repository connection.

When I use an RSS feed, the full feed is parsed by Tika as one document.
Since ManifoldCF normally processes each individual item (URL to document)
separately, this is not the behaviour I expected.

Is there a way to tell the transformation connector to process each item of
the RSS feed?

Thanks a lot in advance,

Rene Nederhand

Re: Transformation Connectors with RSS-feeds

Posted by Karl Wright <da...@gmail.com>.
Hi Rene,

The only way that the RSS connector would operate in this way is if the
Content-Type of the feed is not something that the connector recognizes as
being a feed.  It does not depend on whether Tika or any other transformation
connector is involved.

Can you use curl and find out what the Content-Type header actually is?
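
For anyone who prefers a script to `curl -I <feed-url>`, the check could be
sketched in Python roughly as follows. This is only an illustration: the
function names are hypothetical, and the set of "feed-like" media types is an
assumption made for the example, not the list that the ManifoldCF RSS
connector actually checks.

```python
from urllib.request import Request, urlopen

# Assumed set of feed-like media types, for illustration only. It is NOT
# taken from the ManifoldCF RSS connector and may differ from what the
# connector really accepts.
ASSUMED_FEED_TYPES = {
    "application/rss+xml",
    "application/atom+xml",
    "application/xml",
    "text/xml",
}


def media_type(content_type_header):
    """Drop parameters such as '; charset=utf-8' and lowercase the rest."""
    return content_type_header.split(";", 1)[0].strip().lower()


def looks_like_feed(content_type_header):
    """Return True if the header's media type is in the assumed feed set."""
    return media_type(content_type_header) in ASSUMED_FEED_TYPES


def fetch_content_type(url):
    """Send a HEAD request and return the raw Content-Type header.

    Returns an empty string if the server sends no Content-Type at all.
    """
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=10) as response:
        return response.headers.get("Content-Type", "")
```

Calling `fetch_content_type()` with the feed URL and passing the result to
`looks_like_feed()` gives a quick first indication. A value such as
`text/html` would be consistent with the feed being handled as a single
ordinary document.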

Thanks,
Karl