Posted to users@camel.apache.org by Cliff Court <cl...@vine.co.za> on 2009/11/05 15:02:17 UTC

Problem of skipped items/stories in Camel RSS reader

Good Day

We have a system which makes use of the Camel RSS reader. We are getting
intermittent problems with it whereby not all items/stories within an RSS
feed are being read into the Camel system. 

Unfortunately I am unable to isolate a repeatable case to describe, but I was
wondering if anyone else has had a similar experience and can advise on
any solutions?

Related to this, we also have a problem where, if we update the
publication date of items within the RSS feed, these are also not read. The
component documentation refers to configuration such that only 'new' stories
are read, but it does not specify what property makes a story 'new'. We have
assumed it is the publication date, but it would be very useful to get
clarity on this.

Many thanks
Cliff Court

-- 
View this message in context: http://old.nabble.com/Problem-of-skipped-items-stories-in-Camel-RSS-reader-tp26214545p26214545.html
Sent from the Camel - Users mailing list archive at Nabble.com.


Re: Problem of skipped items/stories in Camel RSS reader

Posted by Jon Anstey <ja...@gmail.com>.
Yeah, I considered using Apache Abdera to do the RSS support under the hood,
but at the time it was just a bit of code in a sandbox. Now, it's still not
released anywhere :) I guess for now we'll just stick with ROME. Though
maybe in the future we can consider Abdera if it is better.

On Thu, Nov 5, 2009 at 11:21 AM, Claus Ibsen <cl...@gmail.com> wrote:

> Hi
>
> I wonder if that ROME library is maintained anymore. Seems that 1.0 is
> the last release which is also fairly old.
>
> What about that Apache Atom component? Can't it read RSS feeds too?
>



-- 
Cheers,
Jon

Camel in Action: http://manning.com/ibsen
Blog: http://janstey.blogspot.com

Re: Problem of skipped items/stories in Camel RSS reader

Posted by Claus Ibsen <cl...@gmail.com>.
Hi

I wonder if that ROME library is maintained anymore. It seems that 1.0 is
the last release, which is also fairly old.

What about that Apache Atom component? Can't it read RSS feeds too?


On Thu, Nov 5, 2009 at 3:46 PM, Cliff Court <cl...@vine.co.za> wrote:
>
>
> Hi Jon
>
> Many thanks for the very prompt reply.
>
> The RSS feed we have been testing quite a bit is
> http://www.iafrica.com/pls/cms/grapevine.xml?p_section=world_news
>
> We did check it with a few RSS validators just to be sure it was valid RSS,
> and it seems to be. That said, we are seeing this with all RSS feeds we're
> testing.
>
> Of course the RSS reader component may not be the issue; perhaps it's the
> parser that places the feed items onto the ActiveMQ queues, but I thought
> I'd just ask the community if they have seen this behavior generally.
>
> Many thanks again.
> Cliff



-- 
Claus Ibsen
Apache Camel Committer

Author of Camel in Action: http://www.manning.com/ibsen/
Open Source Integration: http://fusesource.com
Blog: http://davsclaus.blogspot.com/
Twitter: http://twitter.com/davsclaus

Re: Problem of skipped items/stories in Camel RSS reader

Posted by Cliff Court <cl...@vine.co.za>.

Hi Jon

Many thanks for the very prompt reply.

The RSS feed we have been testing quite a bit is
http://www.iafrica.com/pls/cms/grapevine.xml?p_section=world_news

We did check it with a few RSS validators just to be sure it was valid RSS,
and it seems to be. That said, we are seeing this with all RSS feeds we're
testing.

Of course the RSS reader component may not be the issue; perhaps it's the
parser that places the feed items onto the ActiveMQ queues, but I thought
I'd just ask the community if they have seen this behavior generally.

Many thanks again.
Cliff



-- 
View this message in context: http://old.nabble.com/Problem-of-skipped-items-stories-in-Camel-RSS-reader-tp26214545p26215729.html
Sent from the Camel - Users mailing list archive at Nabble.com.


Re: Problem of skipped items/stories in Camel RSS reader

Posted by Jon Anstey <ja...@gmail.com>.
On Thu, Nov 5, 2009 at 10:32 AM, Cliff Court <cl...@vine.co.za> wrote:

>
> Good Day
>
> We have a system which makes use of the Camel RSS reader. We are getting
> intermittent problems with it whereby not all items/stories within an RSS
> feed are being read into the Camel system.
>
> Unfortunately I am unable to isolate a repeatable case to describe, but I
> was wondering if anyone else has had a similar experience and can advise
> on any solutions?
>

I haven't experienced this myself, but it sounds pretty bad... are there any
more details you can provide about your use case? Like, are you consuming any
public RSS feeds that we can try out?


>
> Related to this, we also have a problem where, if we update the
> publication date of items within the RSS feed, these are also not read. The
> component documentation refers to configuration such that only 'new'
> stories are read, but it does not specify what property makes a story
> 'new'. We have assumed it is the publication date, but it would be very
> useful to get clarity on this.
>

To see if a story is new, Camel first checks the updated date; if there is no
updated date (i.e. it's the first post), then the publication date is used.
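
A minimal Java sketch of that check, for illustration only. The Entry class
and the effectiveDate/isNew helpers are hypothetical names, not taken from
the camel-rss or ROME source, and letting undated entries through is an
assumption. It shows why changing only the publication date of an entry that
already has an updated date would go unnoticed.

```java
import java.util.Date;

final class EntryDates {
    // Hypothetical minimal feed entry; either date may be null.
    static final class Entry {
        final Date updated;
        final Date published;

        Entry(Date updated, Date published) {
            this.updated = updated;
            this.published = published;
        }
    }

    // Prefer the updated date; fall back to the publication date.
    static Date effectiveDate(Entry e) {
        return e.updated != null ? e.updated : e.published;
    }

    // An entry counts as new when its effective date is after the last one seen.
    static boolean isNew(Entry e, Date lastSeen) {
        Date d = effectiveDate(e);
        if (d == null) {
            return true; // assumption: entries without any date are let through
        }
        return lastSeen == null || d.after(lastSeen);
    }

    public static void main(String[] args) {
        Date lastSeen = new Date(2000L);
        // Publication date moved forward, but the older updated date still wins.
        Entry edited = new Entry(new Date(1000L), new Date(5000L));
        System.out.println(isNew(edited, lastSeen)); // false
        // No updated date, so the publication date decides.
        Entry firstPost = new Entry(null, new Date(5000L));
        System.out.println(isNew(firstPost, lastSeen)); // true
    }
}
```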


>
> Many thanks
> Cliff Court
>
> --
> View this message in context:
> http://old.nabble.com/Problem-of-skipped-items-stories-in-Camel-RSS-reader-tp26214545p26214545.html
> Sent from the Camel - Users mailing list archive at Nabble.com.
>
>



Re: Problem of skipped items/stories in Camel RSS reader

Posted by Claus Ibsen <cl...@gmail.com>.
On Thu, Nov 5, 2009 at 3:02 PM, Cliff Court <cl...@vine.co.za> wrote:
>
> Good Day
>
> We have a system which makes use of the Camel RSS reader. We are getting
> intermittent problems with it whereby not all items/stories within an RSS
> feed are being read into the Camel system.
>
> Unfortunately I am unable to isolate a repeatable case to describe, but I was
> wondering if anyone else has had a similar experience and can advise on
> any solutions?
>
> Related to this, we also have a problem where, if we update the
> publication date of items within the RSS feed, these are also not read. The
> component documentation refers to configuration such that only 'new' stories
> are read, but it does not specify what property makes a story 'new'. We have
> assumed it is the publication date, but it would be very useful to get
> clarity on this.
>

I think it's the lastUpdated option. Check out the source code for camel-rss.

Maybe it needs better new/updated detection, or an option to turn
that off so you can filter the entries yourself.
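
If the component handed over every entry, filtering on your own side could
look roughly like the Java sketch below. It is only one possible approach
and rests on assumptions: SeenLinkFilter is a hypothetical class, not part
of Camel, and it treats the entry link as a stable identity instead of
relying on dates.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: remembers entry links that were already processed.
final class SeenLinkFilter {
    private final Set<String> seen = new HashSet<>();

    // Returns true only the first time a given non-null link is offered.
    boolean accept(String link) {
        return link != null && seen.add(link);
    }

    public static void main(String[] args) {
        SeenLinkFilter filter = new SeenLinkFilter();
        System.out.println(filter.accept("http://example.org/story-1")); // true
        System.out.println(filter.accept("http://example.org/story-1")); // false
        System.out.println(filter.accept("http://example.org/story-2")); // true
    }
}
```

The set grows without bound in this sketch, so a long-running consumer would
need some form of eviction or persistence.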


> Many thanks
> Cliff Court
>
> --
> View this message in context: http://old.nabble.com/Problem-of-skipped-items-stories-in-Camel-RSS-reader-tp26214545p26214545.html
> Sent from the Camel - Users mailing list archive at Nabble.com.
>
>




Re: Problem of skipped items/stories in Camel RSS reader - update

Posted by Claus Ibsen <cl...@gmail.com>.
Hi

Thanks for the findings.

Maybe you can save a feed whose entries are not in publication date
order, and we can use that for unit testing.

Feel free to anonymize the feed beforehand.

And we love contributions, so if you want to take a stab at resolving
this, please go ahead.
http://camel.apache.org/contributing.html

On Mon, Nov 9, 2009 at 11:13 AM, Cliff Court <cl...@vine.co.za> wrote:
>
> Many thanks to those who have commented on my initial post.
>
> Here's a bit of an update on what we've found with some additional
> investigation on the behavior of the RSS component.
>
> It appears that if you configure it to pick up one item/story at a time (as
> opposed to the entire feed), it will do this until it reaches a story that is
> earlier than the one it previously read. This means that if you have an RSS
> feed whose stories are not sorted in descending publication date order, i.e.
> with the latest story at the top of the feed, the component will stop reading
> the feed as soon as it hits a story that is earlier than the last one read by
> the component.
>
> I can't swear that this is the case, but our tests are showing this. We've
> looked at several feeds now and have not yet found one that is in date order
> descending - thus the problem.
>
> One can configure the RSS reader to read the entire feed at one time, but
> that configuration is causing an exception of some kind for us. That issue
> is for a separate topic.
>
> Thanks
> Cliff




Re: Problem of skipped items/stories in Camel RSS reader - update

Posted by Cliff Court <cl...@vine.co.za>.
Many thanks to those who have commented on my initial post.

Here's a bit of an update on what we've found with some additional
investigation on the behavior of the RSS component.

It appears that if you configure it to pick up one item/story at a time (as
opposed to the entire feed), it will do this until it reaches a story that is
earlier than the one it previously read. This means that if you have an RSS
feed whose stories are not sorted in descending publication date order, i.e.
with the latest story at the top of the feed, the component will stop reading
the feed as soon as it hits a story that is earlier than the last one read by
the component.

I can't swear that this is the case, but our tests are showing this. We've
looked at several feeds now and have not yet found one that is in date order
descending - thus the problem.
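
The effect described above can be reproduced with a small stand-alone Java
simulation. This is a sketch under assumptions, not the camel-rss code: the
Story class and both methods are hypothetical, the list stands for the order
in which the reader visits the stories, and the "newer than the last one
accepted" rule is inferred from the tests described here. The sketch skips
older stories; whether the real component skips them or stops outright, the
result for those stories is the same. It also shows that sorting oldest
first before applying the rule lets every story through.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Date;
import java.util.List;

final class HighWaterMarkDemo {
    // Hypothetical stand-in for a feed item; not a Camel or ROME class.
    static final class Story {
        final String title;
        final Date published;

        Story(String title, long epochMillis) {
            this.title = title;
            this.published = new Date(epochMillis);
        }
    }

    // Accept a story only if it is newer than the last accepted one.
    static List<String> readInVisitOrder(List<Story> visitOrder) {
        List<String> accepted = new ArrayList<>();
        Date lastAccepted = null;
        for (Story s : visitOrder) {
            if (lastAccepted == null || s.published.after(lastAccepted)) {
                accepted.add(s.title);
                lastAccepted = s.published;
            }
        }
        return accepted;
    }

    // Same rule, but the stories are sorted oldest first beforehand.
    static List<String> readSortedOldestFirst(List<Story> stories) {
        List<Story> sorted = new ArrayList<>(stories);
        sorted.sort(Comparator.comparing((Story s) -> s.published));
        return readInVisitOrder(sorted);
    }

    public static void main(String[] args) {
        // Visit order is not date order: "A" is the oldest but comes last.
        List<Story> feed = Arrays.asList(
                new Story("B", 2000L), new Story("C", 3000L), new Story("A", 1000L));
        System.out.println(readInVisitOrder(feed));      // [B, C]
        System.out.println(readSortedOldestFirst(feed)); // [A, B, C]
    }
}
```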

One can configure the RSS reader to read the entire feed at one time, but
that configuration is causing an exception of some kind for us. That issue
is for a separate topic.

Thanks
Cliff




-- 
View this message in context: http://old.nabble.com/Problem-of-skipped-items-stories-in-Camel-RSS-reader-tp26214545p26263881.html
Sent from the Camel - Users mailing list archive at Nabble.com.