Posted to users@camel.apache.org by Michael Rambichler <mi...@rambichler.at> on 2022/11/24 13:36:44 UTC

Streamcaching and spooling

Hi all,

With Camel 3.18, stream caching is enabled by default. That's good.

But we have realized that spooling is disabled by default.
IMHO this can lead to unwanted OOM situations, because from now on (>=
3.18.x) a big stream will be cached entirely in memory by default.
Or am I wrong?

I would strongly suggest that we also set the default of spoolEnabled to
true.

What do you think?

BR
 Michael

Re: Streamcaching and spooling

Posted by ski n <ra...@gmail.com>.
In addition to what Simo describes, I have also sometimes encountered
FileNotFoundException and other issues that I think are disk-related, like:

org.apache.camel.RuntimeCamelException: Cannot reset stream from file
tmp/camelcontext-ID_5ffdbadf852c8f0010034cb9/cos4845255624781072102.tmp

Another option might be to set it to true by default, but with some
checks: whether there is enough disk space (for example, at least 10%
must be available) and whether the file permissions are correct. If
these checks fail, print a warning (instead of failing) that spooling
has been set to false, together with the reason. I think ActiveMQ
Artemis uses this kind of check for its message store.
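
A minimal sketch of such a check in plain Java, assuming the spool directory and the minimum free fraction are supplied by the caller. This is not Camel's or Artemis's implementation; the class name SpoolPreflight and the method reasonToDisable are hypothetical and only illustrate the idea of warning instead of failing.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;

public class SpoolPreflight {

    /**
     * Returns null when spooling looks safe, otherwise a human-readable
     * reason why spooling should stay disabled. (Hypothetical helper.)
     */
    public static String reasonToDisable(Path dir, double minFreeFraction) {
        if (!Files.isDirectory(dir)) {
            return "spool directory does not exist: " + dir;
        }
        if (!Files.isWritable(dir)) {
            return "no write permission on " + dir;
        }
        try {
            FileStore store = Files.getFileStore(dir);
            long total = store.getTotalSpace();
            long usable = store.getUsableSpace();
            // Skip the ratio check if the file store does not report a size.
            if (total > 0 && (double) usable / total < minFreeFraction) {
                return "less than " + Math.round(minFreeFraction * 100)
                        + "% free space on " + store;
            }
        } catch (IOException e) {
            return "cannot inspect file store: " + e.getMessage();
        }
        return null;
    }

    public static void main(String[] args) {
        Path dir = Path.of(System.getProperty("java.io.tmpdir"));
        String reason = reasonToDisable(dir, 0.10); // the 10% from the example above
        if (reason != null) {
            // Warn and fall back to in-memory caching instead of failing.
            System.err.println("WARN: spooling set to false: " + reason);
        } else {
            System.out.println("spooling can be enabled for " + dir);
        }
    }
}
```

The check returns a reason rather than throwing, so the caller can log it and continue with spooling turned off.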

Raymond



On Fri, Nov 25, 2022 at 11:40 AM Simo Hakanen <si...@axelhealth.com>
wrote:

> Hi all
>
> Not to hijack the thread, but I want to point out that there are possible
> problems with stream caching and spooling related to InOut routes.
> When the UnitOfWork ends, the spooled file is deleted, and that happens
> before the Out response is created, leading to a
> FileNotFoundException. So enabling spooling by default might lead to
> unexpected problems elsewhere.
>
> --
> Simo Hakanen
>

Re: Streamcaching and spooling

Posted by Simo Hakanen <si...@axelhealth.com>.
On Fri, Nov 25, 2022 at 11:09 AM Michael Rambichler <mi...@rambichler.at>
wrote:

> Hi Claus, Babak
>
> Thanks for your feedback.
> The argument for an explicit decision is valid. But on the other hand,
> IMHO it's better that the route fails immediately than running into an
> uncontrolled OOM later in production.
>
> If you agree with me, I will create the JIRA/pull request for setting the
> default to true.
>
> @Claus: We have some routes with XML messages of more than 1 MB which have
> to be streamed and parsed.
>
> BR
>  Michael

Hi all

Not to hijack the thread, but I want to point out that there are possible
problems with stream caching and spooling related to InOut routes.
When the UnitOfWork ends, the spooled file is deleted, and that happens
before the Out response is created, leading to a
FileNotFoundException. So enabling spooling by default might lead to
unexpected problems elsewhere.
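
The ordering problem can be reproduced in isolation with a simplified model in plain Java. This is only a sketch of the sequence described above, not Camel's code: the class SpoolLifecycle and the method lateReadFails are hypothetical, and the "unit of work" is reduced to a plain file delete.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class SpoolLifecycle {

    /**
     * Simulates a spooled payload whose file is cleaned up before a late
     * reader opens it. Returns true if the late read fails.
     */
    public static boolean lateReadFails() throws IOException {
        // 1. The payload is spooled to a temp file.
        File spool = File.createTempFile("cos", ".tmp");
        Files.write(spool.toPath(), "<response/>".getBytes(StandardCharsets.UTF_8));

        // 2. "Unit of work done": cleanup deletes the spooled file.
        Files.delete(spool.toPath());

        // 3. The response is created afterwards and tries to re-read the stream.
        try (InputStream in = new FileInputStream(spool)) {
            in.read();
            return false;
        } catch (FileNotFoundException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("late read failed: " + lateReadFails());
    }
}
```

An in-memory cache does not have this failure mode, because there is no file that can be removed before the last reader is done.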

-- 
Simo Hakanen

Re: Streamcaching and spooling

Posted by Michael Rambichler <mi...@rambichler.at>.
Hi Claus, Babak

Thanks for your feedback.
The argument for an explicit decision is valid. But on the other hand,
IMHO it's better that the route fails immediately than running into an
uncontrolled OOM later in production.

If you agree with me, I will create the JIRA/pull request for setting the
default to true.

@Claus: We have some routes with XML messages of more than 1 MB which have
to be streamed and parsed.

BR
 Michael

On Thu, 24 Nov 2022 at 17:11, Babak Vahdat
<ba...@swissonline.ch.invalid> wrote:

>
>
> > On 24 Nov 2022, at 16:48, Claus Ibsen <cl...@gmail.com> wrote:
> >
> > On Thu, Nov 24, 2022 at 3:27 PM Babak Vahdat
> > <ba...@swissonline.ch.invalid> wrote:
> >
> >> Hi
> >>
> >> I remember there was a rational behind that:
> >>
>
> Sorry I made a tiny typo above, I meant “rationale” and not “rational”.
>
> Babak

Re: Streamcaching and spooling

Posted by Babak Vahdat <ba...@swissonline.ch.INVALID>.

> On 24 Nov 2022, at 16:48, Claus Ibsen <cl...@gmail.com> wrote:
> 
> On Thu, Nov 24, 2022 at 3:27 PM Babak Vahdat
> <ba...@swissonline.ch.invalid> wrote:
> 
>> Hi
>> 
>> I remember there was a rational behind that:
>> 

Sorry I made a tiny typo above, I meant “rationale” and not “rational”. 

Babak



Re: Streamcaching and spooling

Posted by Claus Ibsen <cl...@gmail.com>.
On Thu, Nov 24, 2022 at 3:27 PM Babak Vahdat
<ba...@swissonline.ch.invalid> wrote:

> Hi
>
> I remember there was a rational behind that:
>
> https://issues.apache.org/jira/browse/CAMEL-18098
>
> "As spooling to disk requires that the volume have space and the user has
> permission to write to disk etc. For container workloads this is not always
> the case."
>
> So better would be to explicitly enable if *really* required for large
> data streams which is not always the case?
>
> Babak
>


Ah yeah, good point Babak. Users who work with large streams need to
configure Camel for such use cases.
You may also want to configure which temporary directory to use for
spooling, and at what byte limit to overflow to disk, etc.
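
To illustrate what a byte limit and a spool directory control, here is a minimal plain-Java sketch of threshold-based spooling. It is not Camel's implementation; the class name ThresholdSpool and its methods are hypothetical, and both the threshold and the directory are passed in by the caller.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Buffers in memory up to a threshold, then overflows to a temp file. */
public class ThresholdSpool extends OutputStream {
    private final long threshold;   // bytes kept in memory before spooling
    private final Path spoolDir;    // directory for the overflow file
    private ByteArrayOutputStream memory = new ByteArrayOutputStream();
    private OutputStream disk;      // non-null once we have overflowed
    private Path spoolFile;
    private long written;

    public ThresholdSpool(long threshold, Path spoolDir) {
        this.threshold = threshold;
        this.spoolDir = spoolDir;
    }

    @Override
    public void write(int b) throws IOException {
        write(new byte[] {(byte) b}, 0, 1);
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        if (disk == null && written + len > threshold) {
            // Threshold exceeded: move what is buffered so far to a temp file.
            spoolFile = Files.createTempFile(spoolDir, "spool", ".tmp");
            disk = Files.newOutputStream(spoolFile);
            memory.writeTo(disk);
            memory = null; // release the heap buffer
        }
        (disk != null ? disk : memory).write(buf, off, len);
        written += len;
    }

    @Override
    public void close() throws IOException {
        if (disk != null) {
            disk.close();
        }
    }

    public boolean isSpooled() {
        return spoolFile != null;
    }

    public Path spoolFile() {
        return spoolFile;
    }
}
```

With a threshold of 16 bytes, for example, a 10-byte payload stays on the heap, while a second 10-byte write pushes the whole payload into a file under the given directory. If that directory is missing or not writable, the overflow step fails, which is the container concern raised in CAMEL-18098.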




-- 
Claus Ibsen
-----------------
@davsclaus
Camel in Action 2: https://www.manning.com/ibsen2

Re: Streamcaching and spooling

Posted by Babak Vahdat <ba...@swissonline.ch.INVALID>.
Hi

I remember there was a rational behind that:

https://issues.apache.org/jira/browse/CAMEL-18098

"As spooling to disk requires that the volume have space and the user has permission to write to disk etc. For container workloads this is not always the case."

So wouldn't it be better to enable spooling explicitly only if it is *really* required for large data streams, which is not always the case?

Babak

> On 24 Nov 2022, at 14:59, Claus Ibsen <cl...@gmail.com> wrote:
> 
> Hi
> 
> Yeah that is a good idea to overflow to disk.
> You are welcome to create a JIRA and send a PR against main branch.
> 
> How big streams are you processing btw ?


Re: Streamcaching and spooling

Posted by Claus Ibsen <cl...@gmail.com>.
Hi

Yeah, overflowing to disk is a good idea.
You are welcome to create a JIRA and send a PR against the main branch.

How big are the streams you are processing, btw?

On Thu, Nov 24, 2022 at 2:37 PM Michael Rambichler <mi...@rambichler.at>
wrote:

> Hi all,
>
> With Camel 3.18, stream caching is enabled by default. That's good.
>
> But we have realized that spooling is disabled by default.
> IMHO this can lead to unwanted OOM situations, because from now on (>=
> 3.18.x) a big stream will be cached entirely in memory by default.
> Or am I wrong?
>
> I would strongly suggest that we also set the default of spoolEnabled to
> true.
>
> What do you think?
>
> BR
>  Michael
>


-- 
Claus Ibsen
-----------------
@davsclaus
Camel in Action 2: https://www.manning.com/ibsen2