Posted to users@camel.apache.org by Andrew Chandler <an...@riftware.com> on 2010/03/11 16:26:30 UTC

Question about iterators/splitters

I have a question about how Camel processes Iterables - I've been trying
to track down what looks like runaway memory usage. Based on thread
profiles we're not going nuts running everything in parallel, so another
possibility that comes to mind is that the collection of items to be
worked on is getting pre-iterated and messages are being queued up
in advance. We actually need certain things to run in parallel but be
limited by the size of a thread pool, so we might configure the
route to run 10 things in parallel but need to run a million things
through that route, just 10 at a time. Hopefully my question is
making sense.

Re: Question about iterators/splitters

Posted by Claus Ibsen <cl...@gmail.com>.
On Fri, Mar 12, 2010 at 1:15 PM, ext2 <xu...@tongtech.com> wrote:
>
>
> Hi Andrew:
> I have run into the same problem as you. To resolve it, besides using the
> streaming option, you may want to pay attention to the following points:
>
> In your situation the iterator will return millions of results, but the
> thread pool only has a limited number of threads.
>
> 1) You must configure your own executor rather than use the default one.
> The default executor is a ScheduledThreadPoolExecutor. That executor caches
> incoming tasks in its internal queue and then uses its threads (at most the
> pool's core size, which defaults to 10 in Camel) to consume the queue.
> So if your iterator runs much faster than the downstream processing, many
> messages will pile up in the ScheduledThreadPoolExecutor's internal queue.
> How many messages get queued depends only on how much slower the processing
> is than the splitter's iteration, so memory usage becomes unpredictable.
>
> 2) You must implement your own executor and configure it on the splitter.
> That executor should use an internal queue with a bounded size, and it
> should block while the internal queue is full.
>
> 3) Even if both 1) and 2) are in place, there is still one unusual
> situation the splitter pattern cannot handle. That is:
> If millions of messages are processed with a limited number of threads and
> the results have to be aggregated, the millions of result messages will
> cause a memory usage problem too.
> This situation is quite unusual, but if your case requires it you are
> better off writing custom code.
>

From Camel 2.3 onwards, the default created thread pools will be
cached thread pools from the JDK, which are unbounded.
It should also be easier to configure thread pools, and thread pool
configuration rules will make it easier still.

A bit more detail here:
http://camel.apache.org/camel-23-threadpool-configuration.html
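
To make it concrete, here is a rough, untested sketch of wiring a bounded
thread pool into the splitter with the Java DSL. The pool sizes, queue size
and endpoint URIs are only placeholders; on Camel 2.3+ the thread pool
configuration described in the link above can achieve much the same thing
declaratively.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

import org.apache.camel.builder.RouteBuilder;

public class BoundedSplitRoute extends RouteBuilder {
    @Override
    public void configure() throws Exception {
        // 10 worker threads and at most 50 queued tasks; CallerRunsPolicy makes
        // the submitting (splitter) thread do the work itself when the queue is
        // full, which throttles iteration instead of buffering a million tasks.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                10, 10, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(50),
                new ThreadPoolExecutor.CallerRunsPolicy());

        from("direct:start")                 // hypothetical endpoint
            .split(body()).streaming()       // iterate lazily, do not pre-build the list
                .parallelProcessing()
                .executorService(pool)
                .to("direct:worker")         // hypothetical endpoint
            .end();
    }
}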

>
> ---------------------------------------------
> Andrew Chandler wrote:
>>No we aren't - our datasource objects are all implementing iterable
>>which is being passed to the splitter and processed there.  I'm about to
>>try and figure out what it will take to convert and what other
>>side-effects might be.    Thank you for the tip
>
>
>
> On Thu, 2010-03-11 at 17:20 +0100, Claus Ibsen wrote:
>
>> Hi
>>
>> Are you using the streaming option on the splitter? Then it won't pre-iterate.
>>
>>
>> On Thu, Mar 11, 2010 at 4:26 PM, Andrew Chandler <an...@riftware.com>
> wrote:
>> > I have a question about how camel processes iterables - I've been trying
>> > to track what looks like runaway memory usage.   Based on thread
>> > profiles we're not going nuts running everything in parallel so another
>> > possibility that comes to mind is that the collection of items to be
>> > worked on is getting pre-iterated over and messages are being queued up
>> > in advance.   We actually need certain things to run in parallel but be
>> > limited by the available size of a threadpool  so we might configure the
>> > route to run 10 things in parallel but need to run a million things
>> > through that route, just 10 at a time.      Hopefully my question is
>> > making sense.
>> >
>>
>>
>>
>
>
>
>
>



-- 
Claus Ibsen
Apache Camel Committer

Author of Camel in Action: http://www.manning.com/ibsen/
Open Source Integration: http://fusesource.com
Blog: http://davsclaus.blogspot.com/
Twitter: http://twitter.com/davsclaus

Re: Question about iterators/splitters

Posted by Andrew Chandler <an...@riftware.com>.
Thanks for the answer - we saw an immediate improvement by switching to
streaming, but your answer might explain why we still see more memory
usage than we'd like. I'll experiment some more.

On Fri, 2010-03-12 at 20:15 +0800, ext2 wrote:

> 
> Hi Andrew:
> I have run into the same problem as you. To resolve it, besides using the
> streaming option, you may want to pay attention to the following points:
>
> In your situation the iterator will return millions of results, but the
> thread pool only has a limited number of threads.
>
> 1) You must configure your own executor rather than use the default one.
> The default executor is a ScheduledThreadPoolExecutor. That executor caches
> incoming tasks in its internal queue and then uses its threads (at most the
> pool's core size, which defaults to 10 in Camel) to consume the queue.
> So if your iterator runs much faster than the downstream processing, many
> messages will pile up in the ScheduledThreadPoolExecutor's internal queue.
> How many messages get queued depends only on how much slower the processing
> is than the splitter's iteration, so memory usage becomes unpredictable.
>
> 2) You must implement your own executor and configure it on the splitter.
> That executor should use an internal queue with a bounded size, and it
> should block while the internal queue is full.
>
> 3) Even if both 1) and 2) are in place, there is still one unusual
> situation the splitter pattern cannot handle. That is:
> If millions of messages are processed with a limited number of threads and
> the results have to be aggregated, the millions of result messages will
> cause a memory usage problem too.
> This situation is quite unusual, but if your case requires it you are
> better off writing custom code.
> 
> ---------------------------------------------
> Andrew Chandler wrote:
> >No we aren't - our datasource objects are all implementing iterable
> >which is being passed to the splitter and processed there.  I'm about to
> >try and figure out what it will take to convert and what other
> >side-effects might be.    Thank you for the tip
> 
> 
> 
> On Thu, 2010-03-11 at 17:20 +0100, Claus Ibsen wrote:
> 
> > Hi
> > 
> > Are you using the streaming option on the splitter? Then it won't pre-iterate.
> > 
> > 
> > On Thu, Mar 11, 2010 at 4:26 PM, Andrew Chandler <an...@riftware.com>
> wrote:
> > > I have a question about how camel processes iterables - I've been trying
> > > to track what looks like runaway memory usage.   Based on thread
> > > profiles we're not going nuts running everything in parallel so another
> > > possibility that comes to mind is that the collection of items to be
> > > worked on is getting pre-iterated over and messages are being queued up
> > > in advance.   We actually need certain things to run in parallel but be
> > > limited by the available size of a threadpool  so we might configure the
> > > route to run 10 things in parallel but need to run a million things
> > > through that route, just 10 at a time.      Hopefully my question is
> > > making sense.
> > >
> > 
> > 
> > 
> 
> 
> 
> 



Re: Question about iterators/splitters

Posted by ext2 <xu...@tongtech.com>.

Hi Andrew:
I have run into the same problem as you. To resolve it, besides using the
streaming option, you may want to pay attention to the following points:

In your situation the iterator will return millions of results, but the
thread pool only has a limited number of threads.

1) You must configure your own executor rather than use the default one.
The default executor is a ScheduledThreadPoolExecutor. That executor caches
incoming tasks in its internal queue and then uses its threads (at most the
pool's core size, which defaults to 10 in Camel) to consume the queue.
So if your iterator runs much faster than the downstream processing, many
messages will pile up in the ScheduledThreadPoolExecutor's internal queue.
How many messages get queued depends only on how much slower the processing
is than the splitter's iteration, so memory usage becomes unpredictable.

2) You must implement your own executor and configure it on the splitter.
That executor should use an internal queue with a bounded size, and it
should block while the internal queue is full; a rough sketch follows below.

3) Even if both 1) and 2) are in place, there is still one unusual
situation the splitter pattern cannot handle. That is:
If millions of messages are processed with a limited number of threads and
the results have to be aggregated, the millions of result messages will
cause a memory usage problem too.
This situation is quite unusual, but if your case requires it you are
better off writing custom code.

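To illustrate point 2, here is a rough, untested sketch of one way to build
such a bounded, blocking executor with plain JDK classes (the sizes and the
helper name are arbitrary):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public final class BoundedExecutors {

    // Fixed-size pool with a bounded queue; when the queue is full, the
    // rejection handler blocks the submitting thread until a slot frees up,
    // so the splitter cannot race ahead of the workers.
    public static ExecutorService newBlockingPool(int threads, int queueSize) {
        RejectedExecutionHandler blockWhenFull = new RejectedExecutionHandler() {
            public void rejectedExecution(Runnable task, ThreadPoolExecutor executor) {
                try {
                    executor.getQueue().put(task);   // blocks while the queue is full
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while queuing task", e);
                }
            }
        };
        return new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(queueSize), blockWhenFull);
    }
}

Such a pool can then be handed to the splitter, for example with
.split(body()).streaming().parallelProcessing().executorService(BoundedExecutors.newBlockingPool(10, 50)).
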
---------------------------------------------
Andrew Chandler wrote:
>No we aren't - our datasource objects are all implementing iterable
>which is being passed to the splitter and processed there.  I'm about to
>try and figure out what it will take to convert and what other
>side-effects might be.    Thank you for the tip



On Thu, 2010-03-11 at 17:20 +0100, Claus Ibsen wrote:

> Hi
> 
> Are you using the streaming option on the splitter? Then it won't pre-iterate.
> 
> 
> On Thu, Mar 11, 2010 at 4:26 PM, Andrew Chandler <an...@riftware.com>
wrote:
> > I have a question about how camel processes iterables - I've been trying
> > to track what looks like runaway memory usage.   Based on thread
> > profiles we're not going nuts running everything in parallel so another
> > possibility that comes to mind is that the collection of items to be
> > worked on is getting pre-iterated over and messages are being queued up
> > in advance.   We actually need certain things to run in parallel but be
> > limited by the available size of a threadpool  so we might configure the
> > route to run 10 things in parallel but need to run a million things
> > through that route, just 10 at a time.      Hopefully my question is
> > making sense.
> >
> 
> 
> 





Re: Question about iterators/splitters

Posted by Andrew Chandler <an...@riftware.com>.
No we aren't - our datasource objects all implement Iterable, which is
passed to the splitter and processed there. I'm about to try to figure out
what it will take to convert and what the other side effects might be.
Thank you for the tip.



On Thu, 2010-03-11 at 17:20 +0100, Claus Ibsen wrote:

> Hi
> 
> Are you using the streaming option on the splitter? Then it won't pre-iterate.
> 
> 
> On Thu, Mar 11, 2010 at 4:26 PM, Andrew Chandler <an...@riftware.com> wrote:
> > I have a question about how camel processes iterables - I've been trying
> > to track what looks like runaway memory usage.   Based on thread
> > profiles we're not going nuts running everything in parallel so another
> > possibility that comes to mind is that the collection of items to be
> > worked on is getting pre-iterated over and messages are being queued up
> > in advance.   We actually need certain things to run in parallel but be
> > limited by the available size of a threadpool  so we might configure the
> > route to run 10 things in parallel but need to run a million things
> > through that route, just 10 at a time.      Hopefully my question is
> > making sense.
> >
> 
> 
> 



Re: Question about iterators/splitters

Posted by Claus Ibsen <cl...@gmail.com>.
Hi

Are you using the streaming option on the splitter? Then it won't pre-iterate.
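
In the Java DSL that is just a call to streaming() on the split, something
along these lines (the endpoint names are only placeholders):

import org.apache.camel.builder.RouteBuilder;

public class StreamingSplitRoute extends RouteBuilder {
    @Override
    public void configure() throws Exception {
        from("direct:bigInput")              // hypothetical endpoint
            .split(body()).streaming()       // walk the Iterable lazily instead of
                                             // materialising every element up front
                .to("direct:processOne")     // hypothetical endpoint
            .end();
    }
}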


On Thu, Mar 11, 2010 at 4:26 PM, Andrew Chandler <an...@riftware.com> wrote:
> I have a question about how camel processes iterables - I've been trying
> to track what looks like runaway memory usage.   Based on thread
> profiles we're not going nuts running everything in parallel so another
> possibility that comes to mind is that the collection of items to be
> worked on is getting pre-iterated over and messages are being queued up
> in advance.   We actually need certain things to run in parallel but be
> limited by the available size of a threadpool  so we might configure the
> route to run 10 things in parallel but need to run a million things
> through that route, just 10 at a time.      Hopefully my question is
> making sense.
>



-- 
Claus Ibsen
Apache Camel Committer

Author of Camel in Action: http://www.manning.com/ibsen/
Open Source Integration: http://fusesource.com
Blog: http://davsclaus.blogspot.com/
Twitter: http://twitter.com/davsclaus