Posted to dev@ofbiz.apache.org by Adam Heath <do...@brainfood.com> on 2010/04/01 17:31:06 UTC

multi-threaded EntityDataLoadContainer and SequenceUtil

As Adrian and I previously discussed, he said he had discovered some
possible problems with SequenceUtil in multi-threaded situations.  He
discovered this when he made EntityDataLoadContainer load each xml
file in a thread.

I've recently done the same on my local copy, but I don't see any
problems.  What I did see, however, was that just throwing every xml
data file into a thread (actually, a 4-count thread pool) caused errors
loading some files, because each file has an implicit dependency on
some other set of files, and those files hadn't been loaded yet.

So, before doing a thread load, the files would have to have an
explicit dependency listed, so that correct ordering could be done.
This is not something that would make ofbiz easier to use.

Trying to figure out the implicit dependencies automatically by
comparing each entity line isn't worthwhile, as that would be
reimplementing a database, and what would be the point?

So, Adrian, if you have any more pointers as to what your original
change did, I'd appreciate any insight you might have.  Otherwise, I
will say that we can't load data in parallel.

Additionally, I suspected that SequenceUtil actually *didn't* have
any problems.  I wrote a test case quite a while back that did
multi-threaded testing of SequenceUtil, and it never had any problems.
It used 100 threads, with each thread trying to allocate 1000
sequence values.
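
Roughly, the shape of that test was this (names are simplified; the real
SequenceUtil call needs a delegator behind it, so this is only the outline,
not the actual test code):

==
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class SequenceStressSketch {
    // stand-in for whatever hands out the ids; only the shape of the
    // concurrency check is shown here
    interface Sequencer { Long getNextSeqId(String seqName); }

    static void stress(final Sequencer sequencer) throws InterruptedException {
        final Set<Long> seen = Collections.synchronizedSet(new HashSet<Long>());
        final AtomicBoolean duplicate = new AtomicBoolean(false);
        ExecutorService pool = Executors.newFixedThreadPool(100);
        for (int i = 0; i < 100; i++) {
            pool.submit(new Runnable() {
                public void run() {
                    for (int j = 0; j < 1000; j++) {
                        // every value handed out must be globally unique
                        if (!seen.add(sequencer.getNextSeqId("ExampleItem"))) {
                            duplicate.set(true);
                        }
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.MINUTES);
        System.out.println(duplicate.get() ? "FAIL: duplicate id seen" : "OK: 100000 unique ids");
    }
}
==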

Re: multi-threaded EntityDataLoadContainer and SequenceUtil

Posted by Adrian Crum <ad...@yahoo.com>.
--- On Sat, 4/3/10, Adam Heath <do...@brainfood.com> wrote:
> Adrian Crum wrote:
> > --- On Sat, 4/3/10, Adam Heath <do...@brainfood.com>
> wrote:
> >> Adrian Crum wrote:
> >>> I multi-threaded the data load by having one
> thread
> >> parse the XML files
> >>> and put the results in a queue. Another
> thread
> >> services the queue and
> >>> loads the data. I also multi-threaded the
> EECAs - but
> >> that has an issue
> >>> I need to solve.
> >> Well, there could be some EECAs that have
> dependencies on
> >> each other,
> >> when defined in a single definition file. 
> Or, they
> >> have implicit
> >> dependencies with other earlier defined
> ecas.  Like,
> >> maybe an order
> >> eca assuming that a product eca has run, just
> because ofbiz
> >> has always
> >> loaded the product component before the order
> component.
> > 
> > I used a FIFO queue serviced by a single thread for
> the EECAs - to preserve the sequence. The main idea was to
> offload the EECA execution from the thread that triggered
> the EECA. The data load was also in a FIFO queue serviced by
> a single thread so the files were being loaded in order.
> > 
> > To summarize:
> > 
> > 1. Table creation is handled by a thread pool with an
> adjustable size. A thread task is to create a table and its
> primary keys. Thread tasks run in parallel. Main thread
> blocks until all tables and primary keys are created.
> > 2. Main thread creates foreign keys.
> > 3. Main thread parses XML files, puts results in data
> load queue.
> > 4. A data load thread services the data load queue and
> stores the data. If an ECA is triggered it puts the ECA info
> in an ECA queue.
> > 5. An ECA thread services the ECA queue and runs the
> ECA.
> > 6. Main thread blocks until all queues are empty.
> 
> Except if an eca fires, but the main data load thread keeps
> going,
> then the main data load thread might insert/update
> something that
> hasn't yet been manipulated by the eca(s).

Good point. Maybe that's the problem I was having and needed to track down.

> Additionally, and eca can run a service, which can do
> anything,
> including adding/updating/removing other values, which
> cause other
> ecas to fire.  Which then interact with the
> queued-based eca.
> 
> Were your changes only active at startup, during the
> initial install,
> or were they always available?  When data is later
> manipulated, during
> a test run, certain guarantees still have to be met(which
> I'm sure you
> know).

It was just for run-install.

> >> This is a difficult problem to solve; probably not
> worth
> >> it.  During
> >> production, different high-level threads,
> modifying
> >> different
> >> entities, will run faster, they are already
> running in
> >> multiple threads.
> >>
> >> Most ecas(entity, and probably service) generally
> run
> >> relatively fast.
> >>    Trying to break that up and dispatch
> into
> >> a thread pool might make
> >> things slower, as you have cpu cache coherency
> effects to
> >> content with.
> >>
> >> What would be better, is to break up the higher
> levels into
> >> more
> >> threads, during an install.  That could be
> made
> >> semi-smart, if we add
> >> file dependencies to the data xml files. 
> Such
> >> explicit dependencies
> >> will  have to be done by hand.  Then, a
> parallel
> >> execution framework,
> >> that ran each xml file in parallel, once all of
> it's
> >> dependencies were
> >> met, would give us a speedup.
> > 
> > The minor changes I made cut the data load time in
> half. That's not fast enough? ;-)
> > 
> > It didn't take a lot of threads or a lot of thought to
> speed things up. The bottom line is, you want to keep parts
> of the process going while waiting for DB I/O.
> 
> As for run-install, it starts up catalina.  It'd be
> nice if that were
> multi-threaded as well.  But catalina appears to be
> serial internally.

Getting back to SEDA...

We could implement a SEDA-like architecture in a separate control servlet and try it out on different applications by changing their web.xml files. If we had access to the author's test code we could see if it made a difference in overload situations. Where I work we have a classroom filled with computers that could be used as clients to test a SEDA server.



      

Re: multi-threaded EntityDataLoadContainer and SequenceUtil

Posted by Adam Heath <do...@brainfood.com>.
Adrian Crum wrote:
> --- On Sat, 4/3/10, Adam Heath <do...@brainfood.com> wrote:
>> Adrian Crum wrote:
>>> I multi-threaded the data load by having one thread
>> parse the XML files
>>> and put the results in a queue. Another thread
>> services the queue and
>>> loads the data. I also multi-threaded the EECAs - but
>> that has an issue
>>> I need to solve.
>> Well, there could be some EECAs that have dependencies on
>> each other,
>> when defined in a single definition file.  Or, they
>> have implicit
>> dependencies with other earlier defined ecas.  Like,
>> maybe an order
>> eca assuming that a product eca has run, just because ofbiz
>> has always
>> loaded the product component before the order component.
> 
> I used a FIFO queue serviced by a single thread for the EECAs - to preserve the sequence. The main idea was to offload the EECA execution from the thread that triggered the EECA. The data load was also in a FIFO queue serviced by a single thread so the files were being loaded in order.
> 
> To summarize:
> 
> 1. Table creation is handled by a thread pool with an adjustable size. A thread task is to create a table and its primary keys. Thread tasks run in parallel. Main thread blocks until all tables and primary keys are created.
> 2. Main thread creates foreign keys.
> 3. Main thread parses XML files, puts results in data load queue.
> 4. A data load thread services the data load queue and stores the data. If an ECA is triggered it puts the ECA info in an ECA queue.
> 5. An ECA thread services the ECA queue and runs the ECA.
> 6. Main thread blocks until all queues are empty.

Except if an eca fires, but the main data load thread keeps going,
then the main data load thread might insert/update something that
hasn't yet been manipulated by the eca(s).

Additionally, an eca can run a service, which can do anything,
including adding/updating/removing other values, which cause other
ecas to fire.  Those then interact with the queue-based ecas.

Were your changes only active at startup, during the initial install,
or were they always available?  When data is later manipulated, during
a test run, certain guarantees still have to be met (which I'm sure you
know).

>> This is a difficult problem to solve; probably not worth
>> it.  During
>> production, different high-level threads, modifying
>> different
>> entities, will run faster, they are already running in
>> multiple threads.
>>
>> Most ecas(entity, and probably service) generally run
>> relatively fast.
>>    Trying to break that up and dispatch into
>> a thread pool might make
>> things slower, as you have cpu cache coherency effects to
>> content with.
>>
>> What would be better, is to break up the higher levels into
>> more
>> threads, during an install.  That could be made
>> semi-smart, if we add
>> file dependencies to the data xml files.  Such
>> explicit dependencies
>> will  have to be done by hand.  Then, a parallel
>> execution framework,
>> that ran each xml file in parallel, once all of it's
>> dependencies were
>> met, would give us a speedup.
> 
> The minor changes I made cut the data load time in half. That's not fast enough? ;-)
> 
> It didn't take a lot of threads or a lot of thought to speed things up. The bottom line is, you want to keep parts of the process going while waiting for DB I/O.

As for run-install, it starts up catalina.  It'd be nice if that were
multi-threaded as well.  But catalina appears to be serial internally.


Re: multi-threaded EntityDataLoadContainer and SequenceUtil

Posted by Adrian Crum <ad...@yahoo.com>.
--- On Sat, 4/3/10, Adam Heath <do...@brainfood.com> wrote:
> Adrian Crum wrote:
> > I multi-threaded the data load by having one thread
> parse the XML files
> > and put the results in a queue. Another thread
> services the queue and
> > loads the data. I also multi-threaded the EECAs - but
> that has an issue
> > I need to solve.
> 
> Well, there could be some EECAs that have dependencies on
> each other,
> when defined in a single definition file.  Or, they
> have implicit
> dependencies with other earlier defined ecas.  Like,
> maybe an order
> eca assuming that a product eca has run, just because ofbiz
> has always
> loaded the product component before the order component.

I used a FIFO queue serviced by a single thread for the EECAs - to preserve the sequence. The main idea was to offload the EECA execution from the thread that triggered the EECA. The data load was also in a FIFO queue serviced by a single thread so the files were being loaded in order.

To summarize:

1. Table creation is handled by a thread pool with an adjustable size. A thread task is to create a table and its primary keys. Thread tasks run in parallel. Main thread blocks until all tables and primary keys are created.
2. Main thread creates foreign keys.
3. Main thread parses XML files, puts results in data load queue.
4. A data load thread services the data load queue and stores the data. If an ECA is triggered it puts the ECA info in an ECA queue.
5. An ECA thread services the ECA queue and runs the ECA.
6. Main thread blocks until all queues are empty.
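
In java.util.concurrent terms the shape is roughly this (toy types, not
the actual patch):

==
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class LoadPipelineSketch {
    // toy stand-ins for parsed rows and triggered eca work
    static class Row {}
    static class EcaJob {}
    static final Row ROW_EOF = new Row();
    static final EcaJob ECA_EOF = new EcaJob();

    public static void main(String[] args) throws Exception {
        // 1. table + pk creation in an adjustable-size pool; main thread waits
        ExecutorService tablePool = Executors.newFixedThreadPool(4);
        // tablePool.submit(...) one task per table would go here
        tablePool.shutdown();
        tablePool.awaitTermination(1, TimeUnit.HOURS);

        // 2. foreign keys created on the main thread (not shown)

        // 3-5. parse -> data load queue -> store; stores can enqueue eca work
        final BlockingQueue<Row> dataQueue = new LinkedBlockingQueue<Row>(1000);
        final BlockingQueue<EcaJob> ecaQueue = new LinkedBlockingQueue<EcaJob>();

        Thread loader = new Thread() {
            public void run() {
                try {
                    Row row;
                    while ((row = dataQueue.take()) != ROW_EOF) {
                        // store(row); if that trips an eca:
                        // ecaQueue.put(new EcaJob());
                    }
                    ecaQueue.put(ECA_EOF);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        };
        Thread ecaRunner = new Thread() {
            public void run() {
                try {
                    while (ecaQueue.take() != ECA_EOF) {
                        // runEca(job) would go here
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        };
        loader.start();
        ecaRunner.start();

        // main thread parses the xml files here, calling dataQueue.put(row)
        // for each parsed value, then marks the end of input:
        dataQueue.put(ROW_EOF);

        // 6. block until both queues have drained
        loader.join();
        ecaRunner.join();
    }
}
==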

> This is a difficult problem to solve; probably not worth
> it.  During
> production, different high-level threads, modifying
> different
> entities, will run faster, they are already running in
> multiple threads.
> 
> Most ecas(entity, and probably service) generally run
> relatively fast.
>    Trying to break that up and dispatch into
> a thread pool might make
> things slower, as you have cpu cache coherency effects to
> content with.
> 
> What would be better, is to break up the higher levels into
> more
> threads, during an install.  That could be made
> semi-smart, if we add
> file dependencies to the data xml files.  Such
> explicit dependencies
> will  have to be done by hand.  Then, a parallel
> execution framework,
> that ran each xml file in parallel, once all of it's
> dependencies were
> met, would give us a speedup.

The minor changes I made cut the data load time in half. That's not fast enough? ;-)

It didn't take a lot of threads or a lot of thought to speed things up. The bottom line is, you want to keep parts of the process going while waiting for DB I/O.



      

Re: multi-threaded EntityDataLoadContainer and SequenceUtil

Posted by Adam Heath <do...@brainfood.com>.
Adrian Crum wrote:
> I multi-threaded the data load by having one thread parse the XML files
> and put the results in a queue. Another thread services the queue and
> loads the data. I also multi-threaded the EECAs - but that has an issue
> I need to solve.

Well, there could be some EECAs that have dependencies on each other,
when defined in a single definition file.  Or, they have implicit
dependencies with other earlier defined ecas.  Like, maybe an order
eca assuming that a product eca has run, just because ofbiz has always
loaded the product component before the order component.

This is a difficult problem to solve; probably not worth it.  During
production, different high-level threads, modifying different
entities, will run faster; they are already running in multiple threads.

Most ecas (entity, and probably service) generally run relatively fast.
Trying to break that up and dispatch into a thread pool might make
things slower, as you have cpu cache coherency effects to contend with.

What would be better is to break up the higher levels into more
threads, during an install.  That could be made semi-smart if we add
file dependencies to the data xml files.  Such explicit dependencies
will have to be done by hand.  Then, a parallel execution framework,
one that ran each xml file in parallel once all of its dependencies were
met, would give us a speedup.
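
The dispatch part of such a framework wouldn't be much code; roughly this
(the dependency maps and loadFile are hypothetical, nothing like this
exists yet):

==
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ExecutorService;

// dependency-driven dispatch: a file goes to the pool only once every file
// it declares a dependency on has finished loading; assumes every file
// appears as a key in deps, even if its dependency list is empty
class DependencyLoader {
    private final ExecutorService pool;
    private final Map<String, Set<String>> waitingOn = new HashMap<String, Set<String>>();
    private final Map<String, List<String>> dependents; // file -> files waiting on it

    DependencyLoader(ExecutorService pool, Map<String, List<String>> deps,
            Map<String, List<String>> dependents) {
        this.pool = pool;
        this.dependents = dependents;
        for (Map.Entry<String, List<String>> e : deps.entrySet()) {
            waitingOn.put(e.getKey(), new HashSet<String>(e.getValue()));
        }
    }

    synchronized void start() {
        for (String file : waitingOn.keySet()) {
            if (waitingOn.get(file).isEmpty()) submit(file);
        }
    }

    private void submit(final String file) {
        pool.submit(new Runnable() {
            public void run() {
                // loadFile(file) would go here
                fileDone(file);
            }
        });
    }

    private synchronized void fileDone(String file) {
        List<String> next = dependents.get(file);
        if (next == null) return;
        for (String candidate : next) {
            Set<String> remaining = waitingOn.get(candidate);
            remaining.remove(file);
            if (remaining.isEmpty()) submit(candidate);
        }
    }
}
==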


Re: multi-threaded EntityDataLoadContainer and SequenceUtil

Posted by Adrian Crum <ad...@yahoo.com>.
--- On Thu, 4/1/10, Adam Heath <do...@brainfood.com> wrote:
> Adrian Crum wrote:
> > --- On Thu, 4/1/10, Adam Heath <do...@brainfood.com>
> wrote:
> >> Adrian Crum wrote:
> >>> --- On Thu, 4/1/10, Adam Heath <do...@brainfood.com>
> >> wrote:
> >>>> Adrian Crum wrote:
> >>>>> --- On Thu, 4/1/10, Adam Heath <do...@brainfood.com>
> >>>> wrote:
> >>>>>> Adrian Crum wrote:
> >>>>>>> I multi-threaded the data load
> by
> >> having one
> >>>> thread
> >>>>>> parse the XML files
> >>>>>>> and put the results in a
> queue.
> >> Another
> >>>> thread
> >>>>>> services the queue and
> >>>>>>> loads the data. I also
> multi-threaded
> >> the
> >>>> EECAs - but
> >>>>>> that has an issue
> >>>>>>> I need to solve.
> >>>>>> We need to be careful with that. 
> >>>> EntitySaxReader
> >>>>>> supports reading
> >>>>>> extremely large data files; it
> doesn't
> >> read the
> >>>> entire
> >>>>>> thing into
> >>>>>> memory.  So, any such event
> dispatch
> >> system
> >>>> needs to
> >>>>>> keep the parsing
> >>>>>> from getting to far ahead.
> >>>>> http://java.sun.com/javase/6/docs/api/java/util/concurrent/BlockingQueue.html
> >>>> Not really.  That will block the
> calling
> >> thread when
> >>>> no data is available.
> >>> Yeah, really.
> >>>
> >>> 1. Construct a FIFO queue, fire up n consumers
> to
> >> service the queue.
> >>> 2. Consumers block, waiting for queue
> elements.
> >>> 3. Producer adds elements to queue. Consumers
> >> unblock.
> >>> 4. Queue reaches capacity, producer blocks,
> waiting
> >> for room.
> >>> 5. Consumers empty the queue.
> >>> 6. Goto step 2.
> >> And that's a blocking algo, which is bad.
> >>
> >> If you only have a limited number of threads, then
> anytime
> >> one of them
> >> blocks, the thread becomes unavailable to do real
> work.
> >>
> >> What needs to happen in these cases is that the
> thread
> >> removes it self
> >> from the thread pool, and the consumer thread then
> had to
> >> resubmit the
> >> producer.
> >>
> >> The whole point of SEDA is to not have unbounded
> resource
> >> usage.  If a
> >> thread gets blocked, then that implies that
> another new
> >> thread will be
> >> needed to keep the work queue proceeding.
> > 
> > Why Events Are A Bad Idea (for high-concurrency
> servers) - http://capriccio.cs.berkeley.edu/pubs/threads-hotos-2003.pdf
> > 
> > An interesting refutation to SEDA.
> 
> (haven't read that yet)
> 
> ==
> mkdir /dev/shm/ofbiz-runtime
> mount --bind /dev/shm/ofbiz-runtime
> $OFBIZ_HOME/runtime/data
> ==
> 
> Quick speedup.  /dev/shm is a tmpfs(on linux anyways),
> basically a
> filesystem kept only in ram.

==
goto /local/neighborhood/best-buy
purchase $CPU, $RAM, $RAID
echo Problem solved
==

Works on Windows.




      

Re: multi-threaded EntityDataLoadContainer and SequenceUtil

Posted by Adam Heath <do...@brainfood.com>.
Adrian Crum wrote:
> --- On Thu, 4/1/10, Adam Heath <do...@brainfood.com> wrote:
>> Adrian Crum wrote:
>>> --- On Thu, 4/1/10, Adam Heath <do...@brainfood.com>
>> wrote:
>>>> Adrian Crum wrote:
>>>>> --- On Thu, 4/1/10, Adam Heath <do...@brainfood.com>
>>>> wrote:
>>>>>> Adrian Crum wrote:
>>>>>>> I multi-threaded the data load by
>> having one
>>>> thread
>>>>>> parse the XML files
>>>>>>> and put the results in a queue.
>> Another
>>>> thread
>>>>>> services the queue and
>>>>>>> loads the data. I also multi-threaded
>> the
>>>> EECAs - but
>>>>>> that has an issue
>>>>>>> I need to solve.
>>>>>> We need to be careful with that. 
>>>> EntitySaxReader
>>>>>> supports reading
>>>>>> extremely large data files; it doesn't
>> read the
>>>> entire
>>>>>> thing into
>>>>>> memory.  So, any such event dispatch
>> system
>>>> needs to
>>>>>> keep the parsing
>>>>>> from getting to far ahead.
>>>>> http://java.sun.com/javase/6/docs/api/java/util/concurrent/BlockingQueue.html
>>>> Not really.  That will block the calling
>> thread when
>>>> no data is available.
>>> Yeah, really.
>>>
>>> 1. Construct a FIFO queue, fire up n consumers to
>> service the queue.
>>> 2. Consumers block, waiting for queue elements.
>>> 3. Producer adds elements to queue. Consumers
>> unblock.
>>> 4. Queue reaches capacity, producer blocks, waiting
>> for room.
>>> 5. Consumers empty the queue.
>>> 6. Goto step 2.
>> And that's a blocking algo, which is bad.
>>
>> If you only have a limited number of threads, then anytime
>> one of them
>> blocks, the thread becomes unavailable to do real work.
>>
>> What needs to happen in these cases is that the thread
>> removes it self
>> from the thread pool, and the consumer thread then had to
>> resubmit the
>> producer.
>>
>> The whole point of SEDA is to not have unbounded resource
>> usage.  If a
>> thread gets blocked, then that implies that another new
>> thread will be
>> needed to keep the work queue proceeding.
> 
> Why Events Are A Bad Idea (for high-concurrency servers) - http://capriccio.cs.berkeley.edu/pubs/threads-hotos-2003.pdf
> 
> An interesting refutation to SEDA.

(haven't read that yet)

==
mkdir /dev/shm/ofbiz-runtime
mount --bind /dev/shm/ofbiz-runtime $OFBIZ_HOME/runtime/data
==

Quick speedup.  /dev/shm is a tmpfs (on linux anyways), basically a
filesystem kept only in ram.


Re: multi-threaded EntityDataLoadContainer and SequenceUtil

Posted by Adrian Crum <ad...@yahoo.com>.
--- On Thu, 4/1/10, Adam Heath <do...@brainfood.com> wrote:
> Adrian Crum wrote:
> > --- On Thu, 4/1/10, Adam Heath <do...@brainfood.com>
> wrote:
> >> Adrian Crum wrote:
> >>> --- On Thu, 4/1/10, Adam Heath <do...@brainfood.com>
> >> wrote:
> >>>> Adrian Crum wrote:
> >>>>> I multi-threaded the data load by
> having one
> >> thread
> >>>> parse the XML files
> >>>>> and put the results in a queue.
> Another
> >> thread
> >>>> services the queue and
> >>>>> loads the data. I also multi-threaded
> the
> >> EECAs - but
> >>>> that has an issue
> >>>>> I need to solve.
> >>>> We need to be careful with that. 
> >> EntitySaxReader
> >>>> supports reading
> >>>> extremely large data files; it doesn't
> read the
> >> entire
> >>>> thing into
> >>>> memory.  So, any such event dispatch
> system
> >> needs to
> >>>> keep the parsing
> >>>> from getting to far ahead.
> >>> http://java.sun.com/javase/6/docs/api/java/util/concurrent/BlockingQueue.html
> >> Not really.  That will block the calling
> thread when
> >> no data is available.
> > 
> > Yeah, really.
> > 
> > 1. Construct a FIFO queue, fire up n consumers to
> service the queue.
> > 2. Consumers block, waiting for queue elements.
> > 3. Producer adds elements to queue. Consumers
> unblock.
> > 4. Queue reaches capacity, producer blocks, waiting
> for room.
> > 5. Consumers empty the queue.
> > 6. Goto step 2.
> 
> And that's a blocking algo, which is bad.
> 
> If you only have a limited number of threads, then anytime
> one of them
> blocks, the thread becomes unavailable to do real work.
> 
> What needs to happen in these cases is that the thread
> removes it self
> from the thread pool, and the consumer thread then had to
> resubmit the
> producer.
> 
> The whole point of SEDA is to not have unbounded resource
> usage.  If a
> thread gets blocked, then that implies that another new
> thread will be
> needed to keep the work queue proceeding.

Why Events Are A Bad Idea (for high-concurrency servers) - http://capriccio.cs.berkeley.edu/pubs/threads-hotos-2003.pdf

An interesting refutation of SEDA.





      

Re: multi-threaded EntityDataLoadContainer and SequenceUtil

Posted by Adrian Crum <ad...@yahoo.com>.
--- On Thu, 4/1/10, Adam Heath <do...@brainfood.com> wrote:
> >> Nope, not good enough.  It would be possible
> for the
> >> producer thread
> >> to stuck for a long time,
> producing/consuming.  If
> >> there are several
> >> such workflows like this in the thread pool, then
> the
> >> threads become
> >> unavailable for doing other work.
> > 
> > Are we talking about theoretical software or OFBiz?
> What thread pool? The application server's? I have been
> referring to the existing OFBiz entity import/export code.
> If an entity import takes n mS in the current
> single-threaded code, and the same import takes n/x mS using
> multi-threaded code, then hasn't the performance improved?
> 
> Data loading can take place from webtools.  And
> several requests could
> be submitted at once.  There's no reason to try and
> process them all
> at the same time, if the cpu is loaded.  Just queue up
> the requests.

Like SEDA or my JMS idea. In other words, theoretical.

> I'm not suggesting we go thru and change ofbiz to some kind
> of
> segmented event dispatcher.  But the basic
> infrastructure is simple
> enough to write, it doesn't hurt to do it right in the
> first place.

Simpler yet is to use a BlockingQueue for this one task.

I'm not disagreeing with you - it would be cool to have a SEDA-style application. Instead, I'm advocating baby steps. From my perspective, it is easier to try a simple multi-threaded approach and see if it causes any problems. If that works okay, then you can make it more sophisticated.

Multiple simultaneous huge entity import requests under heavy load sounds like an unlikely scenario. Is there a real need to design for that?




      

Re: multi-threaded EntityDataLoadContainer and SequenceUtil

Posted by Adam Heath <do...@brainfood.com>.
Adrian Crum wrote:
> --- On Thu, 4/1/10, Adam Heath <do...@brainfood.com> wrote:
>> Adrian Crum wrote:
>>> --- On Thu, 4/1/10, Adam Heath <do...@brainfood.com>
>> wrote:
>>>> Adrian Crum wrote:
>>>>> --- On Thu, 4/1/10, Adam Heath <do...@brainfood.com>
>>>> wrote:
>>>>>> Adrian Crum wrote:
>>>>>>> --- On Thu, 4/1/10, Adam Heath <do...@brainfood.com>
>>>>>> wrote:
>>>>>>>> Adrian Crum wrote:
>>>>>>>>> --- On Thu, 4/1/10, Adam Heath
>> <do...@brainfood.com>
>>>>>>>> wrote:
>>>>>>>>>> Adrian Crum wrote:
>>>>>>>>>>> I multi-threaded the
>> data load
>>>> by
>>>>>> having one
>>>>>>>> thread
>>>>>>>>>> parse the XML files
>>>>>>>>>>> and put the results in
>> a
>>>> queue.
>>>>>> Another
>>>>>>>> thread
>>>>>>>>>> services the queue and
>>>>>>>>>>> loads the data. I
>> also
>>>> multi-threaded
>>>>>> the
>>>>>>>> EECAs - but
>>>>>>>>>> that has an issue
>>>>>>>>>>> I need to solve.
>>>>>>>>>> We need to be careful with
>> that. 
>>>>>>>> EntitySaxReader
>>>>>>>>>> supports reading
>>>>>>>>>> extremely large data
>> files; it
>>>> doesn't
>>>>>> read the
>>>>>>>> entire
>>>>>>>>>> thing into
>>>>>>>>>> memory.  So, any such
>> event
>>>> dispatch
>>>>>> system
>>>>>>>> needs to
>>>>>>>>>> keep the parsing
>>>>>>>>>> from getting to far
>> ahead.
>>>>>>>>> http://java.sun.com/javase/6/docs/api/java/util/concurrent/BlockingQueue.html
>>>>>>>> Not really.  That will block
>> the
>>>> calling
>>>>>> thread when
>>>>>>>> no data is available.
>>>>>>> Yeah, really.
>>>>>>>
>>>>>>> 1. Construct a FIFO queue, fire up n
>> consumers
>>>> to
>>>>>> service the queue.
>>>>>>> 2. Consumers block, waiting for queue
>>>> elements.
>>>>>>> 3. Producer adds elements to queue.
>> Consumers
>>>>>> unblock.
>>>>>>> 4. Queue reaches capacity, producer
>> blocks,
>>>> waiting
>>>>>> for room.
>>>>>>> 5. Consumers empty the queue.
>>>>>>> 6. Goto step 2.
>>>>>> And that's a blocking algo, which is bad.
>>>>> Huh? You just asked for a blocking algorithm:
>> "So, any
>>>> such event dispatch system needs to keep the
>> parsing from
>>>> getting to far ahead."
>>>>
>>>> No, I didn't ask for a blocking algorithm. 
>> When the
>>>> outgoing queue is
>>>> full, the producer needs to pause itself, so that
>> it's
>>>> thread can be
>>>> used for other things.
>>> I guess you could make the producer consume a queue
>> element, then try adding the new one again. So:
>>
>> Nope, not good enough.  It would be possible for the
>> producer thread
>> to stuck for a long time, producing/consuming.  If
>> there are several
>> such workflows like this in the thread pool, then the
>> threads become
>> unavailable for doing other work.
> 
> Are we talking about theoretical software or OFBiz? What thread pool? The application server's? I have been referring to the existing OFBiz entity import/export code. If an entity import takes n mS in the current single-threaded code, and the same import takes n/x mS using multi-threaded code, then hasn't the performance improved?

Data loading can take place from webtools.  And several requests could
be submitted at once.  There's no reason to try and process them all
at the same time, if the cpu is loaded.  Just queue up the requests.

Plus (this part is theoretical), when ofbiz is more segmented, other
things would go through the same pool.  And thrashing would be reduced.

I'm not suggesting we go through and change ofbiz to some kind of
segmented event dispatcher.  But the basic infrastructure is simple
enough to write, so it doesn't hurt to do it right in the first place.

> 
>> CPU is a limited resource.
> 
> CPUs are cheap. Just buy more. ;-)

Go survive a slashdotting.



Re: multi-threaded EntityDataLoadContainer and SequenceUtil

Posted by Adrian Crum <ad...@yahoo.com>.
--- On Thu, 4/1/10, Adam Heath <do...@brainfood.com> wrote:
> Adrian Crum wrote:
> > --- On Thu, 4/1/10, Adam Heath <do...@brainfood.com>
> wrote:
> >> Adrian Crum wrote:
> >>> --- On Thu, 4/1/10, Adam Heath <do...@brainfood.com>
> >> wrote:
> >>>> Adrian Crum wrote:
> >>>>> --- On Thu, 4/1/10, Adam Heath <do...@brainfood.com>
> >>>> wrote:
> >>>>>> Adrian Crum wrote:
> >>>>>>> --- On Thu, 4/1/10, Adam Heath
> <do...@brainfood.com>
> >>>>>> wrote:
> >>>>>>>> Adrian Crum wrote:
> >>>>>>>>> I multi-threaded the
> data load
> >> by
> >>>> having one
> >>>>>> thread
> >>>>>>>> parse the XML files
> >>>>>>>>> and put the results in
> a
> >> queue.
> >>>> Another
> >>>>>> thread
> >>>>>>>> services the queue and
> >>>>>>>>> loads the data. I
> also
> >> multi-threaded
> >>>> the
> >>>>>> EECAs - but
> >>>>>>>> that has an issue
> >>>>>>>>> I need to solve.
> >>>>>>>> We need to be careful with
> that. 
> >>>>>> EntitySaxReader
> >>>>>>>> supports reading
> >>>>>>>> extremely large data
> files; it
> >> doesn't
> >>>> read the
> >>>>>> entire
> >>>>>>>> thing into
> >>>>>>>> memory.  So, any such
> event
> >> dispatch
> >>>> system
> >>>>>> needs to
> >>>>>>>> keep the parsing
> >>>>>>>> from getting to far
> ahead.
> >>>>>>> http://java.sun.com/javase/6/docs/api/java/util/concurrent/BlockingQueue.html
> >>>>>> Not really.  That will block
> the
> >> calling
> >>>> thread when
> >>>>>> no data is available.
> >>>>> Yeah, really.
> >>>>>
> >>>>> 1. Construct a FIFO queue, fire up n
> consumers
> >> to
> >>>> service the queue.
> >>>>> 2. Consumers block, waiting for queue
> >> elements.
> >>>>> 3. Producer adds elements to queue.
> Consumers
> >>>> unblock.
> >>>>> 4. Queue reaches capacity, producer
> blocks,
> >> waiting
> >>>> for room.
> >>>>> 5. Consumers empty the queue.
> >>>>> 6. Goto step 2.
> >>>> And that's a blocking algo, which is bad.
> >>> Huh? You just asked for a blocking algorithm:
> "So, any
> >> such event dispatch system needs to keep the
> parsing from
> >> getting to far ahead."
> >>
> >> No, I didn't ask for a blocking algorithm. 
> When the
> >> outgoing queue is
> >> full, the producer needs to pause itself, so that
> it's
> >> thread can be
> >> used for other things.
> > 
> > I guess you could make the producer consume a queue
> element, then try adding the new one again. So:
> 
> Nope, not good enough.  It would be possible for the
> producer thread
> to stuck for a long time, producing/consuming.  If
> there are several
> such workflows like this in the thread pool, then the
> threads become
> unavailable for doing other work.

Are we talking about theoretical software or OFBiz? What thread pool? The application server's? I have been referring to the existing OFBiz entity import/export code. If an entity import takes n ms in the current single-threaded code, and the same import takes n/x ms using multi-threaded code, then hasn't the performance improved?

> CPU is a limited resource.

CPUs are cheap. Just buy more. ;-)




      

Re: multi-threaded EntityDataLoadContainer and SequenceUtil

Posted by Adam Heath <do...@brainfood.com>.
Adrian Crum wrote:
> --- On Thu, 4/1/10, Adam Heath <do...@brainfood.com> wrote:
>> Adrian Crum wrote:
>>> --- On Thu, 4/1/10, Adam Heath <do...@brainfood.com>
>> wrote:
>>>> Adrian Crum wrote:
>>>>> --- On Thu, 4/1/10, Adam Heath <do...@brainfood.com>
>>>> wrote:
>>>>>> Adrian Crum wrote:
>>>>>>> --- On Thu, 4/1/10, Adam Heath <do...@brainfood.com>
>>>>>> wrote:
>>>>>>>> Adrian Crum wrote:
>>>>>>>>> I multi-threaded the data load
>> by
>>>> having one
>>>>>> thread
>>>>>>>> parse the XML files
>>>>>>>>> and put the results in a
>> queue.
>>>> Another
>>>>>> thread
>>>>>>>> services the queue and
>>>>>>>>> loads the data. I also
>> multi-threaded
>>>> the
>>>>>> EECAs - but
>>>>>>>> that has an issue
>>>>>>>>> I need to solve.
>>>>>>>> We need to be careful with that. 
>>>>>> EntitySaxReader
>>>>>>>> supports reading
>>>>>>>> extremely large data files; it
>> doesn't
>>>> read the
>>>>>> entire
>>>>>>>> thing into
>>>>>>>> memory.  So, any such event
>> dispatch
>>>> system
>>>>>> needs to
>>>>>>>> keep the parsing
>>>>>>>> from getting to far ahead.
>>>>>>> http://java.sun.com/javase/6/docs/api/java/util/concurrent/BlockingQueue.html
>>>>>> Not really.  That will block the
>> calling
>>>> thread when
>>>>>> no data is available.
>>>>> Yeah, really.
>>>>>
>>>>> 1. Construct a FIFO queue, fire up n consumers
>> to
>>>> service the queue.
>>>>> 2. Consumers block, waiting for queue
>> elements.
>>>>> 3. Producer adds elements to queue. Consumers
>>>> unblock.
>>>>> 4. Queue reaches capacity, producer blocks,
>> waiting
>>>> for room.
>>>>> 5. Consumers empty the queue.
>>>>> 6. Goto step 2.
>>>> And that's a blocking algo, which is bad.
>>> Huh? You just asked for a blocking algorithm: "So, any
>> such event dispatch system needs to keep the parsing from
>> getting to far ahead."
>>
>> No, I didn't ask for a blocking algorithm.  When the
>> outgoing queue is
>> full, the producer needs to pause itself, so that it's
>> thread can be
>> used for other things.
> 
> I guess you could make the producer consume a queue element, then try adding the new one again. So:

Nope, not good enough.  It would be possible for the producer thread
to get stuck for a long time, producing/consuming.  If there are several
such workflows like this in the thread pool, then the threads become
unavailable for doing other work.

CPU is a limited resource.  In the SEDA model, a worker must be short
in execution time, and return back into the pool when it is done.
It's perfectly acceptable, however, to add another item to the pool's
queue to continue processing.

1: producer runs, creates a work unit
2: if the end has been reached, submit the work unit directly
3: otherwise, wrap the unit, so that when the unit gets run, the
producer will be resubmitted.
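
In other words, something like this (WorkSource is made up here, just to
show the shape):

==
import java.util.concurrent.ExecutorService;

// a producer that never parks a pool thread: it builds one work unit, then
// either submits it directly (last one) or wraps it so that finishing the
// unit puts the producer back on the pool's queue
class ResubmittingProducer implements Runnable {
    // hypothetical source of work units, just to make the sketch compile
    interface WorkSource { Runnable nextUnit(); boolean isDone(); }

    private final ExecutorService pool;
    private final WorkSource source;

    ResubmittingProducer(ExecutorService pool, WorkSource source) {
        this.pool = pool;
        this.source = source;
    }

    public void run() {
        final Runnable unit = source.nextUnit();      // 1: create a work unit
        if (source.isDone()) {
            pool.submit(unit);                        // 2: the end; submit it directly
        } else {
            pool.submit(new Runnable() {              // 3: wrap it, so running the unit
                public void run() {                   //    resubmits the producer
                    unit.run();
                    pool.submit(ResubmittingProducer.this);
                }
            });
        }
    }
}
==

A thread that finishes a unit ends up putting the producer back on the
queue, so nothing ever sits parked waiting for room.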


Re: multi-threaded EntityDataLoadContainer and SequenceUtil

Posted by Adrian Crum <ad...@yahoo.com>.
--- On Thu, 4/1/10, Adam Heath <do...@brainfood.com> wrote:
> Adrian Crum wrote:
> > --- On Thu, 4/1/10, Adam Heath <do...@brainfood.com>
> wrote:
> >> Adrian Crum wrote:
> >>> --- On Thu, 4/1/10, Adam Heath <do...@brainfood.com>
> >> wrote:
> >>>> Adrian Crum wrote:
> >>>>> --- On Thu, 4/1/10, Adam Heath <do...@brainfood.com>
> >>>> wrote:
> >>>>>> Adrian Crum wrote:
> >>>>>>> I multi-threaded the data load
> by
> >> having one
> >>>> thread
> >>>>>> parse the XML files
> >>>>>>> and put the results in a
> queue.
> >> Another
> >>>> thread
> >>>>>> services the queue and
> >>>>>>> loads the data. I also
> multi-threaded
> >> the
> >>>> EECAs - but
> >>>>>> that has an issue
> >>>>>>> I need to solve.
> >>>>>> We need to be careful with that. 
> >>>> EntitySaxReader
> >>>>>> supports reading
> >>>>>> extremely large data files; it
> doesn't
> >> read the
> >>>> entire
> >>>>>> thing into
> >>>>>> memory.  So, any such event
> dispatch
> >> system
> >>>> needs to
> >>>>>> keep the parsing
> >>>>>> from getting to far ahead.
> >>>>> http://java.sun.com/javase/6/docs/api/java/util/concurrent/BlockingQueue.html
> >>>> Not really.  That will block the
> calling
> >> thread when
> >>>> no data is available.
> >>> Yeah, really.
> >>>
> >>> 1. Construct a FIFO queue, fire up n consumers
> to
> >> service the queue.
> >>> 2. Consumers block, waiting for queue
> elements.
> >>> 3. Producer adds elements to queue. Consumers
> >> unblock.
> >>> 4. Queue reaches capacity, producer blocks,
> waiting
> >> for room.
> >>> 5. Consumers empty the queue.
> >>> 6. Goto step 2.
> >> And that's a blocking algo, which is bad.
> > 
> > Huh? You just asked for a blocking algorithm: "So, any
> such event dispatch system needs to keep the parsing from
> getting to far ahead."
> 
> No, I didn't ask for a blocking algorithm.  When the
> outgoing queue is
> full, the producer needs to pause itself, so that it's
> thread can be
> used for other things.

I guess you could make the producer consume a queue element, then try adding the new one again. So:

1. Construct a FIFO queue, fire up n consumers to service the queue.
2. Consumers block, waiting for queue elements.
3. Producer adds elements to queue. Consumers unblock.
4. Queue reaches capacity, producer becomes a consumer until there is room for new elements.
5. Consumers empty the queue.
6. Goto step 2.
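
Step 4 in code would be something like this (toy names, untested):

==
import java.util.concurrent.BlockingQueue;

// when offer() fails because the queue is full, the producer drains and
// processes one element itself instead of blocking
class HelpingProducer {
    interface Worker<T> { void accept(T item); }   // made-up callback type

    static <T> void submit(BlockingQueue<T> queue, T item, Worker<T> worker) {
        while (!queue.offer(item)) {
            T head = queue.poll();                 // act as a consumer for a moment
            if (head != null) {
                worker.accept(head);
            }
        }
    }
}
==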

Btw, from my understanding of SEDA, entity import/export would be tasks that are submitted to a task queue. The queue's response time controller would determine if there are enough resources available to run the task. If the server is really busy, the task is rejected.




      

Re: multi-threaded EntityDataLoadContainer and SequenceUtil

Posted by Adam Heath <do...@brainfood.com>.
Adrian Crum wrote:
> --- On Thu, 4/1/10, Adam Heath <do...@brainfood.com> wrote:
>> Adrian Crum wrote:
>>> --- On Thu, 4/1/10, Adam Heath <do...@brainfood.com>
>> wrote:
>>>> Adrian Crum wrote:
>>>>> --- On Thu, 4/1/10, Adam Heath <do...@brainfood.com>
>>>> wrote:
>>>>>> Adrian Crum wrote:
>>>>>>> I multi-threaded the data load by
>> having one
>>>> thread
>>>>>> parse the XML files
>>>>>>> and put the results in a queue.
>> Another
>>>> thread
>>>>>> services the queue and
>>>>>>> loads the data. I also multi-threaded
>> the
>>>> EECAs - but
>>>>>> that has an issue
>>>>>>> I need to solve.
>>>>>> We need to be careful with that. 
>>>> EntitySaxReader
>>>>>> supports reading
>>>>>> extremely large data files; it doesn't
>> read the
>>>> entire
>>>>>> thing into
>>>>>> memory.  So, any such event dispatch
>> system
>>>> needs to
>>>>>> keep the parsing
>>>>>> from getting to far ahead.
>>>>> http://java.sun.com/javase/6/docs/api/java/util/concurrent/BlockingQueue.html
>>>> Not really.  That will block the calling
>> thread when
>>>> no data is available.
>>> Yeah, really.
>>>
>>> 1. Construct a FIFO queue, fire up n consumers to
>> service the queue.
>>> 2. Consumers block, waiting for queue elements.
>>> 3. Producer adds elements to queue. Consumers
>> unblock.
>>> 4. Queue reaches capacity, producer blocks, waiting
>> for room.
>>> 5. Consumers empty the queue.
>>> 6. Goto step 2.
>> And that's a blocking algo, which is bad.
> 
> Huh? You just asked for a blocking algorithm: "So, any such event dispatch system needs to keep the parsing from getting to far ahead."

No, I didn't ask for a blocking algorithm.  When the outgoing queue is
full, the producer needs to pause itself, so that its thread can be
used for other things.

Consider a single, shared thread pool, used system wide.  There are
only 8 threads available, as there are only 6 real cpus available.
This thread pool is used to keep the system from getting overloaded,
running too many things at once, and thrashing.

If any of the work items being processed by one of these threads
blocks, then the system will lose a thread for doing other work.

And if A blocks on B, which blocks on C, then D, you've lost 4 threads.


Re: multi-threaded EntityDataLoadContainer and SequenceUtil

Posted by Adrian Crum <ad...@yahoo.com>.
--- On Thu, 4/1/10, Adam Heath <do...@brainfood.com> wrote:
> Adrian Crum wrote:
> > --- On Thu, 4/1/10, Adam Heath <do...@brainfood.com>
> wrote:
> >> Adrian Crum wrote:
> >>> --- On Thu, 4/1/10, Adam Heath <do...@brainfood.com>
> >> wrote:
> >>>> Adrian Crum wrote:
> >>>>> I multi-threaded the data load by
> having one
> >> thread
> >>>> parse the XML files
> >>>>> and put the results in a queue.
> Another
> >> thread
> >>>> services the queue and
> >>>>> loads the data. I also multi-threaded
> the
> >> EECAs - but
> >>>> that has an issue
> >>>>> I need to solve.
> >>>> We need to be careful with that. 
> >> EntitySaxReader
> >>>> supports reading
> >>>> extremely large data files; it doesn't
> read the
> >> entire
> >>>> thing into
> >>>> memory.  So, any such event dispatch
> system
> >> needs to
> >>>> keep the parsing
> >>>> from getting to far ahead.
> >>> http://java.sun.com/javase/6/docs/api/java/util/concurrent/BlockingQueue.html
> >> Not really.  That will block the calling
> thread when
> >> no data is available.
> > 
> > Yeah, really.
> > 
> > 1. Construct a FIFO queue, fire up n consumers to
> service the queue.
> > 2. Consumers block, waiting for queue elements.
> > 3. Producer adds elements to queue. Consumers
> unblock.
> > 4. Queue reaches capacity, producer blocks, waiting
> for room.
> > 5. Consumers empty the queue.
> > 6. Goto step 2.
> 
> And that's a blocking algo, which is bad.

Huh? You just asked for a blocking algorithm: "So, any such event dispatch system needs to keep the parsing from getting too far ahead."

> The whole point of SEDA is to not have unbounded resource
> usage.  If a
> thread gets blocked, then that implies that another new
> thread will be
> needed to keep the work queue proceeding.

You lost me again. I thought we were talking about entity import/export - not SEDA.




      

Re: multi-threaded EntityDataLoadContainer and SequenceUtil

Posted by Adam Heath <do...@brainfood.com>.
Adrian Crum wrote:
> --- On Thu, 4/1/10, Adam Heath <do...@brainfood.com> wrote:
>> Adrian Crum wrote:
>>> --- On Thu, 4/1/10, Adam Heath <do...@brainfood.com>
>> wrote:
>>>> Adrian Crum wrote:
>>>>> I multi-threaded the data load by having one
>> thread
>>>> parse the XML files
>>>>> and put the results in a queue. Another
>> thread
>>>> services the queue and
>>>>> loads the data. I also multi-threaded the
>> EECAs - but
>>>> that has an issue
>>>>> I need to solve.
>>>> We need to be careful with that. 
>> EntitySaxReader
>>>> supports reading
>>>> extremely large data files; it doesn't read the
>> entire
>>>> thing into
>>>> memory.  So, any such event dispatch system
>> needs to
>>>> keep the parsing
>>>> from getting to far ahead.
>>> http://java.sun.com/javase/6/docs/api/java/util/concurrent/BlockingQueue.html
>> Not really.  That will block the calling thread when
>> no data is available.
> 
> Yeah, really.
> 
> 1. Construct a FIFO queue, fire up n consumers to service the queue.
> 2. Consumers block, waiting for queue elements.
> 3. Producer adds elements to queue. Consumers unblock.
> 4. Queue reaches capacity, producer blocks, waiting for room.
> 5. Consumers empty the queue.
> 6. Goto step 2.

And that's a blocking algo, which is bad.

If you only have a limited number of threads, then anytime one of them
blocks, the thread becomes unavailable to do real work.

What needs to happen in these cases is that the thread removes itself
from the thread pool, and the consumer thread then has to resubmit the
producer.

The whole point of SEDA is to not have unbounded resource usage.  If a
thread gets blocked, then that implies that another new thread will be
needed to keep the work queue proceeding.




Re: multi-threaded EntityDataLoadContainer and SequenceUtil

Posted by Adrian Crum <ad...@yahoo.com>.
--- On Thu, 4/1/10, Adam Heath <do...@brainfood.com> wrote:
> Adrian Crum wrote:
> > --- On Thu, 4/1/10, Adam Heath <do...@brainfood.com>
> wrote:
> >> Adrian Crum wrote:
> >>> I multi-threaded the data load by having one
> thread
> >> parse the XML files
> >>> and put the results in a queue. Another
> thread
> >> services the queue and
> >>> loads the data. I also multi-threaded the
> EECAs - but
> >> that has an issue
> >>> I need to solve.
> >> We need to be careful with that. 
> EntitySaxReader
> >> supports reading
> >> extremely large data files; it doesn't read the
> entire
> >> thing into
> >> memory.  So, any such event dispatch system
> needs to
> >> keep the parsing
> >> from getting to far ahead.
> > 
> > http://java.sun.com/javase/6/docs/api/java/util/concurrent/BlockingQueue.html
> 
> Not really.  That will block the calling thread when
> no data is available.

Yeah, really.

1. Construct a FIFO queue, fire up n consumers to service the queue.
2. Consumers block, waiting for queue elements.
3. Producer adds elements to queue. Consumers unblock.
4. Queue reaches capacity, producer blocks, waiting for room.
5. Consumers empty the queue.
6. Goto step 2.
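
With java.util.concurrent that's only a few lines (toy types here, not the
EntitySaxReader hookup):

==
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BoundedLoadQueue {
    private static final String EOF = new String("EOF");   // poison pill

    public static void main(String[] args) throws InterruptedException {
        final BlockingQueue<String> queue = new ArrayBlockingQueue<String>(100);
        int consumers = 2;
        Thread[] workers = new Thread[consumers];
        for (int i = 0; i < consumers; i++) {
            workers[i] = new Thread() {
                public void run() {
                    try {
                        String value;
                        while ((value = queue.take()) != EOF) {   // block until work arrives
                            // load(value) would go here
                        }
                        queue.put(EOF);                           // let the next consumer stop too
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            };
            workers[i].start();
        }
        // producer: put() blocks when the queue hits capacity, which is what
        // keeps the parser from getting too far ahead
        for (int i = 0; i < 10000; i++) {
            queue.put("value-" + i);
        }
        queue.put(EOF);
        for (Thread w : workers) w.join();
    }
}
==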




      

Re: multi-threaded EntityDataLoadContainer and SequenceUtil

Posted by Adam Heath <do...@brainfood.com>.
Adrian Crum wrote:
> --- On Thu, 4/1/10, Adam Heath <do...@brainfood.com> wrote:
>> Adrian Crum wrote:
>>> I multi-threaded the data load by having one thread
>> parse the XML files
>>> and put the results in a queue. Another thread
>> services the queue and
>>> loads the data. I also multi-threaded the EECAs - but
>> that has an issue
>>> I need to solve.
>> We need to be careful with that.  EntitySaxReader
>> supports reading
>> extremely large data files; it doesn't read the entire
>> thing into
>> memory.  So, any such event dispatch system needs to
>> keep the parsing
>> from getting to far ahead.
> 
> http://java.sun.com/javase/6/docs/api/java/util/concurrent/BlockingQueue.html

Not really.  That will block the calling thread when no data is available.


Re: multi-threaded EntityDataLoadContainer and SequenceUtil

Posted by Adrian Crum <ad...@yahoo.com>.
--- On Thu, 4/1/10, Adam Heath <do...@brainfood.com> wrote:
> Adrian Crum wrote:
> > I multi-threaded the data load by having one thread
> parse the XML files
> > and put the results in a queue. Another thread
> services the queue and
> > loads the data. I also multi-threaded the EECAs - but
> that has an issue
> > I need to solve.
> 
> We need to be careful with that.  EntitySaxReader
> supports reading
> extremely large data files; it doesn't read the entire
> thing into
> memory.  So, any such event dispatch system needs to
> keep the parsing
> from getting to far ahead.

http://java.sun.com/javase/6/docs/api/java/util/concurrent/BlockingQueue.html



      

Re: multi-threaded EntityDataLoadContainer and SequenceUtil

Posted by Adam Heath <do...@brainfood.com>.
Adrian Crum wrote:
> I multi-threaded the data load by having one thread parse the XML files
> and put the results in a queue. Another thread services the queue and
> loads the data. I also multi-threaded the EECAs - but that has an issue
> I need to solve.

We need to be careful with that.  EntitySaxReader supports reading
extremely large data files; it doesn't read the entire thing into
memory.  So, any such event dispatch system needs to keep the parsing
from getting too far ahead.


Re: multi-threaded EntityDataLoadContainer and SequenceUtil

Posted by Jacques Le Roux <ja...@les7arts.com>.
From: "Adrian Crum" <ad...@yahoo.com>
> --- On Thu, 4/1/10, Jacques Le Roux <ja...@les7arts.com> wrote:
>> > Adam Heath wrote:
>> >> So each entity creation itself was a separate work
>> unit. Once an
>> >> entity was created, you could submit the primary
>> key creation as well.
>> >> That's simple enough to implement(in theory,
>> anyways). This design
>> >> is starting to go towards the Sandstorm(1)
>> approach.
>> >
>> > I just looked at that site briefly. You're right - my
>> thinking was a lot like that. Split up the work with queues
>> - in other words, use the provider/consumer pattern.
>> >
>> > If I was designing a product like OFBiz, I would have
>> JMS at the front end. Each request gets packaged up into a
>> JMS message and submitted to a queue. Different tasks
>> respond to the queued messages. The last task is writing the
>> response. The app server's request thread returns almost
>> immediately. Each queue/task could be optimized.
>>
>> This makes remind me that it's mostly what is used
>> underneath in something like ServiceMix or Mule (ESBs).
>> ServiceMix is Based on the JBI concept http://servicemix.apache.org/what-is-jbi.html
>> and uses http://activemq.apache.org/ underneath
>
> Actually, the goals and designs are quite different. The goal of ESB is to have a standards-based message bus so that applications 
> from different vendors can inter-operate. The goal of SEDA (Adam's link) is to use queues to provide uniform response time in 
> servers and allow their services to degrade gracefully under load.

Yes, I saw that. It's just that it reminded me of the use of queues in ESBs as well.

> My idea of using JMS is for overload control. Each queue can be serviced by any number of servers (since JMS uses JNDI). In 
> effect, the application itself becomes a crude load balancer.

I see

Jacques

>
> -Adrian



Re: multi-threaded EntityDataLoadContainer and SequenceUtil

Posted by Adrian Crum <ad...@yahoo.com>.
--- On Thu, 4/1/10, Jacques Le Roux <ja...@les7arts.com> wrote:
> > Adam Heath wrote:
> >> So each entity creation itself was a separate work
> unit.  Once an
> >> entity was created, you could submit the primary
> key creation as well.
> >>  That's simple enough to implement(in theory,
> anyways).  This design
> >> is starting to go towards the Sandstorm(1)
> approach.
> > 
> > I just looked at that site briefly. You're right - my
> thinking was a lot like that. Split up the work with queues
> - in other words, use the provider/consumer pattern.
> > 
> > If I was designing a product like OFBiz, I would have
> JMS at the front end. Each request gets packaged up into a
> JMS message and submitted to a queue. Different tasks
> respond to the queued messages. The last task is writing the
> response. The app server's request thread returns almost
> immediately. Each queue/task could be optimized.
> 
> This makes remind me that it's mostly what is used
> underneath in something like ServiceMix or Mule (ESBs).
> ServiceMix is Based on the JBI concept http://servicemix.apache.org/what-is-jbi.html
> and uses http://activemq.apache.org/ underneath

Actually, the goals and designs are quite different. The goal of ESB is to have a standards-based message bus so that applications from different vendors can inter-operate. The goal of SEDA (Adam's link) is to use queues to provide uniform response time in servers and allow their services to degrade gracefully under load.

My idea of using JMS is for overload control. Each queue can be serviced by any number of servers (since JMS uses JNDI). In effect, the application itself becomes a crude load balancer.

-Adrian



      

Re: multi-threaded EntityDataLoadContainer and SequenceUtil

Posted by Jacques Le Roux <ja...@les7arts.com>.
From: "Adrian Crum" <ad...@hlmksw.com>
> Adam Heath wrote:
>> So each entity creation itself was a separate work unit.  Once an
>> entity was created, you could submit the primary key creation as well.
>>  That's simple enough to implement(in theory, anyways).  This design
>> is starting to go towards the Sandstorm(1) approach.
> 
> I just looked at that site briefly. You're right - my thinking was a lot 
> like that. Split up the work with queues - in other words, use the 
> provider/consumer pattern.
> 
> If I was designing a product like OFBiz, I would have JMS at the front 
> end. Each request gets packaged up into a JMS message and submitted to a 
> queue. Different tasks respond to the queued messages. The last task is 
> writing the response. The app server's request thread returns almost 
> immediately. Each queue/task could be optimized.

This reminds me that it's mostly what is used underneath in something like ServiceMix or Mule (ESBs).
ServiceMix is based on the JBI concept http://servicemix.apache.org/what-is-jbi.html
and uses http://activemq.apache.org/ underneath

Jacques


Re: multi-threaded EntityDataLoadContainer and SequenceUtil

Posted by Adrian Crum <ad...@hlmksw.com>.
Adam Heath wrote:
> So each entity creation itself was a separate work unit.  Once an
> entity was created, you could submit the primary key creation as well.
>  That's simple enough to implement(in theory, anyways).  This design
> is starting to go towards the Sandstorm(1) approach.

I just looked at that site briefly. You're right - my thinking was a lot 
like that. Split up the work with queues - in other words, use the 
provider/consumer pattern.

If I was designing a product like OFBiz, I would have JMS at the front 
end. Each request gets packaged up into a JMS message and submitted to a 
queue. Different tasks respond to the queued messages. The last task is 
writing the response. The app server's request thread returns almost 
immediately. Each queue/task could be optimized.
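
The front end itself would only be a few lines of JMS (the JNDI names below
are invented, and something else would service the queue):

==
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;
import javax.naming.InitialContext;

public class FrontEndSketch {
    public void enqueueRequest(String requestBody) throws Exception {
        // the JNDI names here are made up for the example
        InitialContext ctx = new InitialContext();
        ConnectionFactory cf = (ConnectionFactory) ctx.lookup("jms/ConnectionFactory");
        Queue queue = (Queue) ctx.lookup("jms/OFBizRequestQueue");

        Connection conn = cf.createConnection();
        try {
            Session session = conn.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(queue);
            TextMessage msg = session.createTextMessage(requestBody);
            producer.send(msg);     // the request thread returns right after this
        } finally {
            conn.close();
        }
    }
}
==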

Re: multi-threaded EntityDataLoadContainer and SequenceUtil

Posted by Adrian Crum <ad...@hlmksw.com>.
Adam Heath wrote:
> Adrian Crum wrote:
>> I ran my patch against your recent changes and the errors went away. I
>> guess we can consider that issue resolved.
> 
> Yeah, I did do some changes to SequenceUtil a while back.  The biggest
> functional change was to remove some variables from the inner class to
> the outer, and not try to access them all the time.
> 
>> As far as the approach I took to multi-threading the data load - here is
>> an overview:
>>
>> I was able to run certain tasks in parallel - creating entities and
>> creating primary keys, for example. I have the number of threads
>> allocated configured in a properties file. By tweaking that number I was
>> able to increase CPU utilization and reduce the creation time. Of course
>> there was a threshold where CPU utilization was raised and creation time
>> decreased - due to thread thrash.
> 
> So each entity creation itself was a separate work unit.  Once an
> entity was created, you could submit the primary key creation as well.
>  That's simple enough to implement(in theory, anyways).  This design
> is starting to go towards the Sandstorm(1) approach.
> 
> There are ways to find out how many cpus are available.  Look at
> org.ofbiz.base.concurrent.ExecutionPool.getNewOptimalExecutor(); it
> calls into ManagementFactory.

I don't think the number of CPUs is useful information. Even a single 
CPU system might benefit. From my perspective, the best approach is to 
have a human tweak the settings to get the result they want. I might be 
wrong, but I don't think you can do that automatically.

>> Creating foreign keys must be run on a single thread to prevent database
>> deadlocks.
> 
> Maybe.  If the entity and primary keys are all created for both sides
> of the foreign key, then shouldn't it be possible to submit the work
> unit to the pool?

I don't know - I didn't spend a lot of time thinking about it. I just 
separated out the create foreign keys loop and executed it in a single 
thread. It would be fun to go back and analyze the code more and come up 
with a multi-threaded solution.

>> I multi-threaded the data load by having one thread parse the XML files
>> and put the results in a queue. Another thread services the queue and
>> loads the data. I also multi-threaded the EECAs - but that has an issue
>> I need to solve.
> 
> Hmm.  You dug deeper, splitting up the points into separate calls.  I
> hadn't done that yet, and just dumped each xml file to a separate
> thread.  My approach is obviously wrong.
> 
>> My original goal was to reduce the ant clean-all + ant run-install cycle
>> time. I recently purchased a much faster development machine that
>> completes the cycle in about 2 minutes - slightly longer than the
>> multi-threaded code, so I don't have much of an incentive to develop the
>> patch further.
> 
> I've reduced the time it takes to do a run-tests loop.  The changes
> I've done to log4j.xml reduces the *extreme* debug logging produced by
> several classes.  log4j would create a new exception, so that it could
> get the correct class and line number to print to the log.  This is a
> heavy-weight operation.  This mostly showed up as slowness when
> catalina would start up, so this set of changes doesn't directly
> affect the run-install cycle.

I had to disable logging entirely in the patch. The logger would get 
swamped and throw an exception - bringing everything to a stop.

>> The whole experience was an educational one. There is a possibility the
>> techniques I developed could be used to speed up import/export of large
>> datasets. If anyone is interested in that, I am available for hire.
> 
> We have a site, where users could upload original images(6), then fill
> out a bunch of form data, then some pdfs would be generated.  I would
> submit a bunch of image resize operations(had to make 2 reduced-size
> images for each of the originals).  All of those are able to run in
> parallel.  Then, once all the images were done, the 2 pdfs would be
> submitted.  This entire pipeline itself might be run in parallel too,
> as the user could have multiple such records that needed to be updated.
> 
> 1: http://www.eecs.harvard.edu/~mdw/proj/seda/

Cool site Bro.

Re: multi-threaded EntityDataLoadContainer and SequenceUtil

Posted by Adam Heath <do...@brainfood.com>.
Adrian Crum wrote:
> I ran my patch against your recent changes and the errors went away. I
> guess we can consider that issue resolved.

Yeah, I did do some changes to SequenceUtil a while back.  The biggest
functional change was to move some variables from the inner class to
the outer, and not try to access them all the time.

> As far as the approach I took to multi-threading the data load - here is
> an overview:
> 
> I was able to run certain tasks in parallel - creating entities and
> creating primary keys, for example. I have the number of threads
> allocated configured in a properties file. By tweaking that number I was
> able to increase CPU utilization and reduce the creation time. Of course
> there was a threshold where CPU utilization was raised and creation time
> decreased - due to thread thrash.

So each entity creation itself was a separate work unit.  Once an
entity was created, you could submit the primary key creation as well.
That's simple enough to implement (in theory, anyways).  This design
is starting to go towards the Sandstorm(1) approach.

There are ways to find out how many cpus are available.  Look at
org.ofbiz.base.concurrent.ExecutionPool.getNewOptimalExecutor(); it
calls into ManagementFactory.
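
i.e. something along these lines (just the lookup itself):

==
import java.lang.management.ManagementFactory;

public class CpuCount {
    public static void main(String[] args) {
        // via the ManagementFactory route mentioned above
        int viaMxBean = ManagementFactory.getOperatingSystemMXBean().getAvailableProcessors();
        // the shorter equivalent
        int viaRuntime = Runtime.getRuntime().availableProcessors();
        System.out.println(viaMxBean + " / " + viaRuntime);
    }
}
==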

> Creating foreign keys must be run on a single thread to prevent database
> deadlocks.

Maybe.  If the entity and primary keys are all created for both sides
of the foreign key, then shouldn't it be possible to submit the work
unit to the pool?

> I multi-threaded the data load by having one thread parse the XML files
> and put the results in a queue. Another thread services the queue and
> loads the data. I also multi-threaded the EECAs - but that has an issue
> I need to solve.

Hmm.  You dug deeper, splitting up the points into separate calls.  I
hadn't done that yet, and just dumped each xml file to a separate
thread.  My approach is obviously wrong.

> My original goal was to reduce the ant clean-all + ant run-install cycle
> time. I recently purchased a much faster development machine that
> completes the cycle in about 2 minutes - slightly longer than the
> multi-threaded code, so I don't have much of an incentive to develop the
> patch further.

I've reduced the time it takes to do a run-tests loop.  The changes
I've made to log4j.xml reduce the *extreme* debug logging produced by
several classes.  log4j would create a new exception, so that it could
get the correct class and line number to print to the log.  This is a
heavy-weight operation.  This mostly showed up as slowness when
catalina would start up, so this set of changes doesn't directly
affect the run-install cycle.

> The whole experience was an educational one. There is a possibility the
> techniques I developed could be used to speed up import/export of large
> datasets. If anyone is interested in that, I am available for hire.

We have a site where users could upload original images (6), then fill
out a bunch of form data, then some pdfs would be generated.  I would
submit a bunch of image resize operations (had to make 2 reduced-size
images for each of the originals).  All of those are able to run in
parallel.  Then, once all the images were done, the 2 pdfs would be
submitted.  This entire pipeline itself might be run in parallel too,
as the user could have multiple such records that needed to be updated.
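
Sketched out (the sizes and the resize/pdf calls are placeholders), it's a
fan-out followed by a join:

==
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

public class RecordPipeline {
    // only the fan-out/join structure is the point here
    static void process(ExecutorService pool, List<String> originals) throws Exception {
        List<Future<?>> resizes = new ArrayList<Future<?>>();
        for (final String image : originals) {
            for (final int width : new int[] {800, 200}) {   // 2 reduced sizes per original
                resizes.add(pool.submit(new Runnable() {
                    public void run() { /* resize(image, width) would go here */ }
                }));
            }
        }
        for (Future<?> f : resizes) {
            f.get();                                         // wait for every resize
        }
        Future<?> pdf1 = pool.submit(new Runnable() { public void run() { /* buildPdf(1) */ } });
        Future<?> pdf2 = pool.submit(new Runnable() { public void run() { /* buildPdf(2) */ } });
        pdf1.get();
        pdf2.get();
    }
}
==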

1: http://www.eecs.harvard.edu/~mdw/proj/seda/


Re: multi-threaded EntityDataLoadContainer and SequenceUtil

Posted by Adrian Crum <ad...@hlmksw.com>.
Adam Heath wrote:
> As Adrian and I previously discussed, he said he had discovered some
> possible problems with SequenceUtil in multi-threaded situations.  He
> discovered this when he made EntityDataLoadContainer load each xml
> file in a thread.
> 
> I've recently done the same on my local copy, but I don't see any
> problems.  What I did see, however, was that just throwing every xml
> data file into a thread(actually, a 4-count thread pool), had errors
> loading some files, because each file has an implicit dependency on
> some possible other set of files, and those files hadn't been loaded yet.
> 
> So, before doing a thread load, the files would have to have an
> explicit dependency listed, so that correct ordering could be done.
> This is not something that would make ofbiz easier to use.
> 
> Trying to figure out the implicit dependencies automatically by
> comparing each entity line isn't worthwhile, as that would be
> reimplementing a database, and what would be the point.
> 
> So, Adrian, if you have any more pointers as to what your original
> change did, I'd appreciate any insight you might have.  Otherwise, I
> will say that we can't load data in parallel.
> 
> Additionally, I suspsected that SequenceUtil actually *didn't* have
> any problems.  I wrote a test case quite a while back that did
> multi-threaded testing of SequenceUtil, and it never had any problems.
>  It used 100 threads, with each thread trying to allocate 1000
> sequence values.

I ran my patch against your recent changes and the errors went away. I 
guess we can consider that issue resolved.

As far as the approach I took to multi-threading the data load - here is 
an overview:

I was able to run certain tasks in parallel - creating entities and 
creating primary keys, for example. I have the number of threads 
allocated configured in a properties file. By tweaking that number I was 
able to increase CPU utilization and reduce the creation time. Of course 
there was a threshold to how far CPU utilization could be raised and 
creation time decreased - due to thread thrash.

Creating foreign keys must be run on a single thread to prevent database 
deadlocks.

I multi-threaded the data load by having one thread parse the XML files 
and put the results in a queue. Another thread services the queue and 
loads the data. I also multi-threaded the EECAs - but that has an issue 
I need to solve.

My original goal was to reduce the ant clean-all + ant run-install cycle 
time. I recently purchased a much faster development machine that 
completes the cycle in about 2 minutes - slightly longer than the 
multi-threaded code, so I don't have much of an incentive to develop the 
patch further.

The whole experience was an educational one. There is a possibility the 
techniques I developed could be used to speed up import/export of large 
datasets. If anyone is interested in that, I am available for hire.

-Adrian