Posted to dev@commons.apache.org by Phil Steitz <ph...@gmail.com> on 2015/04/15 17:33:55 UTC

[math] threading redux

James Carman and I had a brief conversation following my Apachecon
talk, where I mentioned the challenge we have around deciding what
to do about supporting multiple threads / processes.  He has some
good ideas.  This is really just a poke to get him to post those
ideas :)

The final presented slides are here:
http://s.apache.org/arB

Thanks for the feedback as I was preparing!

Phil


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] threading redux

Posted by Phil Steitz <ph...@gmail.com>.
On 4/16/15 5:33 AM, Hank Grabowski wrote:
> I've seen some ApacheCon North America videos on YouTube (9) but not this
> one.  Will a video of it be posted at some point or were those only for the
> keynote type presentations?

Unfortunately only the keynotes.

Phil
>
> On Wed, Apr 15, 2015 at 11:33 AM, Phil Steitz <ph...@gmail.com> wrote:
>
>> James Carman and I had a brief conversation following my Apachecon
>> talk, where I mentioned the challenge we have around deciding what
>> to do about supporting multiple threads / processes.  He has some
>> good ideas.  This is really just a poke to get him to post those
>> ideas :)
>>
>> The final presented slides are here:
>> http://s.apache.org/arB
>>
>> Thanks for the feedback as I was preparing!
>>
>> Phil
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>> For additional commands, e-mail: dev-help@commons.apache.org
>>
>>


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] threading redux

Posted by Hank Grabowski <ha...@applieddefense.com>.
I've seen some ApacheCon North America videos on YouTube (9) but not this
one.  Will a video of it be posted at some point or were those only for the
keynote type presentations?

On Wed, Apr 15, 2015 at 11:33 AM, Phil Steitz <ph...@gmail.com> wrote:

> James Carman and I had a brief conversation following my Apachecon
> talk, where I mentioned the challenge we have around deciding what
> to do about supporting multiple threads / processes.  He has some
> good ideas.  This is really just a poke to get him to post those
> ideas :)
>
> The final presented slides are here:
> http://s.apache.org/arB
>
> Thanks for the feedback as I was preparing!
>
> Phil
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>

Re: [math] threading redux

Posted by Ole Ersoy <ol...@gmail.com>.
This is a pretty good read as well:
http://stackoverflow.com/questions/21163108/custom-thread-pool-in-java-8-parallel-stream

A concern in earlier discussions was controlling the number of threads that a job consumes.  There's one example of using a custom thread pool for that.  Users could pass the number of threads as a method argument or constructor parameter (or just default to Runtime.getRuntime().availableProcessors()).  At the end there's also an example of using a CompletableFuture, which seems in line with the Future-like API that James is suggesting.

Cheers,
- Ole

BTW - Personally I find working with concurrency in Java 8 simple and refreshing.  You do have to make sure that you use thread-safe classes inside parallel loops, but the rest is straightforward (knock on wood :) ).
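
For reference, a minimal sketch of the custom-thread-pool trick from that Stack Overflow discussion, assuming the caller supplies the thread count (the data and the computation below are placeholders).  It relies on the behavior that a parallel stream submitted from inside a ForkJoinPool runs on that pool rather than on the common pool:

import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

public class CustomPoolSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical: thread count chosen by the caller instead of
        // defaulting to Runtime.getRuntime().availableProcessors().
        int nThreads = 4;
        ForkJoinPool pool = new ForkJoinPool(nThreads);
        try {
            double sum = pool.submit(() ->
                IntStream.rangeClosed(1, 1_000_000)
                         .parallel()                  // runs in 'pool', not the common pool
                         .mapToDouble(i -> 1.0 / i)
                         .sum()
            ).get();
            System.out.println("sum = " + sum);
        } finally {
            pool.shutdown();
        }
    }
}

The thread-count argument is exactly the kind of knob that could be exposed as a method or constructor parameter, as suggested above.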

On 04/17/2015 07:20 PM, Gary Gregory wrote:
> I thought I'd share this read with you guys:
> http://coopsoft.com/ar/CalamityArticle.html
>
> I'm not sure how closely these problems relate to what [math] is trying
> to do, but it's an interesting read.
>
> Gary
>
> On Fri, Apr 17, 2015 at 9:01 AM, Gilles <gi...@harfang.homelinux.org>
> wrote:
>
>> On Fri, 17 Apr 2015 08:35:42 -0700, Phil Steitz wrote:
>>
>>> On 4/17/15 3:14 AM, Gilles wrote:
>>>
>>>> Hello.
>>>>
>>>> On Thu, 16 Apr 2015 17:06:21 -0500, James Carman wrote:
>>>>
>>>>> Consider me poked!
>>>>>
>>>>> So, the Java answer to "how do I run things in multiple threads"
>>>>> is to
>>>>> use an Executor (java.util).  This doesn't necessarily mean that you
>>>>> *have* to use a separate thread (the implementation could execute
>>>>> inline).  However, in order to accommodate the separate thread case,
>>>>> you would need to code to a Future-like API.  Now, I'm not saying to
>>>>> use Executors directly, but I'd provide some abstraction layer above
>>>>> them or in lieu of them, something like:
>>>>>
>>>>> public interface ExecutorThingy {
>>>>>    Future<T> execute(Function<T> fn);
>>>>> }
>>>>>
>>>>> One could imagine implementing different ExecutorThingy
>>>>> implementations which allow you to parallelize things in different
>>>>> ways (simple threads, JMS, Akka, etc, etc.)
>>>>>
>>>> I did not understand what is being suggested: parallelization of a
>>>> single algorithm or concurrent calls to multiple instances of an
>>>> algorithm?
>>>>
>>> Really both.  It's probably best to look at some concrete examples.
>>>
>> Certainly...
>>
>>   The two I mentioned in my apachecon talk are:
>>> 1.  Threads managed by some external process / application gathering
>>> statistics to be aggregated.
>>>
>>> 2.  Allowing multiple threads to concurrently execute GA
>>> transformations within the GeneticAlgorithm "evolve" method.
>>>
>> I could not view the presentation from the link previously mentioned
>> (it did not work with my browser...).
>> Can I download the PDF file from somewhere?
>>
>>   It would be instructive to think about how to handle both of these
>>> use cases using something like what James is suggesting.  What is
>>> nice about his idea is that it could give us a way to let users /
>>> systems decide whether they want to have [math] algorithms spawn
>>> threads to execute concurrently or to allow an external execution
>>> framework to handle task distribution across threads.
>>>
>> Some (all?) cases of "external" parallelism are trivial for the CM
>> developers: the user must chop his data, pass the chunks as arguments
>> to the CM methods, then collect and reassemble the results, all by
>> himself.
>> IIUC the scenario, this cannot be deemed a "feature".
>>
>>   Since 2. above is a good example of "internal" parallelism and it
>>> also has data sharing / transfer challenges, maybe its best to start
>>> with that one.
>>>
>> That's the scenario where usage is simple and performance can match
>> the user's machine capability when running CM algorithms that are
>> inherently parallel.
>>
>> There is an example in CM: see
>>    testTravellerSalesmanSquareTourParallelSolver()
>> in
>>    org.apache.commons.math4.ml.neuralnet.sofm.KohonenTrainingTaskTest
>>
>>   I have just started thinking about this and would
>>> love to get better ideas than my own hacking about how to do it
>>>
>>> a) Using Spark with RDD's to maintain population state data
>>> b) Hadoop with HDFS (or something else?)
>>>
>> I have zero experience with this but I'm interested to know more. :-)
>>
>> Regards,
>> Gilles
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>> For additional commands, e-mail: dev-help@commons.apache.org
>>
>>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] threading redux

Posted by Phil Steitz <ph...@gmail.com>.
On 4/17/15 5:20 PM, Gary Gregory wrote:
> I thought I'd share this read with you guys:
> http://coopsoft.com/ar/CalamityArticle.html
>
> I'm not sure how closely these problems relate to what [math] is trying
> to do, but it's an interesting read.

Thanks!  Kind of supports the idea that somehow allowing the
execution framework to be pluggable would be good.

Phil
>
> Gary
>
> On Fri, Apr 17, 2015 at 9:01 AM, Gilles <gi...@harfang.homelinux.org>
> wrote:
>
>> On Fri, 17 Apr 2015 08:35:42 -0700, Phil Steitz wrote:
>>
>>> On 4/17/15 3:14 AM, Gilles wrote:
>>>
>>>> Hello.
>>>>
>>>> On Thu, 16 Apr 2015 17:06:21 -0500, James Carman wrote:
>>>>
>>>>> Consider me poked!
>>>>>
>>>>> So, the Java answer to "how do I run things in multiple threads"
>>>>> is to
>>>>> use an Executor (java.util).  This doesn't necessarily mean that you
>>>>> *have* to use a separate thread (the implementation could execute
>>>>> inline).  However, in order to accommodate the separate thread case,
>>>>> you would need to code to a Future-like API.  Now, I'm not saying to
>>>>> use Executors directly, but I'd provide some abstraction layer above
>>>>> them or in lieu of them, something like:
>>>>>
>>>>> public interface ExecutorThingy {
>>>>>   Future<T> execute(Function<T> fn);
>>>>> }
>>>>>
>>>>> One could imagine implementing different ExecutorThingy
>>>>> implementations which allow you to parallelize things in different
>>>>> ways (simple threads, JMS, Akka, etc, etc.)
>>>>>
>>>> I did not understand what is being suggested: parallelization of a
>>>> single algorithm or concurrent calls to multiple instances of an
>>>> algorithm?
>>>>
>>> Really both.  It's probably best to look at some concrete examples.
>>>
>> Certainly...
>>
>>  The two I mentioned in my apachecon talk are:
>>> 1.  Threads managed by some external process / application gathering
>>> statistics to be aggregated.
>>>
>>> 2.  Allowing multiple threads to concurrently execute GA
>>> transformations within the GeneticAlgorithm "evolve" method.
>>>
>> I could not view the presentation from the link previously mentioned
>> (it did not work with my browser...).
>> Can I download the PDF file from somewhere?
>>
>>  It would be instructive to think about how to handle both of these
>>> use cases using something like what James is suggesting.  What is
>>> nice about his idea is that it could give us a way to let users /
>>> systems decide whether they want to have [math] algorithms spawn
>>> threads to execute concurrently or to allow an external execution
>>> framework to handle task distribution across threads.
>>>
>> Some (all?) cases of "external" parallelism are trivial for the CM
>> developers: the user must chop his data, pass the chunks as arguments
>> to the CM methods, then collect and reassemble the results, all by
>> himself.
>> IIUC the scenario, this cannot be deemed a "feature".
>>
>>  Since 2. above is a good example of "internal" parallelism and it
>>> also has data sharing / transfer challenges, maybe its best to start
>>> with that one.
>>>
>> That's the scenario where usage is simple and performance can match
>> the user's machine capability when running CM algorithms that are
>> inherently parallel.
>>
>> There is an example in CM: see
>>   testTravellerSalesmanSquareTourParallelSolver()
>> in
>>   org.apache.commons.math4.ml.neuralnet.sofm.KohonenTrainingTaskTest
>>
>>  I have just started thinking about this and would
>>> love to get better ideas than my own hacking about how to do it
>>>
>>> a) Using Spark with RDD's to maintain population state data
>>> b) Hadoop with HDFS (or something else?)
>>>
>> I have zero experience with this but I'm interested to know more. :-)
>>
>> Regards,
>> Gilles
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>> For additional commands, e-mail: dev-help@commons.apache.org
>>
>>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] threading redux

Posted by Gary Gregory <ga...@gmail.com>.
I thought I'd share this read with you guys:
http://coopsoft.com/ar/CalamityArticle.html

I'm not sure how closely these problems relate to what [math] is trying
to do, but it's an interesting read.

Gary

On Fri, Apr 17, 2015 at 9:01 AM, Gilles <gi...@harfang.homelinux.org>
wrote:

> On Fri, 17 Apr 2015 08:35:42 -0700, Phil Steitz wrote:
>
>> On 4/17/15 3:14 AM, Gilles wrote:
>>
>>> Hello.
>>>
>>> On Thu, 16 Apr 2015 17:06:21 -0500, James Carman wrote:
>>>
>>>> Consider me poked!
>>>>
>>>> So, the Java answer to "how do I run things in multiple threads"
>>>> is to
>>>> use an Executor (java.util).  This doesn't necessarily mean that you
>>>> *have* to use a separate thread (the implementation could execute
>>>> inline).  However, in order to accommodate the separate thread case,
>>>> you would need to code to a Future-like API.  Now, I'm not saying to
>>>> use Executors directly, but I'd provide some abstraction layer above
>>>> them or in lieu of them, something like:
>>>>
>>>> public interface ExecutorThingy {
>>>>   Future<T> execute(Function<T> fn);
>>>> }
>>>>
>>>> One could imagine implementing different ExecutorThingy
>>>> implementations which allow you to parallelize things in different
>>>> ways (simple threads, JMS, Akka, etc, etc.)
>>>>
>>>
>>> I did not understand what is being suggested: parallelization of a
>>> single algorithm or concurrent calls to multiple instances of an
>>> algorithm?
>>>
>>
>> Really both.  It's probably best to look at some concrete examples.
>>
>
> Certainly...
>
>  The two I mentioned in my apachecon talk are:
>>
>> 1.  Threads managed by some external process / application gathering
>> statistics to be aggregated.
>>
>> 2.  Allowing multiple threads to concurrently execute GA
>> transformations within the GeneticAlgorithm "evolve" method.
>>
>
> I could not view the presentation from the link previously mentioned
> (it did not work with my browser...).
> Can I download the PDF file from somewhere?
>
>  It would be instructive to think about how to handle both of these
>> use cases using something like what James is suggesting.  What is
>> nice about his idea is that it could give us a way to let users /
>> systems decide whether they want to have [math] algorithms spawn
>> threads to execute concurrently or to allow an external execution
>> framework to handle task distribution across threads.
>>
>
> Some (all?) cases of "external" parallelism are trivial for the CM
> developers: the user must chop his data, pass the chunks as arguments
> to the CM methods, then collect and reassemble the results, all by
> himself.
> IIUC the scenario, this cannot be deemed a "feature".
>
>  Since 2. above is a good example of "internal" parallelism and it
>> also has data sharing / transfer challenges, maybe its best to start
>> with that one.
>>
>
> That's the scenario where usage is simple and performance can match
> the user's machine capability when running CM algorithms that are
> inherently parallel.
>
> There is an example in CM: see
>   testTravellerSalesmanSquareTourParallelSolver()
> in
>   org.apache.commons.math4.ml.neuralnet.sofm.KohonenTrainingTaskTest
>
>  I have just started thinking about this and would
>> love to get better ideas than my own hacking about how to do it
>>
>> a) Using Spark with RDD's to maintain population state data
>> b) Hadoop with HDFS (or something else?)
>>
>
> I have zero experience with this but I'm interested to know more. :-)
>
> Regards,
> Gilles
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>


-- 
E-Mail: garydgregory@gmail.com | ggregory@apache.org
Java Persistence with Hibernate, Second Edition
<http://www.manning.com/bauer3/>
JUnit in Action, Second Edition <http://www.manning.com/tahchiev/>
Spring Batch in Action <http://www.manning.com/templier/>
Blog: http://garygregory.wordpress.com
Home: http://garygregory.com/
Tweet! http://twitter.com/GaryGregory

Re: [math] threading redux

Posted by Phil Steitz <ph...@gmail.com>.
On 4/17/15 9:01 AM, Gilles wrote:
> On Fri, 17 Apr 2015 08:35:42 -0700, Phil Steitz wrote:
>> On 4/17/15 3:14 AM, Gilles wrote:
>>> Hello.
>>>
>>> On Thu, 16 Apr 2015 17:06:21 -0500, James Carman wrote:
>>>> Consider me poked!
>>>>
>>>> So, the Java answer to "how do I run things in multiple threads"
>>>> is to
>>>> use an Executor (java.util).  This doesn't necessarily mean
>>>> that you
>>>> *have* to use a separate thread (the implementation could execute
>>>> inline).  However, in order to accommodate the separate thread
>>>> case,
>>>> you would need to code to a Future-like API.  Now, I'm not
>>>> saying to
>>>> use Executors directly, but I'd provide some abstraction layer
>>>> above
>>>> them or in lieu of them, something like:
>>>>
>>>> public interface ExecutorThingy {
>>>>   Future<T> execute(Function<T> fn);
>>>> }
>>>>
>>>> One could imagine implementing different ExecutorThingy
>>>> implementations which allow you to parallelize things in different
>>>> ways (simple threads, JMS, Akka, etc, etc.)
>>>
>>> I did not understand what is being suggested: parallelization of a
>>> single algorithm or concurrent calls to multiple instances of an
>>> algorithm?
>>
>> Really both.  It's probably best to look at some concrete examples.
>
> Certainly...
>
>> The two I mentioned in my apachecon talk are:
>>
>> 1.  Threads managed by some external process / application gathering
>> statistics to be aggregated.
>>
>> 2.  Allowing multiple threads to concurrently execute GA
>> transformations within the GeneticAlgorithm "evolve" method.
>
> I could not view the presentation from the link previously mentioned
> (it did not work with my browser...).
> Can I download the PDF file from somewhere?

Sorry.  Try this (unshortened) link

http://www.slideshare.net/psteitz/commons-mathapacheconna2015
>
>> It would be instructive to think about how to handle both of these
>> use cases using something like what James is suggesting.  What is
>> nice about his idea is that it could give us a way to let users /
>> systems decide whether they want to have [math] algorithms spawn
>> threads to execute concurrently or to allow an external execution
>> framework to handle task distribution across threads.
>
> Some (all?) cases of "external" parallelism are trivial for the CM
> developers: the user must chop his data, pass the chunks as arguments
> to the CM methods, then collect and reassemble the results, all by
> himself.
> IIUC the scenario, this cannot be deemed a "feature".

The idea is to make it easier for users to do this "chopping" and
"reassembling" and / or to let these operations be managed by
external frameworks.

The AggregatedStatistics class is a simple example of making it
easier for users to do this directly.
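
As a minimal sketch of that pattern, assuming the user does the chopping and
reassembling himself: each worker fills its own SummaryStatistics (the package
name shown is the released commons-math3 one) and the caller combines counts
and sums by hand.  The chunking and pool below are illustrative only:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.commons.math3.stat.descriptive.SummaryStatistics;

public class ExternalAggregationSketch {
    public static void main(String[] args) throws Exception {
        double[][] chunks = { {1, 2, 3}, {4, 5}, {6, 7, 8, 9} };  // user-chopped data
        ExecutorService pool = Executors.newFixedThreadPool(chunks.length);

        List<Future<SummaryStatistics>> futures = new ArrayList<>();
        for (double[] chunk : chunks) {
            futures.add(pool.submit(() -> {
                SummaryStatistics stats = new SummaryStatistics();
                for (double v : chunk) {
                    stats.addValue(v);
                }
                return stats;          // per-chunk statistics, computed in its own thread
            }));
        }

        long n = 0;
        double sum = 0;
        for (Future<SummaryStatistics> f : futures) {
            SummaryStatistics s = f.get();
            n += s.getN();             // "reassemble" by hand
            sum += s.getSum();
        }
        pool.shutdown();
        System.out.println("aggregated mean = " + sum / n);
    }
}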
>
>> Since 2. above is a good example of "internal" parallelism and it
>> also has data sharing / transfer challenges, maybe its best to start
>> with that one.
>
> That's the scenario where usage is simple and performance can match
> the user's machine capability when running CM algorithms that are
> inherently parallel.
>
> There is an example in CM: see
>   testTravellerSalesmanSquareTourParallelSolver()
> in
>   org.apache.commons.math4.ml.neuralnet.sofm.KohonenTrainingTaskTest

The challenge is how to make this kind of thing possible "simply"
without just pegging the local machine's cores in an unmanaged
way.   I think James has the kernel of an idea that would allow us
to have it both ways - "greedy / local" or "managed / remotable."  
This is all hand-waving at this point; but the idea that we could
find a way to make our parallelizable algorithms executable via
locally spawned threads or external task managers is appealing.
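
As a rough sketch of what "both ways" could look like, here is a hedged
expansion of James's ExecutorThingy snippet quoted above (the implementation
class names are hypothetical, and Supplier<T> stands in for the original
Function<T> so the example compiles):

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.Future;
import java.util.function.Supplier;

// The caller codes against Future<T>; the execution strategy is pluggable.
interface ExecutorThingy {
    <T> Future<T> execute(Supplier<T> fn);
}

// "Greedy / local": run inline on the calling thread.
class InlineExecutorThingy implements ExecutorThingy {
    public <T> Future<T> execute(Supplier<T> fn) {
        return CompletableFuture.completedFuture(fn.get());
    }
}

// "Managed": delegate to whatever Executor the user (or an external
// framework) supplies - a thread pool, a fork-join pool, etc.
class ManagedExecutorThingy implements ExecutorThingy {
    private final Executor executor;

    ManagedExecutorThingy(Executor executor) {
        this.executor = executor;
    }

    public <T> Future<T> execute(Supplier<T> fn) {
        return CompletableFuture.supplyAsync(fn, executor);
    }
}

An algorithm written against ExecutorThingy would not need to know which of
the two it got.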
>
>> I have just started thinking about this and would
>> love to get better ideas than my own hacking about how to do it
>>
>> a) Using Spark with RDD's to maintain population state data
>> b) Hadoop with HDFS (or something else?)
>
> I have zero experience with this but I'm interested to know more. :-)

I am also just learning Spark.  It will likely take me a while to
get something meaningful; but I will start playing with this.  Other
ideas / patches welcome!

Phil
>
> Regards,
> Gilles
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] threading redux

Posted by Gilles <gi...@harfang.homelinux.org>.
On Fri, 17 Apr 2015 08:35:42 -0700, Phil Steitz wrote:
> On 4/17/15 3:14 AM, Gilles wrote:
>> Hello.
>>
>> On Thu, 16 Apr 2015 17:06:21 -0500, James Carman wrote:
>>> Consider me poked!
>>>
>>> So, the Java answer to "how do I run things in multiple threads"
>>> is to
>>> use an Executor (java.util).  This doesn't necessarily mean that 
>>> you
>>> *have* to use a separate thread (the implementation could execute
>>> inline).  However, in order to accommodate the separate thread 
>>> case,
>>> you would need to code to a Future-like API.  Now, I'm not saying 
>>> to
>>> use Executors directly, but I'd provide some abstraction layer 
>>> above
>>> them or in lieu of them, something like:
>>>
>>> public interface ExecutorThingy {
>>>   Future<T> execute(Function<T> fn);
>>> }
>>>
>>> One could imagine implementing different ExecutorThingy
>>> implementations which allow you to parallelize things in different
>>> ways (simple threads, JMS, Akka, etc, etc.)
>>
>> I did not understand what is being suggested: parallelization of a
>> single algorithm or concurrent calls to multiple instances of an
>> algorithm?
>
> Really both.  It's probably best to look at some concrete examples.

Certainly...

> The two I mentioned in my apachecon talk are:
>
> 1.  Threads managed by some external process / application gathering
> statistics to be aggregated.
>
> 2.  Allowing multiple threads to concurrently execute GA
> transformations within the GeneticAlgorithm "evolve" method.

I could not view the presentation from the link previously mentioned
(it did not work with my browser...).
Can I download the PDF file from somewhere?

> It would be instructive to think about how to handle both of these
> use cases using something like what James is suggesting.  What is
> nice about his idea is that it could give us a way to let users /
> systems decide whether they want to have [math] algorithms spawn
> threads to execute concurrently or to allow an external execution
> framework to handle task distribution across threads.

Some (all?) cases of "external" parallelism are trivial for the CM
developers: the user must chop his data, pass the chunks as arguments
to the CM methods, then collect and reassemble the results, all by
himself.
IIUC the scenario, this cannot be deemed a "feature".

> Since 2. above is a good example of "internal" parallelism and it
> also has data sharing / transfer challenges, maybe its best to start
> with that one.

That's the scenario where usage is simple and performance can match
the user's machine capability when running CM algorithms that are
inherently parallel.

There is an example in CM: see
   testTravellerSalesmanSquareTourParallelSolver()
in
   org.apache.commons.math4.ml.neuralnet.sofm.KohonenTrainingTaskTest

> I have just started thinking about this and would
> love to get better ideas than my own hacking about how to do it
>
> a) Using Spark with RDD's to maintain population state data
> b) Hadoop with HDFS (or something else?)

I have zero experience with this but I'm interested to know more. :-)

Regards,
Gilles


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] threading redux

Posted by Ole Ersoy <ol...@gmail.com>.
> That is one way to achieve parallelism.  The Executor is one way to
> manage concurrently executing threads in a single process.  There
> are other ways to do this.  My challenge is to find a way to make it
> possible for users to plug in alternatives.

Some of the methods on CompletableFuture allow the provision of an Executor. For example:
https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/CompletableFuture.html#runAsync-java.lang.Runnable-java.util.concurrent.Executor-

Does that fit the "Plug in alternatives" requirement?
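
For instance, a minimal sketch along those lines, assuming the Executor comes
from the caller (the computation is a placeholder):

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PluggableExecutorSketch {
    public static void main(String[] args) throws Exception {
        ExecutorService userPool = Executors.newFixedThreadPool(2);  // supplied by the user
        CompletableFuture<Double> result =
            CompletableFuture.supplyAsync(PluggableExecutorSketch::expensiveComputation, userPool);
        System.out.println("result = " + result.get());
        userPool.shutdown();
    }

    static double expensiveComputation() {
        return 42.0;  // placeholder for a [math] computation
    }
}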

Cheers,
- Ole



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] threading redux

Posted by Phil Steitz <ph...@gmail.com>.
On 4/22/15 6:25 PM, Gilles wrote:
> On Wed, 22 Apr 2015 11:33:30 -0500, Ole Ersoy wrote:
>> On Mon, Apr 20, 2015 at 6:05 PM Phil Steitz
>> <ph...@gmail.com> wrote:
>>>>
>>>> There are lots of ways to allow distributed processes to share
>>>> common data.  Spark has a very nice construct called a Resilient
>>>> Distributed Dataset (RDD) designed for exactly this purpose.
>> Are there any examples of a class in commons math where threads have
>> to share the data structure?
>
> It's the case for the only example that was mentioned in this thread
> with sufficient level of details so as to permit concrete statements:
> it's the SOFM implementation (in package
> "o.a.c.m.ml.neuralnet.sofm").
> The shared structure is the "Network" instance.
>
> I've only read the "Spark" examples.  I assume that an alternate
> version of "KohonenTrainingTask" would follow the "logistic
> regression" example (IIUC).
> But isn't that going to create a dependency on Spark???

The challenge - possibly hopeless, but I am not giving up yet - is
to find a way to make it easy for someone who wants to use Spark to
do this (or Hadoop, or...).  We don't want to create dependencies on
these frameworks - just make it easy for users to distribute
computation tasks to them.  It may be that there is nothing to add
to what we already have in sofm.  Let's see what it takes to
actually get it working.

Phil
>
> Regards,
> Gilles
>
>> Cheers,
>> - Ole
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] threading redux

Posted by Gilles <gi...@harfang.homelinux.org>.
On Thu, 23 Apr 2015 09:15:47 -0500, Ole Ersoy wrote:
>> It's the case for the only example that was mentioned in this thread
>> with sufficient level of details so as to permit concrete 
>> statements:
>> it's the SOFM implementation (in package 
>> "o.a.c.m.ml.neuralnet.sofm").
>> The shared structure is the "Network" instance.
> Can the KohonenTrainingTask be decomposed further so that each task gets
> the data it needs (not the whole Network instance), performs the work,
> and returns a result that can then be applied to the Network?

The training is:

  For each element in the training set {
     Find the "best" neuron, i.e. scan the whole network
     Update the "best" neuron's attributes.
  }
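
In Java terms, a rough sketch of that loop (the Network/Neuron types and
method names below are purely illustrative stand-ins, not the actual
o.a.c.m.ml.neuralnet API):

// Illustrative stand-ins only, not the CM classes.
interface Neuron { void updateAttributes(double[] sample); }
interface Network { Neuron findBestMatch(double[] sample); }

class SofmTrainingSketch {
    // Each training task runs this loop; all tasks share the same Network.
    void train(Iterable<double[]> trainingSet, Network network) {
        for (double[] sample : trainingSet) {
            Neuron best = network.findBestMatch(sample);  // scan the whole network
            synchronized (best) {                         // two tasks may pick the same neuron
                best.updateAttributes(sample);
            }
        }
    }
}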

Best regards,
Gilles


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] threading redux

Posted by Ole Ersoy <ol...@gmail.com>.
> It's the case for the only example that was mentioned in this thread
> with sufficient level of details so as to permit concrete statements:
> it's the SOFM implementation (in package "o.a.c.m.ml.neuralnet.sofm").
> The shared structure is the "Network" instance.
Can the KohonenTrainingTask be decomposed further so that each task gets the data it needs (not the whole Network instance), performs the work, and returns a result that can then be applied to the Network?

Cheers,
- Ole

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] threading redux

Posted by Gilles <gi...@harfang.homelinux.org>.
On Wed, 22 Apr 2015 11:33:30 -0500, Ole Ersoy wrote:
> On Mon, Apr 20, 2015 at 6:05 PM Phil Steitz <ph...@gmail.com> 
> wrote:
>>>
>>> There are lots of ways to allow distributed processes to share
>>> common data.  Spark has a very nice construct called a Resilient
>>> Distributed Dataset (RDD) designed for exactly this purpose.
> Are there any examples of a class in commons math where threads have
> to share the data structure?

It's the case for the only example that was mentioned in this thread
with a sufficient level of detail to permit concrete statements:
the SOFM implementation (in package "o.a.c.m.ml.neuralnet.sofm").
The shared structure is the "Network" instance.

I've only read the "Spark" examples.  I assume that an alternate
version of "KohonenTrainingTask" would follow the "logistic
regression" example (IIUC).
But isn't that going to create a dependency on Spark???

Regards,
Gilles

> Cheers,
> - Ole


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] threading redux

Posted by Ole Ersoy <ol...@gmail.com>.
On Mon, Apr 20, 2015 at 6:05 PM Phil Steitz <ph...@gmail.com> wrote:
>>
>> There are lots of ways to allow distributed processes to share
>> common data.  Spark has a very nice construct called a Resilient
>> Distributed Dataset (RDD) designed for exactly this purpose.
Are there any examples of a class in commons math where threads have to share the data structure?

Cheers,
- Ole


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] threading redux

Posted by James Carman <ja...@carmanconsulting.com>.
On Mon, Apr 20, 2015 at 6:05 PM Phil Steitz <ph...@gmail.com> wrote:
>
>
> There are lots of ways to allow distributed processes to share
> common data.  Spark has a very nice construct called a Resilient
> Distributed Dataset (RDD) designed for exactly this purpose.
>

To take the abstraction layer a step further (and maybe help improve
performance), you may just introduce a ComputationListener interface:

public interface ComputationListener<T> {
  void onSuccess(String id, T data);
  void onFailure(String id, Exception cause);
}

The "id" would basically be some sort of correlation id that you could use
to tie the response with the request.  You can use whatever you want here
obviously, but you get the idea.
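
A hedged sketch of a consumer of that interface (the aggregation shown is
hypothetical; it only illustrates using the correlation id, via James's
ComputationListener above, to tie responses back to requests):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical: collects per-chunk results keyed by correlation id so the
// caller can reassemble them later.
class AggregatingListener implements ComputationListener<double[]> {
    private final Map<String, double[]> results = new ConcurrentHashMap<>();

    public void onSuccess(String id, double[] data) {
        results.put(id, data);                     // tie the response to its request
    }

    public void onFailure(String id, Exception cause) {
        System.err.println("Task " + id + " failed: " + cause);
    }

    Map<String, double[]> results() {
        return results;
    }
}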

Re: [math] threading redux

Posted by Phil Steitz <ph...@gmail.com>.
On 4/19/15 6:08 AM, Gilles wrote:
> Hello.
>
> On Sat, 18 Apr 2015 22:25:20 -0400, James Carman wrote:
>> I think I got sidetracked when typing that email.  I was trying
>> to say that
>> we need an abstraction layer above raw threads in order to allow for
>> different types of parallelism.   The Future abstraction is there
>> in order
>> to support remote execution where side effects aren't good enough.
>
> I don't know what
>  "remote execution where side effects aren't good enough"
> means.
>
> I'll describe my example of "prototype" (see quoted message below[1])
> and what *I* mean when I suggest that (some of) the CM code should
> allow
> to take advantage of multi-threading.
>
> I committed the first set of classes in "o.a.c.m.ml.neuralnet".[2]
> Here is the idea of "parallelism" that drove the design of those
> classes: The training of an artificial neural network (ANN) is
> performed
> by almost[3] independent updates of each of ANN's cells.  You
> _cannot_[4]
> however chop the network into independent parts to be sent for remote
> processing: each update must be visible ASAP by all the training
> tasks.[5]

There are lots of ways to allow distributed processes to share
common data.  Spark has a very nice construct called a Resilient
Distributed Dataset (RDD) designed for exactly this purpose.
>
> "Future" instances do not appear in the "main" code, but the idea
> was,
> indeed, to be able to use that JDK abstraction: see the unit test[6]
>   testTravellerSalesmanSquareTourParallelSolver()
> defined in class
>   org.apache.commons.math4.ml.neuralnet.sofm.KohonenTrainingTaskTest
> in the "test" part of the repository.

This is a good concrete example. The question is whether there is a way
to set up, say, KohonenTrainingTask so that it does not just directly
implement Runnable, and to enable it to be executed by something other
than an in-process, thread-spawning Executor.  You're right that
however we set it up, we would have to allow each task to access
the shared net.
>
>> As for a concrete example, you can try Phil's idea of the genetic
>> algorithm
>> stuff I suppose.
>
> I hope that with the above I made myself clear that I was not asking
> for a pointer to code that could be parallelized[7], but rather that
> people make it explicit what _they_ mean by parallelism[8].  What
> I mean
> is multithread safe code that can take advantage of the multiple core
> machines through the readily available classes in the JDK: namely the
> "Executor" framework which you also mentioned.

That is one way to achieve parallelism.  The Executor is one way to
manage concurrently executing threads in a single process.  There
are other ways to do this.  My challenge is to find a way to make it
possible for users to plug in alternatives.

> Of course, I do not preclude other approaches (I don't know them, as
> mentioned previously) that may (or may not) be more appropriate
> for the
> example I gave or to other algorithms; but I truly believe that this
> discussion should be more precise, unless we deepen the
> misunderstanding
> of what we think we are talking about.

Agreed. The above example is also a good one to look at.
>
>
> Regards,
> Gilles
>
> [1] As a side note: Shall we agree that top-posting is bad? ;-)

Yes!

> [2] With the purpose to implement a version of a specific
> algorithm (SOFM) so
>     that the data structures might not be generally useful for any
> kind of
>     artificial neural network.
> [3] The update should of course be thread-safe: two parallel tasks
> might try
>     to update the same cell at the same time.

Right, this is partly a function of what data structure and
protocols you use to protect the shared data.

> [4] At least, it's instinctively obvious that for a "relatively small"
>     SOFM network, you'd _lose_ performance through I/O.

Yes, just like it does not make sense to do spreadsheet math on
Hadoop clusters.  The (perhaps impossible) idea is to set things up
so that thread location and management are pluggable.

Phil

> [5] In later phases of the training, "regions" will have formed in
> the ANN, so
>     that at that point, it might be possible to continue the
> updates of those
>     regions on different computation nodes (with the necessary
> synchronization
>     of the region's boundaries).
> [6] It's more of an example usage that could probably go to the
> "user guide".
> [7] The GA lends itself perfectly to the same kind of "readiness to
>     parallelism" code which I implemented for the SOFM.
> [8] As applied concretely to a specific algorithm in CM.
>
>> On Saturday, April 18, 2015, Gilles
>> <gi...@harfang.homelinux.org> wrote:
>>
>>> On Fri, 17 Apr 2015 16:53:56 -0500, James Carman wrote:
>>>
>>>> Do you have any pointers to code for this ForkJoin mechanism?  I'm
>>>> curious to see it.
>>>>
>>>> The key thing you will need in order to support parallelization
>>>> in a
>>>> generic way
>>>>
>>>
>>> What do you mean by "generic way"?
>>>
>>> I'm afraid that we may be trying to compare apples and oranges;
>>> each of us probably has in mind a "prototype" algorithm and an idea
>>> of how to implement it to make it run in parallel.
>>>
>>> I think that it would focus the discussion if we could
>>> 1. tell what the "prototype" is,
>>> 2. show a sort of pseudo-code of the difference between a
>>> sequential
>>>    and a parallel run of this "prototype" (i.e. what is the
>>> data, how
>>>    the (sub)tasks operate on them).
>>>
>>> Regards,
>>> Gilles
>>>
>>>  is to not tie it directly to threads, but use some
>>>> abstraction layer above threads, since that may not be the
>>>> "worker"
>>>> method you're using at the time.
>>>>
>>>> On Fri, Apr 17, 2015 at 2:57 PM, Thomas Neidhart
>>>> <th...@gmail.com> wrote:
>>>>
>>>>> On 04/17/2015 05:35 PM, Phil Steitz wrote:
>>>>>
>>>>>> On 4/17/15 3:14 AM, Gilles wrote:
>>>>>>
>>>>>>> Hello.
>>>>>>>
>>>>>>> On Thu, 16 Apr 2015 17:06:21 -0500, James Carman wrote:
>>>>>>>
>>>>>>>> Consider me poked!
>>>>>>>>
>>>>>>>> So, the Java answer to "how do I run things in multiple
>>>>>>>> threads"
>>>>>>>> is to
>>>>>>>> use an Executor (java.util).  This doesn't necessarily mean
>>>>>>>> that you
>>>>>>>> *have* to use a separate thread (the implementation could
>>>>>>>> execute
>>>>>>>> inline).  However, in order to accommodate the separate
>>>>>>>> thread case,
>>>>>>>> you would need to code to a Future-like API.  Now, I'm not
>>>>>>>> saying to
>>>>>>>> use Executors directly, but I'd provide some abstraction
>>>>>>>> layer above
>>>>>>>> them or in lieu of them, something like:
>>>>>>>>
>>>>>>>> public interface ExecutorThingy {
>>>>>>>>   Future<T> execute(Function<T> fn);
>>>>>>>> }
>>>>>>>>
>>>>>>>> One could imagine implementing different ExecutorThingy
>>>>>>>> implementations which allow you to parallelize things in
>>>>>>>> different
>>>>>>>> ways (simple threads, JMS, Akka, etc, etc.)
>>>>>>>>
>>>>>>>
>>>>>>> I did not understand what is being suggested:
>>>>>>> parallelization of a
>>>>>>> single algorithm or concurrent calls to multiple instances
>>>>>>> of an
>>>>>>> algorithm?
>>>>>>>
>>>>>>
>>>>>> Really both.  It's probably best to look at some concrete
>>>>>> examples.
>>>>>> The two I mentioned in my apachecon talk are:
>>>>>>
>>>>>> 1.  Threads managed by some external process / application
>>>>>> gathering
>>>>>> statistics to be aggregated.
>>>>>>
>>>>>> 2.  Allowing multiple threads to concurrently execute GA
>>>>>> transformations within the GeneticAlgorithm "evolve" method.
>>>>>>
>>>>>> It would be instructive to think about how to handle both of
>>>>>> these
>>>>>> use cases using something like what James is suggesting. 
>>>>>> What is
>>>>>> nice about his idea is that it could give us a way to let
>>>>>> users /
>>>>>> systems decide whether they want to have [math] algorithms spawn
>>>>>> threads to execute concurrently or to allow an external
>>>>>> execution
>>>>>> framework to handle task distribution across threads.
>>>>>>
>>>>>
>>>>> I think a more viable option is to take advantage of the ForkJoin
>>>>> mechanism that we can use now in math 4.
>>>>>
>>>>> For example, the GeneticAlgorithm could be quite easily
>>>>> changed to use a
>>>>> ForkJoinTask to perform each evolution, I will try to come up
>>>>> with an
>>>>> example soon as I plan to work on the genetics package anyway.
>>>>>
>>>>> The idea outlined above sounds nice but it is very unclear how an
>>>>> algorithm or function would perform its parallelization in
>>>>> such a way,
>>>>> and whether it would still be efficient.
>>>>>
>>>>> Thomas
>>>>>
>>>>>  Since 2. above is a good example of "internal" parallelism
>>>>> and it
>>>>>> also has data sharing / transfer challenges, maybe its best
>>>>>> to start
>>>>>> with that one.  I have just started thinking about this and
>>>>>> would
>>>>>> love to get better ideas than my own hacking about how to do it
>>>>>>
>>>>>> a) Using Spark with RDD's to maintain population state data
>>>>>> b) Hadoop with HDFS (or something else?)
>>>>>>
>>>>>> Phil
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Gilles
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] threading redux

Posted by Gilles <gi...@harfang.homelinux.org>.
Hello.

On Sat, 18 Apr 2015 22:25:20 -0400, James Carman wrote:
> I think I got sidetracked when typing that email.  I was trying to 
> say that
> we need an abstraction layer above raw threads in order to allow for
> different types of parallelism.   The Future abstraction is there in 
> order
> to support remote execution where side effects aren't good enough.

I don't know what
  "remote execution where side effects aren't good enough"
means.

I'll describe my example of a "prototype" (see quoted message below[1])
and what *I* mean when I suggest that (some of) the CM code should be
able to take advantage of multi-threading.

I committed the first set of classes in "o.a.c.m.ml.neuralnet".[2]
Here is the idea of "parallelism" that drove the design of those
classes: the training of an artificial neural network (ANN) is performed
by almost[3] independent updates of each of the ANN's cells.  You
_cannot_[4], however, chop the network into independent parts to be sent
for remote processing: each update must be visible ASAP to all the
training tasks.[5]

"Future" instances do not appear in the "main" code, but the idea was,
indeed, to be able to use that JDK abstraction: see the unit test[6]
   testTravellerSalesmanSquareTourParallelSolver()
defined in class
   org.apache.commons.math4.ml.neuralnet.sofm.KohonenTrainingTaskTest
in the "test" part of the repository.

> As for a concrete example, you can try Phil's idea of the genetic 
> algorithm
> stuff I suppose.

I hope that with the above I made it clear that I was not asking
for a pointer to code that could be parallelized[7], but rather that
people make explicit what _they_ mean by parallelism[8].  What I mean
is multithread-safe code that can take advantage of multi-core
machines through the readily available classes in the JDK: namely the
"Executor" framework, which you also mentioned.
Of course, I do not preclude other approaches (I don't know them, as
mentioned previously) that may (or may not) be more appropriate for the
example I gave or for other algorithms; but I truly believe that this
discussion should be more precise, lest we deepen the misunderstanding
of what we think we are talking about.


Regards,
Gilles

[1] As a side note: Shall we agree that top-posting is bad? ;-)
[2] With the purpose of implementing a version of a specific algorithm
     (SOFM), so that the data structures might not be generally useful for
     any kind of artificial neural network.
[3] The update should of course be thread-safe: two parallel tasks might
     try to update the same cell at the same time.
[4] At least, it's instinctively obvious that for a "relatively small"
     SOFM network, you'd _lose_ performance through I/O.
[5] In later phases of the training, "regions" will have formed in the
     ANN, so that at that point, it might be possible to continue the
     updates of those regions on different computation nodes (with the
     necessary synchronization of the regions' boundaries).
[6] It's more of an example usage that could probably go to the "user
     guide".
[7] The GA lends itself perfectly to the same kind of "readiness to
     parallelism" code which I implemented for the SOFM.
[8] As applied concretely to a specific algorithm in CM.

> On Saturday, April 18, 2015, Gilles <gi...@harfang.homelinux.org> 
> wrote:
>
>> On Fri, 17 Apr 2015 16:53:56 -0500, James Carman wrote:
>>
>>> Do you have any pointers to code for this ForkJoin mechanism?  I'm
>>> curious to see it.
>>>
>>> The key thing you will need in order to support parallelization in 
>>> a
>>> generic way
>>>
>>
>> What do you mean by "generic way"?
>>
>> I'm afraid that we may be trying to compare apples and oranges;
>> each of us probably has in mind a "prototype" algorithm and an idea
>> of how to implement it to make it run in parallel.
>>
>> I think that it would focus the discussion if we could
>> 1. tell what the "prototype" is,
>> 2. show a sort of pseudo-code of the difference between a sequential
>>    and a parallel run of this "prototype" (i.e. what is the data, 
>> how
>>    the (sub)tasks operate on them).
>>
>> Regards,
>> Gilles
>>
>>  is to not tie it directly to threads, but use some
>>> abstraction layer above threads, since that may not be the "worker"
>>> method you're using at the time.
>>>
>>> On Fri, Apr 17, 2015 at 2:57 PM, Thomas Neidhart
>>> <th...@gmail.com> wrote:
>>>
>>>> On 04/17/2015 05:35 PM, Phil Steitz wrote:
>>>>
>>>>> On 4/17/15 3:14 AM, Gilles wrote:
>>>>>
>>>>>> Hello.
>>>>>>
>>>>>> On Thu, 16 Apr 2015 17:06:21 -0500, James Carman wrote:
>>>>>>
>>>>>>> Consider me poked!
>>>>>>>
>>>>>>> So, the Java answer to "how do I run things in multiple 
>>>>>>> threads"
>>>>>>> is to
>>>>>>> use an Executor (java.util).  This doesn't necessarily mean 
>>>>>>> that you
>>>>>>> *have* to use a separate thread (the implementation could 
>>>>>>> execute
>>>>>>> inline).  However, in order to accommodate the separate thread 
>>>>>>> case,
>>>>>>> you would need to code to a Future-like API.  Now, I'm not 
>>>>>>> saying to
>>>>>>> use Executors directly, but I'd provide some abstraction layer 
>>>>>>> above
>>>>>>> them or in lieu of them, something like:
>>>>>>>
>>>>>>> public interface ExecutorThingy {
>>>>>>>   Future<T> execute(Function<T> fn);
>>>>>>> }
>>>>>>>
>>>>>>> One could imagine implementing different ExecutorThingy
>>>>>>> implementations which allow you to parallelize things in 
>>>>>>> different
>>>>>>> ways (simple threads, JMS, Akka, etc, etc.)
>>>>>>>
>>>>>>
>>>>>> I did not understand what is being suggested: parallelization of 
>>>>>> a
>>>>>> single algorithm or concurrent calls to multiple instances of an
>>>>>> algorithm?
>>>>>>
>>>>>
>>>>> Really both.  It's probably best to look at some concrete 
>>>>> examples.
>>>>> The two I mentioned in my apachecon talk are:
>>>>>
>>>>> 1.  Threads managed by some external process / application 
>>>>> gathering
>>>>> statistics to be aggregated.
>>>>>
>>>>> 2.  Allowing multiple threads to concurrently execute GA
>>>>> transformations within the GeneticAlgorithm "evolve" method.
>>>>>
>>>>> It would be instructive to think about how to handle both of 
>>>>> these
>>>>> use cases using something like what James is suggesting.  What is
>>>>> nice about his idea is that it could give us a way to let users /
>>>>> systems decide whether they want to have [math] algorithms spawn
>>>>> threads to execute concurrently or to allow an external execution
>>>>> framework to handle task distribution across threads.
>>>>>
>>>>
>>>> I think a more viable option is to take advantage of the ForkJoin
>>>> mechanism that we can use now in math 4.
>>>>
>>>> For example, the GeneticAlgorithm could be quite easily changed to 
>>>> use a
>>>> ForkJoinTask to perform each evolution, I will try to come up with 
>>>> an
>>>> example soon as I plan to work on the genetics package anyway.
>>>>
>>>> The idea outlined above sounds nice but it is very unclear how an
>>>> algorithm or function would perform its parallelization in such a 
>>>> way,
>>>> and whether it would still be efficient.
>>>>
>>>> Thomas
>>>>
>>>>  Since 2. above is a good example of "internal" parallelism and it
>>>>> also has data sharing / transfer challenges, maybe its best to 
>>>>> start
>>>>> with that one.  I have just started thinking about this and would
>>>>> love to get better ideas than my own hacking about how to do it
>>>>>
>>>>> a) Using Spark with RDD's to maintain population state data
>>>>> b) Hadoop with HDFS (or something else?)
>>>>>
>>>>> Phil
>>>>>
>>>>>>
>>>>>>
>>>>>> Gilles


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] threading redux

Posted by James Carman <ja...@carmanconsulting.com>.
I think I got sidetracked when typing that email.  I was trying to say that
we need an abstraction layer above raw threads in order to allow for
different types of parallelism.   The Future abstraction is there in order
to support remote execution where side effects aren't good enough.

As for a concrete example, you can try Phil's idea of the genetic algorithm
stuff I suppose.


On Saturday, April 18, 2015, Gilles <gi...@harfang.homelinux.org> wrote:

> On Fri, 17 Apr 2015 16:53:56 -0500, James Carman wrote:
>
>> Do you have any pointers to code for this ForkJoin mechanism?  I'm
>> curious to see it.
>>
>> The key thing you will need in order to support parallelization in a
>> generic way
>>
>
> What do you mean by "generic way"?
>
> I'm afraid that we may be trying to compare apples and oranges;
> each of us probably has in mind a "prototype" algorithm and an idea
> of how to implement it to make it run in parallel.
>
> I think that it would focus the discussion if we could
> 1. tell what the "prototype" is,
> 2. show a sort of pseudo-code of the difference between a sequential
>    and a parallel run of this "prototype" (i.e. what is the data, how
>    the (sub)tasks operate on them).
>
> Regards,
> Gilles
>
>  is to not tie it directly to threads, but use some
>> abstraction layer above threads, since that may not be the "worker"
>> method you're using at the time.
>>
>> On Fri, Apr 17, 2015 at 2:57 PM, Thomas Neidhart
>> <th...@gmail.com> wrote:
>>
>>> On 04/17/2015 05:35 PM, Phil Steitz wrote:
>>>
>>>> On 4/17/15 3:14 AM, Gilles wrote:
>>>>
>>>>> Hello.
>>>>>
>>>>> On Thu, 16 Apr 2015 17:06:21 -0500, James Carman wrote:
>>>>>
>>>>>> Consider me poked!
>>>>>>
>>>>>> So, the Java answer to "how do I run things in multiple threads"
>>>>>> is to
>>>>>> use an Executor (java.util).  This doesn't necessarily mean that you
>>>>>> *have* to use a separate thread (the implementation could execute
>>>>>> inline).  However, in order to accommodate the separate thread case,
>>>>>> you would need to code to a Future-like API.  Now, I'm not saying to
>>>>>> use Executors directly, but I'd provide some abstraction layer above
>>>>>> them or in lieu of them, something like:
>>>>>>
>>>>>> public interface ExecutorThingy {
>>>>>>   Future<T> execute(Function<T> fn);
>>>>>> }
>>>>>>
>>>>>> One could imagine implementing different ExecutorThingy
>>>>>> implementations which allow you to parallelize things in different
>>>>>> ways (simple threads, JMS, Akka, etc, etc.)
>>>>>>
>>>>>
>>>>> I did not understand what is being suggested: parallelization of a
>>>>> single algorithm or concurrent calls to multiple instances of an
>>>>> algorithm?
>>>>>
>>>>
>>>> Really both.  It's probably best to look at some concrete examples.
>>>> The two I mentioned in my apachecon talk are:
>>>>
>>>> 1.  Threads managed by some external process / application gathering
>>>> statistics to be aggregated.
>>>>
>>>> 2.  Allowing multiple threads to concurrently execute GA
>>>> transformations within the GeneticAlgorithm "evolve" method.
>>>>
>>>> It would be instructive to think about how to handle both of these
>>>> use cases using something like what James is suggesting.  What is
>>>> nice about his idea is that it could give us a way to let users /
>>>> systems decide whether they want to have [math] algorithms spawn
>>>> threads to execute concurrently or to allow an external execution
>>>> framework to handle task distribution across threads.
>>>>
>>>
>>> I since a more viable option is to take advantage of the ForkJoin
>>> mechanism that we can use now in math 4.
>>>
>>> For example, the GeneticAlgorithm could be quite easily changed to use a
>>> ForkJoinTask to perform each evolution, I will try to come up with an
>>> example soon as I plan to work on the genetics package anyway.
>>>
>>> The idea outlined above sounds nice but it is very unclear how an
>>> algorithm or function would perform its parallelization in such a way,
>>> and whether it would still be efficient.
>>>
>>> Thomas
>>>
>>>  Since 2. above is a good example of "internal" parallelism and it
>>>> also has data sharing / transfer challenges, maybe its best to start
>>>> with that one.  I have just started thinking about this and would
>>>> love to get better ideas than my own hacking about how to do it
>>>>
>>>> a) Using Spark with RDD's to maintain population state data
>>>> b) Hadoop with HDFS (or something else?)
>>>>
>>>> Phil
>>>>
>>>>>
>>>>>
>>>>> Gilles
>>>>>
>>>>>  [...]
>>>>>>>
>>>>>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>

Re: [math] threading redux

Posted by Gilles <gi...@harfang.homelinux.org>.
On Fri, 17 Apr 2015 16:53:56 -0500, James Carman wrote:
> Do you have any pointers to code for this ForkJoin mechanism?  I'm
> curious to see it.
>
> The key thing you will need in order to support parallelization in a
> generic way

What do you mean by "generic way"?

I'm afraid that we may be trying to compare apples and oranges;
each of us probably has in mind a "prototype" algorithm and an idea
of how to implement it to make it run in parallel.

I think that it would focus the discussion if we could
1. tell what the "prototype" is,
2. show a sort of pseudo-code of the difference between a sequential
    and a parallel run of this "prototype" (i.e., what the data is and
    how the (sub)tasks operate on it).

Regards,
Gilles

> is to not tie it directly to threads, but use some
> abstraction layer above threads, since that may not be the "worker"
> method you're using at the time.
>
> On Fri, Apr 17, 2015 at 2:57 PM, Thomas Neidhart
> <th...@gmail.com> wrote:
>> On 04/17/2015 05:35 PM, Phil Steitz wrote:
>>> On 4/17/15 3:14 AM, Gilles wrote:
>>>> Hello.
>>>>
>>>> On Thu, 16 Apr 2015 17:06:21 -0500, James Carman wrote:
>>>>> Consider me poked!
>>>>>
>>>>> So, the Java answer to "how do I run things in multiple threads"
>>>>> is to
>>>>> use an Executor (java.util).  This doesn't necessarily mean that 
>>>>> you
>>>>> *have* to use a separate thread (the implementation could execute
>>>>> inline).  However, in order to accommodate the separate thread 
>>>>> case,
>>>>> you would need to code to a Future-like API.  Now, I'm not saying 
>>>>> to
>>>>> use Executors directly, but I'd provide some abstraction layer 
>>>>> above
>>>>> them or in lieu of them, something like:
>>>>>
>>>>> public interface ExecutorThingy {
>>>>>   Future<T> execute(Function<T> fn);
>>>>> }
>>>>>
>>>>> One could imagine implementing different ExecutorThingy
>>>>> implementations which allow you to parallelize things in 
>>>>> different
>>>>> ways (simple threads, JMS, Akka, etc, etc.)
>>>>
>>>> I did not understand what is being suggested: parallelization of a
>>>> single algorithm or concurrent calls to multiple instances of an
>>>> algorithm?
>>>
>>> Really both.  It's probably best to look at some concrete examples.
>>> The two I mentioned in my apachecon talk are:
>>>
>>> 1.  Threads managed by some external process / application 
>>> gathering
>>> statistics to be aggregated.
>>>
>>> 2.  Allowing multiple threads to concurrently execute GA
>>> transformations within the GeneticAlgorithm "evolve" method.
>>>
>>> It would be instructive to think about how to handle both of these
>>> use cases using something like what James is suggesting.  What is
>>> nice about his idea is that it could give us a way to let users /
>>> systems decide whether they want to have [math] algorithms spawn
>>> threads to execute concurrently or to allow an external execution
>>> framework to handle task distribution across threads.
>>
>> I think a more viable option is to take advantage of the ForkJoin
>> mechanism that we can use now in math 4.
>>
>> For example, the GeneticAlgorithm could be quite easily changed to 
>> use a
>> ForkJoinTask to perform each evolution, I will try to come up with 
>> an
>> example soon as I plan to work on the genetics package anyway.
>>
>> The idea outlined above sounds nice but it is very unclear how an
>> algorithm or function would perform its parallelization in such a 
>> way,
>> and whether it would still be efficient.
>>
>> Thomas
>>
>>> Since 2. above is a good example of "internal" parallelism and it
>>> also has data sharing / transfer challenges, maybe it's best to start
>>> with that one.  I have just started thinking about this and would
>>> love to get better ideas than my own hacking about how to do it:
>>>
>>> a) Using Spark with RDDs to maintain population state data
>>> b) Hadoop with HDFS (or something else?)
>>>
>>> Phil
>>>>
>>>>
>>>> Gilles
>>>>
>>>>>> [...]


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] threading redux

Posted by James Carman <ja...@carmanconsulting.com>.
Do you have any pointers to code for this ForkJoin mechanism?  I'm
curious to see it.

The key thing you will need in order to support parallelization in a
generic way is to not tie it directly to threads, but to use some
abstraction layer above threads, since threads may not be the "worker"
mechanism you're using at the time.

On Fri, Apr 17, 2015 at 2:57 PM, Thomas Neidhart
<th...@gmail.com> wrote:
> On 04/17/2015 05:35 PM, Phil Steitz wrote:
>> On 4/17/15 3:14 AM, Gilles wrote:
>>> Hello.
>>>
>>> On Thu, 16 Apr 2015 17:06:21 -0500, James Carman wrote:
>>>> Consider me poked!
>>>>
>>>> So, the Java answer to "how do I run things in multiple threads"
>>>> is to
>>>> use an Executor (java.util).  This doesn't necessarily mean that you
>>>> *have* to use a separate thread (the implementation could execute
>>>> inline).  However, in order to accommodate the separate thread case,
>>>> you would need to code to a Future-like API.  Now, I'm not saying to
>>>> use Executors directly, but I'd provide some abstraction layer above
>>>> them or in lieu of them, something like:
>>>>
>>>> public interface ExecutorThingy {
>>>>   Future<T> execute(Function<T> fn);
>>>> }
>>>>
>>>> One could imagine implementing different ExecutorThingy
>>>> implementations which allow you to parallelize things in different
>>>> ways (simple threads, JMS, Akka, etc, etc.)
>>>
>>> I did not understand what is being suggested: parallelization of a
>>> single algorithm or concurrent calls to multiple instances of an
>>> algorithm?
>>
>> Really both.  It's probably best to look at some concrete examples.
>> The two I mentioned in my apachecon talk are:
>>
>> 1.  Threads managed by some external process / application gathering
>> statistics to be aggregated.
>>
>> 2.  Allowing multiple threads to concurrently execute GA
>> transformations within the GeneticAlgorithm "evolve" method.
>>
>> It would be instructive to think about how to handle both of these
>> use cases using something like what James is suggesting.  What is
>> nice about his idea is that it could give us a way to let users /
>> systems decide whether they want to have [math] algorithms spawn
>> threads to execute concurrently or to allow an external execution
>> framework to handle task distribution across threads.
>
> I think a more viable option is to take advantage of the ForkJoin
> mechanism that we can use now in math 4.
>
> For example, the GeneticAlgorithm could be quite easily changed to use a
> ForkJoinTask to perform each evolution; I will try to come up with an
> example soon, as I plan to work on the genetics package anyway.
>
> The idea outlined above sounds nice but it is very unclear how an
> algorithm or function would perform its parallelization in such a way,
> and whether it would still be efficient.
>
> Thomas
>
>> Since 2. above is a good example of "internal" parallelism and it
>> also has data sharing / transfer challenges, maybe it's best to start
>> with that one.  I have just started thinking about this and would
>> love to get better ideas than my own hacking about how to do it:
>>
>> a) Using Spark with RDDs to maintain population state data
>> b) Hadoop with HDFS (or something else?)
>>
>> Phil
>>>
>>>
>>> Gilles
>>>
>>>>> [...]
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>
>>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>> For additional commands, e-mail: dev-help@commons.apache.org
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] threading redux

Posted by Thomas Neidhart <th...@gmail.com>.
On 04/17/2015 05:35 PM, Phil Steitz wrote:
> On 4/17/15 3:14 AM, Gilles wrote:
>> Hello.
>>
>> On Thu, 16 Apr 2015 17:06:21 -0500, James Carman wrote:
>>> Consider me poked!
>>>
>>> So, the Java answer to "how do I run things in multiple threads"
>>> is to
>>> use an Executor (java.util).  This doesn't necessarily mean that you
>>> *have* to use a separate thread (the implementation could execute
>>> inline).  However, in order to accommodate the separate thread case,
>>> you would need to code to a Future-like API.  Now, I'm not saying to
>>> use Executors directly, but I'd provide some abstraction layer above
>>> them or in lieu of them, something like:
>>>
>>> public interface ExecutorThingy {
>>>   Future<T> execute(Function<T> fn);
>>> }
>>>
>>> One could imagine implementing different ExecutorThingy
>>> implementations which allow you to parallelize things in different
>>> ways (simple threads, JMS, Akka, etc, etc.)
>>
>> I did not understand what is being suggested: parallelization of a
>> single algorithm or concurrent calls to multiple instances of an
>> algorithm?
> 
> Really both.  It's probably best to look at some concrete examples. 
> The two I mentioned in my apachecon talk are:
> 
> 1.  Threads managed by some external process / application gathering
> statistics to be aggregated.
> 
> 2.  Allowing multiple threads to concurrently execute GA
> transformations within the GeneticAlgorithm "evolve" method.
> 
> It would be instructive to think about how to handle both of these
> use cases using something like what James is suggesting.  What is
> nice about his idea is that it could give us a way to let users /
> systems decide whether they want to have [math] algorithms spawn
> threads to execute concurrently or to allow an external execution
> framework to handle task distribution across threads.

I think a more viable option is to take advantage of the ForkJoin
mechanism that we can use now in math 4.

For example, the GeneticAlgorithm could be quite easily changed to use a
ForkJoinTask to perform each evolution; I will try to come up with an
example soon, as I plan to work on the genetics package anyway.

The idea outlined above sounds nice but it is very unclear how an
algorithm or function would perform its parallelization in such a way,
and whether it would still be efficient.
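
For reference, here is a self-contained sketch of the ForkJoinTask approach
mentioned above. It is not the actual GeneticAlgorithm code: Individual, the
fitness function and the threshold are invented for illustration only.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Placeholder for a chromosome with a cached fitness value.
class Individual {
    final double[] genes;
    double fitness;
    Individual(double[] genes) { this.genes = genes; }
}

// Evaluates the fitness of a slice of the population, splitting the slice
// recursively until it is small enough to process sequentially.
class FitnessTask extends RecursiveAction {
    private static final int THRESHOLD = 64; // tune per problem
    private final List<Individual> population;
    private final int from;
    private final int to;

    FitnessTask(List<Individual> population, int from, int to) {
        this.population = population;
        this.from = from;
        this.to = to;
    }

    @Override
    protected void compute() {
        if (to - from <= THRESHOLD) {
            for (int i = from; i < to; i++) {
                population.get(i).fitness = evaluate(population.get(i));
            }
        } else {
            int mid = (from + to) >>> 1;
            invokeAll(new FitnessTask(population, from, mid),
                      new FitnessTask(population, mid, to));
        }
    }

    private double evaluate(Individual ind) {
        double sum = 0;
        for (double g : ind.genes) {
            sum += g * g; // placeholder fitness function
        }
        return sum;
    }
}

public class ForkJoinFitnessSketch {
    public static void main(String[] args) {
        List<Individual> population = new ArrayList<Individual>();
        Random random = new Random();
        for (int i = 0; i < 10000; i++) {
            double[] genes = new double[10];
            for (int j = 0; j < genes.length; j++) {
                genes[j] = random.nextDouble();
            }
            population.add(new Individual(genes));
        }
        ForkJoinPool pool = new ForkJoinPool(); // defaults to #cores workers
        pool.invoke(new FitnessTask(population, 0, population.size()));
        System.out.println("first fitness = " + population.get(0).fitness);
    }
}

The same shape would apply to the crossover/mutation part of an evolve
iteration, and the pool (and hence the number of threads) stays under the
caller's control.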

Thomas

> Since 2. above is a good example of "internal" parallelism and it
> also has data sharing / transfer challenges, maybe it's best to start
> with that one.  I have just started thinking about this and would
> love to get better ideas than my own hacking about how to do it:
>
> a) Using Spark with RDDs to maintain population state data
> b) Hadoop with HDFS (or something else?)
> 
> Phil
>>
>>
>> Gilles
>>
>>>> [...]
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>> For additional commands, e-mail: dev-help@commons.apache.org
>>
>>
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] threading redux

Posted by Phil Steitz <ph...@gmail.com>.
On 4/17/15 3:14 AM, Gilles wrote:
> Hello.
>
> On Thu, 16 Apr 2015 17:06:21 -0500, James Carman wrote:
>> Consider me poked!
>>
>> So, the Java answer to "how do I run things in multiple threads"
>> is to
>> use an Executor (java.util).  This doesn't necessarily mean that you
>> *have* to use a separate thread (the implementation could execute
>> inline).  However, in order to accommodate the separate thread case,
>> you would need to code to a Future-like API.  Now, I'm not saying to
>> use Executors directly, but I'd provide some abstraction layer above
>> them or in lieu of them, something like:
>>
>> public interface ExecutorThingy {
>>   Future<T> execute(Function<T> fn);
>> }
>>
>> One could imagine implementing different ExecutorThingy
>> implementations which allow you to parallelize things in different
>> ways (simple threads, JMS, Akka, etc, etc.)
>
> I did not understand what is being suggested: parallelization of a
> single algorithm or concurrent calls to multiple instances of an
> algorithm?

Really both.  It's probably best to look at some concrete examples. 
The two I mentioned in my apachecon talk are:

1.  Threads managed by some external process / application gathering
statistics to be aggregated.

2.  Allowing multiple threads to concurrently execute GA
transformations within the GeneticAlgorithm "evolve" method.

It would be instructive to think about how to handle both of these
use cases using something like what James is suggesting.  What is
nice about his idea is that it could give us a way to let users /
systems decide whether they want to have [math] algorithms spawn
threads to execute concurrently or to allow an external execution
framework to handle task distribution across threads.
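
To make use case 1 concrete, here is a minimal sketch; it assumes only the
JDK executors plus [math]'s SummaryStatistics, and uses random numbers as a
stand-in for whatever data the external application is really gathering:

import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.apache.commons.math3.stat.descriptive.SummaryStatistics;

public class ParallelStatsSketch {
    public static void main(String[] args) throws Exception {
        int nWorkers = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(nWorkers);

        // Each worker gathers statistics over its own slice of the data.
        List<Future<SummaryStatistics>> partials =
                new ArrayList<Future<SummaryStatistics>>();
        for (int w = 0; w < nWorkers; w++) {
            partials.add(pool.submit(new Callable<SummaryStatistics>() {
                public SummaryStatistics call() {
                    SummaryStatistics stats = new SummaryStatistics();
                    Random random = new Random();
                    for (int i = 0; i < 1000000; i++) {
                        stats.addValue(random.nextDouble()); // stand-in for real data
                    }
                    return stats;
                }
            }));
        }

        // Aggregate the per-worker results; here just the overall mean.
        double sum = 0;
        long n = 0;
        for (Future<SummaryStatistics> f : partials) {
            SummaryStatistics s = f.get();
            sum += s.getSum();
            n += s.getN();
        }
        pool.shutdown();
        System.out.println("overall mean = " + (sum / n));
    }
}

Each worker owns its own SummaryStatistics instance, so nothing in [math]
has to be thread-safe here; only the final aggregation step sees the
per-worker results.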

Since 2. above is a good example of "internal" parallelism and it
also has data sharing / transfer challenges, maybe it's best to start
with that one.  I have just started thinking about this and would
love to get better ideas than my own hacking about how to do it:

a) Using Spark with RDDs to maintain population state data
b) Hadoop with HDFS (or something else?)

Phil
>
>
> Gilles
>
>> > [...]
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] threading redux

Posted by Gilles <gi...@harfang.homelinux.org>.
Hello.

On Thu, 16 Apr 2015 17:06:21 -0500, James Carman wrote:
> Consider me poked!
>
> So, the Java answer to "how do I run things in multiple threads" is 
> to
> use an Executor (java.util).  This doesn't necessarily mean that you
> *have* to use a separate thread (the implementation could execute
> inline).  However, in order to accommodate the separate thread case,
> you would need to code to a Future-like API.  Now, I'm not saying to
> use Executors directly, but I'd provide some abstraction layer above
> them or in lieu of them, something like:
>
> public interface ExecutorThingy {
>   Future<T> execute(Function<T> fn);
> }
>
> One could imagine implementing different ExecutorThingy
> implementations which allow you to parallelize things in different
> ways (simple threads, JMS, Akka, etc, etc.)

I did not understand what is being suggested: parallelization of a
single algorithm or concurrent calls to multiple instances of an
algorithm?


Gilles

> > [...]


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] threading redux

Posted by James Carman <ja...@carmanconsulting.com>.
Consider me poked!

So, the Java answer to "how do I run things in multiple threads" is to
use an Executor (java.util).  This doesn't necessarily mean that you
*have* to use a separate thread (the implementation could execute
inline).  However, in order to accommodate the separate thread case,
you would need to code to a Future-like API.  Now, I'm not saying to
use Executors directly, but I'd provide some abstraction layer above
them or in lieu of them, something like:

public interface ExecutorThingy {
  Future<T> execute(Function<T> fn);
}

One could imagine implementing different ExecutorThingy
implementations which allow you to parallelize things in different
ways (simple threads, JMS, Akka, etc, etc.)
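
Purely as an illustration of the shape such a layer could take (assuming
Java 8, with Supplier<T> standing in for the single-argument Function<T> in
the sketch, and with made-up implementation names), a compilable version
might look like:

import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.Supplier;

// The type parameter moves onto the method so the interface compiles.
interface ExecutorThingy {
    <T> Future<T> execute(Supplier<T> fn);
}

// Runs each task inline on the caller's thread: no parallelism at all.
class InlineExecutorThingy implements ExecutorThingy {
    public <T> Future<T> execute(Supplier<T> fn) {
        return CompletableFuture.completedFuture(fn.get());
    }
}

// Delegates to whatever ExecutorService the caller wants to manage
// (fixed pool, work-stealing pool, container-provided, ...).
class PooledExecutorThingy implements ExecutorThingy {
    private final ExecutorService service;

    PooledExecutorThingy(ExecutorService service) {
        this.service = service;
    }

    public <T> Future<T> execute(final Supplier<T> fn) {
        return service.submit(new Callable<T>() {
            public T call() {
                return fn.get();
            }
        });
    }
}

An algorithm coded against ExecutorThingy never mentions threads; swapping
new InlineExecutorThingy() for
new PooledExecutorThingy(Executors.newFixedThreadPool(4)) (or a JMS- or
Akka-backed implementation) changes how the work is distributed without
touching the algorithm itself.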

On Wed, Apr 15, 2015 at 10:33 AM, Phil Steitz <ph...@gmail.com> wrote:
> James Carman and I had a brief conversation following my Apachecon
> talk, where I mentioned the challenge we have around deciding what
> to do about supporting multiple threads / processes.  He has some
> good ideas.  This is really just a poke to get him to post those
> ideas :)
>
> The final presented slides are here:
> http://s.apache.org/arB
>
> Thanks for the feedback as I was preparing!
>
> Phil
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org