Posted to dev@commons.apache.org by Avijit Basak <av...@gmail.com> on 2021/11/22 12:49:00 UTC

[MATH][GENETICS][PR-199] Decision on use and customization of RNG functionality for randomization

Hi All

        I would like to request everyone to share their opinion regarding
the use and customization of RNG functionality in the Genetic Algorithm
library.
        In the current design, RNG functionality is used internally through
the RandomProviderManager class. This class encapsulates a predefined
instance of RandomSource and uses it for all random number generation.
This keeps the API clean and easy to use.
        However, during the review an alternative was proposed: letting users
customize the RandomSource. Under this proposal, users would be able to
provide a RandomSource instance of their choice to the crossover and
mutation operators and to other places such as
ChromosomeRepresentationUtils. The drawback of this customization could be
increased complexity of the API.
        We need to decide whether we really need this kind of customization
by users and, if yes, how to provide it. Two options have been proposed.
*Option 1:*
---CUT---
public interface MutationPolicy<P> {
    Chromosome<P> mutate(Chromosome<P> original, double mutationRate);

    interface Factory<P> {
        /**
         * Creates an instance with a dedicated source of randomness.
         *
         * @param rng RNG algorithm.
         * @param seed Seed.
         * @return an instance that must <em>not</em> be shared among
         * threads.
         */
        MutationPolicy<P> create(RandomSource rng, Object seed);

        default MutationPolicy<P> create(RandomSource rng) {
            return create(rng, null);
        }
        default MutationPolicy<P> create() {
            return create(RandomSource.SPLIT_MIX_64);
        }
    }
}
---CUT---
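
To make the intent of Option 1 concrete, here is a minimal sketch of the
factory idea: each call to "create" returns an operator that owns its
generator, so every thread builds its own instance. The names GeneMutation
and GaussianMutationFactory are hypothetical, and java.util.Random with a
long seed stands in for a Commons RNG RandomSource and seed so that the
sketch runs on the JDK alone.

```java
import java.util.Random;

/** Hypothetical, simplified operator: mutates a single real-valued gene. */
interface GeneMutation {
    double mutate(double gene, double mutationRate);
}

/** Hypothetical factory: every created operator owns a dedicated generator. */
final class GaussianMutationFactory {
    /**
     * @param seed Seed for the dedicated generator (stand-in for RandomSource + seed).
     * @return an instance that must not be shared among threads.
     */
    GeneMutation create(long seed) {
        // The generator is captured by exactly one operator instance.
        final Random rng = new Random(seed);
        return (gene, rate) ->
            rng.nextDouble() < rate ? gene + rng.nextGaussian() : gene;
    }

    /** Default variant: no burden on the caller. */
    GeneMutation create() {
        return create(System.nanoTime());
    }
}
```

A worker thread would call "create" once and keep the result to itself.
Two instances built from the same seed produce the same sequence of
mutations, which keeps runs reproducible.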

*Option 2:*
Use an optional constructor argument for all crossover and mutation
operators. Users would either provide a RandomSource instance of their
choice or rely on the default one configured when instantiating the
operators.
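
A minimal sketch of what Option 2 could look like, under stated
assumptions: BitFlipMutation is a hypothetical operator, and a
Supplier<Random> stands in for the RandomSource argument (with
ThreadLocalRandom playing the role of the default, per-thread source) so
that the example needs only the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Supplier;

/** Hypothetical bit-flip mutation operator illustrating Option 2. */
final class BitFlipMutation {
    /** Supplies the generator to use; stands in for a RandomSource argument. */
    private final Supplier<Random> rng;

    /** Default: a per-thread generator, no configuration needed. */
    BitFlipMutation() {
        this(ThreadLocalRandom::current);
    }

    /** Optional customization: the caller chooses the source of randomness. */
    BitFlipMutation(Supplier<Random> rng) {
        this.rng = rng;
    }

    /** Returns a mutated copy; the input list is never modified. */
    List<Boolean> mutate(List<Boolean> original, double mutationRate) {
        Random r = rng.get();
        List<Boolean> mutated = new ArrayList<>(original.size());
        for (boolean gene : original) {
            // Flip the gene with probability mutationRate.
            mutated.add(r.nextDouble() < mutationRate ? !gene : gene);
        }
        return mutated;
    }
}
```

The operator itself holds only a final reference to the supplier, so it
stays immutable; the mutable generator state lives wherever the supplier
gets it from.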

Thanks & Regards
-- Avijit Basak

Re: [MATH][GENETICS][PR-199] Decision on use and customization of RNG functionality for randomization

Posted by Avijit Basak <av...@gmail.com>.
Hi All

        Please see my comments below.

>> >As I've already indicated, "ThreadLocalRandomSource" is, IMHO, a
>> >sort of workaround for a multi-thread application that does not want
>> >to bother managing per-thread RNG instance(s).
>> -- I am not clear on this. ThreadLocalRandomSource maintains
>> an EnumMap<RandomSource, ThreadLocal<UniformRandomProvider>>. What is
>> meant by it "does not want to bother managing per-thread RNG
>> instance(s)"? Could you please elaborate more on this? If this is an
>> issue in RNG, why don't we think of fixing the same or providing a
>> different internal implementation?

>There is no issue in "Commons RNG"; it provides a tool.
>I think that it is not the right tool for a multi-threaded GA library.
-- If we try to avoid the use of ThreadLocalRandomSource, we will have to
introduce thread-local objects for each GA operator, and the operators are
currently immutable. Is that acceptable? In my opinion that is not the
right design.
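
For readers unfamiliar with the structure mentioned above (an EnumMap from
source to ThreadLocal generator), the following is a rough sketch of the
idea only; it does not reproduce the actual Commons RNG class. The enum
Kind and the class PerThreadRng are hypothetical, and java.util.Random
stands in for UniformRandomProvider.

```java
import java.security.SecureRandom;
import java.util.EnumMap;
import java.util.Map;
import java.util.Random;

/** Hypothetical stand-in for the RandomSource enum. */
enum Kind {
    JDK {
        @Override
        Random create() {
            return new Random();
        }
    },
    SECURE {
        @Override
        Random create() {
            return new SecureRandom();
        }
    };

    /** Creates a new generator of this kind. */
    abstract Random create();
}

/** Hypothetical per-thread registry: one lazily created generator per kind and thread. */
final class PerThreadRng {
    private static final Map<Kind, ThreadLocal<Random>> SOURCES = new EnumMap<>(Kind.class);

    static {
        for (Kind k : Kind.values()) {
            SOURCES.put(k, ThreadLocal.withInitial(k::create));
        }
    }

    private PerThreadRng() {}

    /** Returns the calling thread's generator for the given kind. */
    static Random current(Kind kind) {
        return SOURCES.get(kind).get();
    }
}
```

Callers never hold the generator in a field; they ask for the current
thread's instance each time, which is how an operator can remain free of
mutable fields while still drawing random numbers.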

>
>> >The library should not make that decision for the application since we
>> >can care for both usages: Every piece of the GA that needs a RNG can
>> >provide factory methods that either take a "RandomSource" argument
>> >or create a default one.
>> -- Library can always use a default option or provide an option for
>> customization at a global level but it need not be at the operator
>> level (IMHO).

>How can a GA operator work without a RNG?
>It can't; it is one of the main settings of such an operator, and the
>reason it should be customizable.
-- Indeed GA operators cannot work without RNG. We can definitely customize
the same and keep the user provided instance of RandomSource as a final
reference variable. Then ThreadLocalRandomSource can use the same.

>> I don't see much use of it.

>That's OK; that's why I proposed, for this kind of use, a way to
>generate a default instance, without any burden for the caller.
--Default instance should always be there.

> >
> > >> >2. Less/no flexibility (no user's choice of random source).
> > >> -- Agreed.
> > -- Do we really need this much flexibility here?
>
> >My main concern is that IMO the RNG is a prominent part of a GA
> >and it is not a good design to use "ThreadLocalRandomSource".
> -- RNG is definitely a prominent part. However, if we have a sharing issue
> with ThreadLocalRandomSource we need to think of its alternate
> implementation.

>There is a misunderstanding; there is no sharing issue, there is
>a design issue.

> >How many is "too many instances"?
> >The memory used by an operator is tiny compared to a chromosome,
> >even less to a population of chromosome, or two populations of them
> >(parents and offsprings).
> --My concern is we are trying to provide a fix for a performance problem
> in another library and that is going to consume additional memory.

>Nothing (at all) that we should be worried about (or discuss further):
>Most RNG implementations are quite lean (a few hundred bytes to
>a few KB).  You multiply this by the number of threads (a few tens
>at most), and you are well below 1 MB.  What is this amount when
>compared to the average Java application nowadays?
-- This is not about the RNG implementation. If we don't use
ThreadLocalRandomSource then we need to introduce thread-local instances
for our GA operators. A deep-copy option would also be needed for them.
Let me know if you have different thoughts.

> >     So I think we have a design tradeoff here: performance vs memory
> > consumption. I am more worried about memory as that might restrict use
> > of this library beyond a certain number of dimensions in some areas.
>
> >I'm referring to separate copies for each thread.
> >How many threads/virtual CPUs are common nowadays?
> >> However,
> >> creating deep copy would only be possible when we strictly restrict
> >> extension of operators which I want to avoid.
>
> >How to avoid deep copies in a multi-thread library?
> >Through synchronization?
> -- The operator interfaces are designed like a functional interface.
> Accordingly, the current implementations of all operators are read-only.
> The implementations do not maintain any mutable properties during
> computations either. So they are perfectly suitable for multi-threaded
> operation.

Great!

> If you
> see any deviation to it please notify me.

Sure.
Sorry I did not have the time to look into the code yet.
--I have created a fresh repository and a new PR. Kindly go through it. It
will be easier to discuss it once we are on the same page.

> >
> > >> So even if we provide
> > >> the customization at the operator level we cannot avoid sharing.
> >
> > >We can, and we should.
> > >What we probably can't avoid sharing is the instance that represents
> > >the population of chromosomes.
> > *--* In a multi-threaded optimization the chromosome instances are
> > shared in case the same chromosome is chosen for crossover by the
> > selection process. I missed this point earlier.
> > ...
>
> Chromosomes can be shared (if they are read-only).
> --They are read-only.

>And immutable?
--Yes they are.

> >
> > >> >  Mine is against using "ThreadLocalRandomSource"...
> > >> -- What is the way out other than that? Please suggest.
> >
> > >I think I did.
> > *--* The factory based approach would be useful only when we can have
> > separate copies of operators for each set of operations.
>
> If we don't have separate copies in each thread, then the operator
> will not be multithreaded...
> -- If operators do not contain any mutable property then they are
> perfectly usable in a multi-threaded environment.

>The problem is that they do: by necessity, the RNG instance is mutable.
>You want to hide this fact through using "ThreadLocalRandomSource",
>and I think that it should not be hidden.
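
The point about mutability can be illustrated with a small sketch.
RandomPointSelector is a hypothetical operator and java.util.Random stands
in for a Commons RNG generator: every field is final, yet each call
changes the generator's internal state.

```java
import java.util.Random;

/** Hypothetical operator: every field is final, yet the object is stateful. */
final class RandomPointSelector {
    /** Final reference; the generator it points to mutates on every call. */
    private final Random rng;

    RandomPointSelector(long seed) {
        this.rng = new Random(seed);
    }

    /** Each call advances the generator's internal state. */
    int nextPoint(int chromosomeLength) {
        return rng.nextInt(chromosomeLength);
    }
}
```

Two selectors created with the same seed return the same sequence of
points as long as each is used by a single caller. If one selector were
shared by several threads, the values seen by each thread would depend on
how their calls interleave.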

> > *--* I think we should not block the extension.
>
> >This would be going backwards to many things that have been done
> >to improve the robustness and reduce the bug counts of the Commons
> >Math codes.
> -- GA is different from other math functions. We may not impose the same
> principle on everything.

>The principle stems from putting (actually needed) robustness above
>(hypothetically needed) extensibility.
>IIRC every usage of "protected" in Commons Math, on the expectation
>that it might be useful for some (indefinite) use, was reverted to
>"private".
-- IMHO as long as protected methods do not mutate private fields there is
no issue due to extension.

>We should develop first with as few "public" API as possible; then if
>the need arises, and your design is indeed extensible by construction,
>it will just be a matter of changing "private" to "protected" in a later
>release.
-- This is good for us i.e. API developers. But how can users manage that?

>> > >Initially we discussed having a light-weight library, for easier
>> > >usage than alternative existing framework(s).
>> > *--* We can always think of making the framework lightweight but it
>> > should not cost extensibility.
>>
>> >There is no cost: We'll gladly merge every worthy extension into
>> >the Commons component.
>> -- I think we have a disconnect here. If the framework is not extensible
>> how would anyone be able to use it in any new domain? Do you mean that
>> the framework should first be changed for any new domain and users
>> should only use it out of the box?

>This is an open-source project.
>Anyone can take the code, make whatever extension, use it for whatever
>purpose (e.g. proving by a working example that it is needed), and submit
>a patch so that everyone benefits in the next release.
-- This practice is forbidden in commercial projects and can only be
followed in open source projects. In commercial projects developers would
never make any changes in the library code. The only thing they would do is
develop permitted extensions of the library. I think we should always think
of different types of usage (commercial and non-commercial) of the library.

> >
> > >> E.g. any developer should be able to extend the
> > >> IntegralChromosome class and define a child class which explicitly
> > >> specifies the range of integers to be used.
> >
> > >It does not look like this would need an extension, only configuration
> > >of the range.
> > *-- *I agree. But the question is should we block the extension.
>
> >Please find a valid use case. ;-)
> -- Recently I did an implementation of scheduling with commons-math 3.6. I
> have implemented the chromosome representing a schedule by extending
> AbstractListChromosome. The mutation was also customized according to the
> requirement. However, I was able to use the existing OnePointCrossover
> operator. Do you think this kind of implementation would be possible if
> the framework does not support extensibility?

>I'm lacking information, namely to understand why you could use
>the "crossover" but not the "mutation".
-- It depends on the problem domain and encoding of the phenotype to
genotype. Crossover depends mostly on genotype but the mutation process
sometimes varies depending on the domain even for the same genotype. There
can't be any hard and fast rule for these.
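
To illustrate why a crossover can be reused across domains, here is a
minimal sketch of a one-point crossover that only looks at the genotype as
a list and knows nothing about what the genes mean. The class name
OnePointCrossoverSketch and the explicit "point" argument are assumptions
made for the example; it is not the Commons Math OnePointCrossover class.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, genotype-agnostic one-point crossover. */
final class OnePointCrossoverSketch {
    private OnePointCrossoverSketch() {}

    /**
     * Swaps the tails of two equal-length parents at the given point.
     *
     * @return the two children, in order.
     */
    static <T> List<List<T>> cross(List<T> first, List<T> second, int point) {
        if (first.size() != second.size()) {
            throw new IllegalArgumentException("Parents must have the same length");
        }
        if (point < 0 || point > first.size()) {
            throw new IllegalArgumentException("Crossover point out of range");
        }
        // Child 1: head of the first parent, tail of the second.
        List<T> child1 = new ArrayList<>(first.subList(0, point));
        child1.addAll(second.subList(point, second.size()));
        // Child 2: head of the second parent, tail of the first.
        List<T> child2 = new ArrayList<>(second.subList(0, point));
        child2.addAll(first.subList(point, first.size()));

        List<List<T>> children = new ArrayList<>();
        children.add(child1);
        children.add(child2);
        return children;
    }
}
```

A mutation, by contrast, often has to know which gene values are valid in
the domain, which is why it tends to be written per problem.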

>Also isn't the chromosome in principle an abstract representation
>of any solution independently of the domain (how good a solution
>for the problem at hand being obtained through computing the
>fitness of its associated phenotype)?
-- A chromosome cannot represent the problem domain by itself. So,
introduction of the Phenotype is a good design choice. GA always has the
concepts of genotype and phenotype. The genotype represents the chromosome
encoding whereas the phenotype always represents the domain. There are two
types of encoding popularly used in GA, i.e. direct and indirect. In a
direct encoding the genotype and phenotype are almost the same, with minor
transformations, like the ones I have used in the mathematical function
optimization examples. However, there is also an indirect encoding
practice where the genotype does not have any close relation to the
phenotype. I have demonstrated the same with the TSP example. I have
designed the Decoder to enable use of the same genotype for both types of
encoding transparently. In the legacy model a separate chromosome class
was introduced for RandomKey encoding, which is needless now.
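
As an illustration of indirect encoding, the sketch below decodes a list
of real-valued keys (the genotype) into a visiting order of cities (the
phenotype) by sorting the city indices on their keys. The interface
GenotypeDecoder and the class RandomKeyTourDecoder are hypothetical names
chosen for the example; they are not the Decoder from the PR.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical decoder contract: genotype G to phenotype P. */
interface GenotypeDecoder<G, P> {
    P decode(G genotype);
}

/** Indirect encoding: real-valued keys are decoded into a visiting order. */
final class RandomKeyTourDecoder implements GenotypeDecoder<List<Double>, List<String>> {
    private final List<String> cities;

    RandomKeyTourDecoder(List<String> cities) {
        // Defensive copy keeps the decoder immutable.
        this.cities = new ArrayList<>(cities);
    }

    @Override
    public List<String> decode(List<Double> keys) {
        if (keys.size() != cities.size()) {
            throw new IllegalArgumentException("One key per city is required");
        }
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < keys.size(); i++) {
            order.add(i);
        }
        // The city with the smallest key is visited first.
        order.sort(Comparator.comparingDouble((Integer i) -> keys.get(i)));

        List<String> tour = new ArrayList<>();
        for (int index : order) {
            tour.add(cities.get(index));
        }
        return tour;
    }
}
```

With cities A, B, C and keys 0.7, 0.1, 0.4 the decoded tour is B, C, A.
Any vector of keys decodes to a valid permutation, so generic crossover
and mutation on the keys cannot produce an invalid tour.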


>In the end, we should first wonder whether there is a design issue
>that could be solved without resorting to using "protected" fields.
-- We only have protected methods, not fields.

>[My first impression about what you had to do, is that it points to a
>shortcoming of the GA functionality in previous versions of CM
>and the new design is an opportunity to fix that.]
-- One of the primary shortcomings I noticed was the one I mentioned in a
previous comment i.e. no concept of Phenotype. That is addressed in the new
design.

> > >> I have initially implemented
> > >> the Binary chromosome and the corresponding binary mutation following
> > >> the same pattern. However, restricting extension of concrete classes
> > >> by private constructor does not prevent users from extending the
> > >> abstract parent classes.
> >
> > >We should aim at coding the GA logic through (Java) interfaces, and not
> > >expose the "abstract" classes.
> > *-- *One of the primary reasons for me to contribute to Apache's GA
> > library is its simplicity and extensibility.
>
> >"Extensibility" does not necessarily imply "inheritance"-based.
> -- Can you provide a solution to the above problem without an
> extensibility feature?

>It depends on the scope of the library.
>I'm pretty sure that whatever the new implementation which you are
>working on, there are some problems which it won't be able to
>solve even if it's inheritance-based.
-- We cannot design a library which is good in all aspects. There should be
tradeoffs. We have to go with that.

>Moreover it can be construed that if some user has to develop
>an extension, he might rather turn to another software with that
>functionality already built in.
-- True, that anyone can do. But our design should not restrict extension.

>
> >In fact, we do want to *avoid* it in order to more easily and more robustly
> >provide other advantages such as multi-threading.
> -- IMHO immutable operator design is the best choice for supporting
> multi-threading.

>Agreed.
>Immutability implies that all fields are "final" (hence "protected"
>fields would be useless).
-- There is no protected field.

> It is much easier to implement even for user extension.

>Agreed.
>Whether we allow some classes to be non-"final" is a much
>easier discussion.  No problem in doing that if it imposes no
>maintenance burden.
-- Kindly review the code and let me know your concerns with any public
class design.

> Why don't we think of fixing the ThreadLocalRandomSource.

>As said above, nothing to fix there.
--Ok.
>
> >> I would like to have a framework
> >> which should be always extensible for any problem domain with minor
> >> changes.
>
> >Any problem domain should indeed be amenable to be solved
> >by the library; I don't see how that should imply a design based
> >on inheritance.
> -- Do you have any alternative design in mind? Kindly share the same.

>I gave some hints in previous messages; I can't promise that it
>would fly without actually trying it. ;-)
>But I will do it once the code is in a branch which I can modify.
-- I have created a fresh repo and PR(#200). Kindly look into it.

>
> >> The primary reason behind this is that application domains of GA
> >> are too diverse. It is not possible to implement everything in a
> >> library. We don't know all possible domain areas either. If we remove
> >> the extensibility from the framework it would be useless in lots of
> >> areas.
>
> >When that occurs, people are welcome to contribute back if
> >something they need is missing.
> -- I think we have a disconnect here too. If the framework is not
> extensible how can users use this in their problem domain? If this is not
> extensible then it would never be used. How can we get back the
> contribution?

>I answered to this above.

>
> >Your argument of "too much diversity" can be reversed, in that
> >it is unlikely that one library would attract everyone that needs a
> >genetic algorithm.
> -- Even if it cannot attract everyone with out-of-the-box features it
> should be extensible for those.

>I don't agree with making things more complicated for us, now and
>in the foreseeable future, in order to satisfy users who don't exist yet
>(because the library does not exist yet).
-- I don't want to make things complicated for us. GA has a huge amount of
usages in diverse fields. Of course we should not try to provide solutions
for all. But the only thing I would like to ensure is that this library
should be reusable so that anyone can extend it and design solutions for
a new domain. We should not put any burden towards this.

>Let's focus on making it work within a given scope, and then we can
>think of improvements (that will be easy if the design is "structurally"
>extensible, even if they are somehow "disabled" in the first release).
-- I am against this "disable" option. I have tried to search the list of
use cases for GA and found this huge list
https://en.wikipedia.org/wiki/List_of_genetic_algorithm_applications
My proposal is we should allow extensibility selectively with immutability
in place. This won't create any bugs in our code due to extension.

> >Better make a design that can handle a fraction of use cases,
> >and grow as needed.
> --There are already libraries which can solve most common use cases.
> Non-extensible nature would block the growth to a considerable extent.

>Is there a misunderstanding about what is implied by "extensible"?
>Question: Are all classes, in your current design, "immutable"?
-- Yes, they are mostly. However, there are some classes with
protected/public methods which mutate private fields for internal
processing e.g. generationsEvolved field in AbstractGeneticAlgorithm class.
However the child classes cannot modify those private fields as there are
no direct mutation methods.
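
The pattern described here can be sketched as follows. GaSkeleton is a
hypothetical base class written for illustration and is not the actual
AbstractGeneticAlgorithm: the counter is private, the method that updates
it is final, and the only extension point has no access to the counter.

```java
/** Hypothetical base class; not the actual AbstractGeneticAlgorithm. */
abstract class GaSkeleton {
    /** Mutable bookkeeping, invisible to subclasses. */
    private int generationsEvolved;

    /** Template method: final, so subclasses cannot bypass the bookkeeping. */
    public final int evolve(int maxGenerations) {
        generationsEvolved = 0;
        while (generationsEvolved < maxGenerations) {
            nextGeneration();
            generationsEvolved++;
        }
        return generationsEvolved;
    }

    /** Read-only view of the counter. */
    public final int getGenerationsEvolved() {
        return generationsEvolved;
    }

    /** Extension point: has no access to the private counter. */
    protected abstract void nextGeneration();
}
```

Note that such a class is still mutable as a whole: a subclass cannot
corrupt the counter, but one instance should not be driven by several
threads at once.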

>If so, that's an excellent basis, and we should stop discussing the
>meaning of "extensibility".
--I think the design first needs a review. Then we can reinitiate this
discussion.

>
> >> >Extending the functionality, if necessary, should be contributed back
> >> >here
> >> *-- *Sometimes the GA operators are very much specific to the domain
> >> and it's hard to generalise. In those scenarios contributing back to
> >> the library might not be possible.
>
> >In such a case, how likely will it also be that whatever general
> >framework this library has put in place, will also not be amenable
> >to that domain's specifics?
> -- Could you please frame this concern w.r.t. the scheduling example
> provided above.

?

>
> >There is always a scope from which design decisions must be taken.
> >If "multi-threading" is in the scope, then the design must avoid
> >inheritance (in public classes) in order to much more easily
> >ensure the correctness of applications.
> -- Immutable design can also take care of multi-threading.

>My main point in the discussion is that all classes with "public" access
>should be immutable, indeed.
-- They should be.

>
> >> However, if a library cannot be extended for
> >> a new domain by users it becomes underutilised over time if not
> >> useless.
>
> >Sure but that is a hypothetical for the long-term.
> >However, if the library is buggy or slow, it will not be used at all.
> -- Is there any benchmark for speed/performance? GA is always infamous for
> resource consumption rather than time.

>I'm not sure I understand what you mean here.


Thanks & Regards
--Avijit Basak

On Thu, 23 Dec 2021 at 20:50, Gilles Sadowski <gi...@gmail.com> wrote:

> Hello.
>
> On Thu, 23 Dec 2021 at 14:22, Avijit Basak <av...@gmail.com> wrote:
> >
> > Hi All
> >
> >          Please see my comments below.
> >
> > >As I've already indicated, "ThreadLocalRandomSource" is, IMHO, a
> > >sort of workaround for a multi-thread application that does not want
> > >to bother managing per-thread RNG instance(s).
> > -- I am not clear on this. ThreadLocalRandomSource maintains
> > an EnumMap<RandomSource, ThreadLocal<UniformRandomProvider>>. What is
> > meant by it "does not want to bother managing per-thread RNG
> > instance(s)"? Could you please elaborate more on this? If this is an
> > issue in RNG, why don't we think of fixing the same or providing a
> > different internal implementation?
>
> There is no issue in "Commons RNG"; it provides a tool.
> I think that it is not the right tool for a multi-threaded GA library.
>
> >
> > >The library should not make that decision for the application since we
> > >can care for both usages: Every piece of the GA that needs a RNG can
> > >provide factory methods that either take a "RandomSource" argument
> > >or create a default one.
> > -- Library can always use a default option or provide an option for
> > customization at a global level but it need not be at the operator
> > level(IMHO).
>
> How can a GA operator work without a RNG?
> It can't; it is one of the main settings of such an operator, and the
> reason it should be customizable.
>
> > I don't see much use of it.
>
> That's OK; that's why I proposed, for this kind of use, a way to
> generate a default instance, without any burden for the caller.
>
> > >
> > > >> >2. Less/no flexibility (no user's choice of random source).
> > > >> -- Agreed.
> > > -- Do we really need this much flexibility here?
> >
> > >My main concern is that IMO the RNG is a prominent part of a GA
> > >and it is not a good design to use "ThreadLocalRandomSource".
> > -- RNG is definitely a prominent part. However, if we have a sharing
> > issue with ThreadLocalRandomSource we need to think of its alternate
> > implementation.
>
> There is a misunderstanding; there is no sharing issue, there is
> a design issue.
>
> > >How many is "too many instances"?
> > >The memory used by an operator is tiny compared to a chromosome,
> > >even less to a population of chromosome, or two populations of them
> > >(parents and offsprings).
> > --My concern is we are trying to provide a fix for a performance problem
> > in another library and that is going to consume additional memory.
>
> Nothing (at all) that we should be worried about (or discuss further):
> Most RNG implementations are quite lean (a few hundred bytes to
> a few KB).  You multiply this by the number of threads (a few tens
> at most), and you are well below 1 MB.  What is this amount when
> compared to the average Java application nowadays?
>
> > >     So I think we have a design tradeoff here: performance vs memory
> > > consumption. I am more worried about memory as that might restrict use
> > > of this library beyond a certain number of dimensions in some areas.
> >
> > >I'm referring to separate copies for each thread.
> > >How many threads/virtual CPUs are common nowadays?
> > >> However,
> > >> creating deep copy would only be possible when we strictly restrict
> > >> extension of operators which I want to avoid.
> >
> > >How to avoid deep copies in a multi-thread library?
> > >Through synchronization?
> > -- The operator interfaces are designed like a functional interface.
> > Accordingly, the current implementations of all operators are read-only.
> > The implementations do not maintain any mutable properties during
> > computations either. So they are perfectly suitable for multi-threaded
> > operation.
>
> Great!
>
> > If you
> > see any deviation to it please notify me.
>
> Sure.
> Sorry I did not have the time to look into the code yet.
>
> > >
> > > >> So even if we provide
> > > >> the customization at the operator level we cannot avoid sharing.
> > >
> > > >We can, and we should.
> > > >What we probably can't avoid sharing is the instance that represents
> > > >the population of chromosomes.
> > > *--* In a multi-threaded optimization the chromosome instances are
> > > shared in case the same chromosome is chosen for crossover by the
> > > selection process. I missed this point earlier.
> > > ...
> >
> > Chromosomes can be shared (if they are read-only).
> > --They are read-only.
>
> And immutable?
>
> > >
> > > >> >  Mine is against using "ThreadLocalRandomSource"...
> > > >> -- What is the way out other than that? Please suggest.
> > >
> > > >I think I did.
> > > *--* The factory based approach would be useful only when we can have
> > > separate copies of operators for each set of operations.
> >
> > If we don't have separate copies in each thread, then the operator
> > will not be multithreaded...
> > -- If operators do not contain any mutable property then they are
> > perfectly usable in a multi-threaded environment.
>
> The problem is that they do: by necessity, the RNG instance is mutable.
> You want to hide this fact through using "ThreadLocalRandomSource",
> and I think that it should not be hidden.
>
> > > *--* I think we should not block the extension.
> >
> > >This would be going backwards to many things that have been done
> > >to improve the robustness and reduce the bug counts of the Commons
> > >Math codes.
> > -- GA is different from other math functions. We may not impose the same
> > principle on everything.
>
> The principle stems from putting (actually needed) robustness above
> (hypothetically needed) extensibility.
> IIRC every usage of "protected" in Commons Math, on the expectation
> that it might be useful for some (indefinite) use, was reverted to
> "private".
>
> We should develop first with as few "public" API as possible; then if
> the need arises, and your design is indeed extensible by construction,
> it will just be a matter of changing "private" to "protected" in a later
> release.
>
> > > >Initially we discussed having a light-weight library, for easier
> > > >usage than alternative existing framework(s).
> > > *--* We can always think of making the framework lightweight but it
> > > should not cost extensibility.
> >
> > >There is no cost: We'll gladly merge every worthy extension into
> > >the Commons component.
> > -- I think we have a disconnect here. If the framework is not extensible
> > how would anyone be able to use it in any new domain? Do you mean that
> > the framework should first be changed for any new domain and users
> > should only use it out of the box?
>
> This is an open-source project.
> Anyone can take the code, make whatever extension, use it for whatever
> purpose (e.g. proving by a working example that it is needed), and submit
> a patch so that everyone benefits in the next release.
>
> > >
> > > >> E.g. any developer should be able to extend the
> > > >> IntegralChromosome class and define a child class which explicitly
> > > >> specifies the range of integers to be used.
> > >
> > > >It does not look like this would need an extension, only configuration
> > > >of the range.
> > > *-- *I agree. But the question is should we block the extension.
> >
> > >Please find a valid use case. ;-)
> > -- Recently I did an implementation of scheduling with commons-math 3.6. I
> > have implemented the chromosome representing a schedule by extending
> > AbstractListChromosome. The mutation was also customized according to the
> > requirement. However, I was able to use the existing OnePointCrossover
> > operator. Do you think this kind of implementation would be possible if
> > the framework does not support extensibility?
>
> I'm lacking information, namely to understand why you could use
> the "crossover" but not the "mutation".
> Also isn't the chromosome in principle an abstract representation
> of any solution independently of the domain (how good a solution
> for the problem at hand being obtained through computing the
> fitness of its associated phenotype)?
>
> In the end, we should first wonder whether there is a design issue
> that could be solved without resorting to using "protected" fields.
> [My first impression about what you had to do, is that it points to a
> shortcoming of the GA functionality in previous versions of CM
> and the new design is an opportunity to fix that.]
>
> > > >> I have initially implemented
> > > >> the Binary chromosome and the corresponding binary mutation
> > > >> following the same pattern. However, restricting extension of
> > > >> concrete classes by private constructor does not prevent users from
> > > >> extending the abstract parent classes.
> > >
> > > >We should aim at coding the GA logic through (Java) interfaces, and
> > > >not expose the "abstract" classes.
> > > *-- *One of the primary reasons for me to contribute to Apache's GA
> > > library is its simplicity and extensibility.
> >
> > >"Extensibility" does not necessarily imply "inheritance"-based.
> > -- Can you provide a solution to the above problem without an
> > extensibility feature?
>
> It depends on the scope of the library.
> I'm pretty sure that whatever the new implementation which you are
> working on, there are some problems which it won't be able to
> solve even if it's inheritance-based.
> Moreover it can be construed that if some user has to develop
> an extension, he might rather turn to another software with that
> functionality already built in.
>
> >
> > >In fact, we do want to *avoid* it in order to more easily and more robustly
> > >provide other advantages such as multi-threading.
> > -- IMHO immutable operator design is the best choice for supporting
> > multi-threading.
>
> Agreed.
> Immutability implies that all fields are "final" (hence "protected"
> fields would be useless).
>
> > It is much easier to implement even for user extension.
>
> Agreed.
> Whether we allow some classes to be non-"final" is a much
> easier discussion.  No problem in doing that if it imposes no
> maintenance burden.
>
> > Why don't we think of fixing the ThreadLocalRandomSource.
>
> As said above, nothing to fix there.
>
> >
> > >> I would like to have a framework
> > >> which should be always extensible for any problem domain with minor
> > >> changes.
> >
> > >Any problem domain should indeed be amenable to be solved
> > >by the library; I don't see how that should imply a design based
> > >on inheritance.
> > -- Do you have any alternative design in mind? Kindly share the same.
>
> I gave some hints in previous messages; I can't promise that it
> would fly without actually trying it. ;-)
> But I will do it once the code is in a branch which I can modify.
>
> >
> > >> The primary reason behind this is that application domains of GA
> > >> are too diverse. It is not possible to implement everything in a
> > >> library. We don't know all possible domain areas either. If we remove
> > >> the extensibility from the framework it would be useless in lots of
> > >> areas.
> >
> > >When that occurs, people are welcome to contribute back if
> > >something they need is missing.
> > -- I think we have a disconnect here too. If the framework is not
> > extensible how can users use this in their problem domain? If this is not
> > extensible then it would never be used. How can we get back the
> > contribution?
>
> I answered to this above.
>
> >
> > >Your argument of "too much diversity" can be reversed, in that
> > >it is unlikely that one library would attract everyone that needs a
> > >genetic algorithm.
> > -- Even if it cannot attract everyone with out-of-the-box features it
> > should be extensible for those.
>
> I don't agree with making things more complicated for us, now and
> in the foreseeable future, in order to satisfy users who don't exist yet
> (because the library does not exist yet).
>
> Let's focus on making it work within a given scope, and then we can
> think of improvements (that will be easy if the design is "structurally"
> extensible, even if they are somehow "disabled" in the first release).
>
> > >Better make a design that can handle a fraction of use cases,
> > >and grow as needed.
> > --There are already libraries which can solve most common use cases.
> > Non-extensible nature would block the growth to a considerable extent.
>
> Is there a misunderstanding about what is implied by "extensible"?
> Question: Are all classes, in your current design, "immutable"?
> If so, that's an excellent basis, and we should stop discussing the
> meaning of "extensibility".
>
> >
> > >> >Extending the functionality, if necessary, should be contributed back here
> > >> *-- *Sometimes the GA operators are very much specific to the domain and
> > >> it's hard to generalise. In those scenarios contributing back to the
> > >> library might not be possible.
> >
> > >In such a case, how likely will it also be that whatever general
> > >framework this library has put in place, will also not be amenable
> > >to that domain's specifics?
> > -- Could you please frame this concern w.r.t. the scheduling example
> > provided above.
>
> ?
>
> >
> > >There is always a scope from which design decisions must be taken.
> > >If "multi-threading" is in the scope, then the design must avoid
> > >inheritance (in public classes) in order to much more easily
> > >ensure the correctness of applications.
> > -- Immutable design can also take care of multi-threading.
>
> My main point in the discussion is that all classes with "public" access
> should be immutable, indeed.
>
> >
> > >> However, if a library cannot be extended for
> > >> a new domain by users it becomes underutilised over time if not useless.
> >
> > >Sure but that is a hypothetical for the long-term.
> > >However, if the library is buggy or slow, it will not be used at all.
> > -- Is there any benchmark for speed/performance? GA is always infamous for
> > resource consumption rather than time.
>
> I'm not sure I understand what you mean here.
>
> Gilles
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>

-- 
Avijit Basak

Re: [MATH][GENETICS][PR-199] Decision on use and customization of RNG functionality for randomization

Posted by Gilles Sadowski <gi...@gmail.com>.
Hello.

On Thu, 23 Dec 2021 at 14:22, Avijit Basak <av...@gmail.com> wrote:
>
> Hi All
>
>          Please see my comments below.
>
> >As I've already indicated, "ThreadLocalRandomSource" is, IMHO, a
> >sort of workaround for a multi-thread application that does not want
> >to bother managing per-thread RNG instance(s).
> -- I am not clear on this. ThreadLocalRandomSource maintains
> an EnumMap<RandomSource, ThreadLocal<UniformRandomProvider>>. What is meant
> by it "does not want to bother managing per-thread RNG instance(s)" Could
> you please elaborate more on this. If this is an issue in RNG why don't we
> think of fixing the same or providing a different internal implementation.

There is no issue in "Commons RNG"; it provides a tool.
I think that it is not the right tool for a multi-threaded GA library.

>
> >The library should not make that decision for the application since we
> >can care for both usages: Every piece of the GA that needs a RNG can
> >provide factory methods that either take a "RandomSource" argument
> >or create a default one.
> -- Library can always use a default option or provide an option for
> customization at a global level but it need not be at the operator
> level(IMHO).

How can a GA operator work without a RNG?
It can't; it is one of the main settings of such an operator, and the
reason it should be customizable.

> I don't see much use of it.

That's OK; that's why I proposed, for this kind of use, a way to
generate a default instance, without any burden for the caller.
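
A minimal sketch of that idea, using JDK classes only so that it runs
without extra dependencies. "java.util.SplittableRandom" stands in for a
Commons RNG generator, and the "BitFlipMutation" class with its "of(...)"
factory methods is a hypothetical name chosen for illustration; it is not
the API of the PR.

```java
import java.util.SplittableRandom;

/** Hypothetical operator: flips each gene with a given probability. */
final class BitFlipMutation {
    /** Mutable state: the reason an instance must not be shared among threads. */
    private final SplittableRandom rng;

    private BitFlipMutation(SplittableRandom rng) {
        this.rng = rng;
    }

    /** The caller chooses the source of randomness (reduced here to a seed). */
    static BitFlipMutation of(long seed) {
        return new BitFlipMutation(new SplittableRandom(seed));
    }

    /** Default instance, for callers who do not care about the RNG. */
    static BitFlipMutation of() {
        return new BitFlipMutation(new SplittableRandom());
    }

    /** Returns a mutated copy; the input array is left untouched. */
    boolean[] mutate(boolean[] genes, double rate) {
        final boolean[] out = genes.clone();
        for (int i = 0; i < out.length; i++) {
            if (rng.nextDouble() < rate) {
                out[i] = !out[i];
            }
        }
        return out;
    }
}
```

A caller who wants reproducible runs would use "BitFlipMutation.of(seed)";
everyone else calls "BitFlipMutation.of()" and never sees the RNG.
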

> >
> > >> >2. Less/no flexibility (no user's choice of random source).
> > >> -- Agreed.
> > -- Do we really need this much flexibility here?
>
> >My main concern is that IMO the RNG is a prominent part of a GA
> >and it is not a good design to use "ThreadLocalRandomSource".
> -- RNG is definitely a prominent part. However, if we have a sharing issue
> with ThreadLocalRandomSource we need to think of it's alternate
> implementation.

There is a misunderstanding; there is no sharing issue, there is
a design issue.

> >How many is "too many instances"?
> >The memory used by an operator is tiny compared to a chromosome,
> >even less to a population of chromosome, or two populations of them
> >(parents and offsprings).
> --My concern is we are trying to provide a fix for a performance problem in
> another library and that is going to consume additional memory.

Nothing (at all) that we should be worried about (or discuss further):
Most RNG implementations are quite lean (a few hundred bytes to
a few KB).  You multiply this by the number of threads (a few tens
at most), and you are well below 1 MB.  What is this amount when
compared to the average Java application nowadays?
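
The order of magnitude can be checked with a small calculation. The sketch
below assumes a Mersenne Twister (MT19937), whose state is 624 32-bit
words, and 32 threads; both figures are assumptions picked for the
example, and "RngMemoryEstimate" is a hypothetical helper.

```java
/** Hypothetical helper: memory taken by per-thread generator state. */
final class RngMemoryEstimate {
    private RngMemoryEstimate() {}

    /** Total bytes of state when every thread owns one generator. */
    static long totalStateBytes(int wordsPerGenerator, int bytesPerWord, int threads) {
        return (long) wordsPerGenerator * bytesPerWord * threads;
    }
}
```

With these assumptions, "totalStateBytes(624, 4, 32)" gives 79872 bytes
(about 78 KiB), not counting object headers.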

> >     So I think we have a design tradeoff here performance vs memory
> > consumption. I am more worried about memory as that might restrict use of
> > this library beyond a certain number of dimensions in some areas.
>
> >I'm referring to separate copies for each thread.
> >How many threads/virtual CPUs are common nowadays?
> >> However,
> >> creating deep copy would only be possible when we strictly restrict
> >> extension of operators which I want to avoid.
>
> >How to avoid deep copies in a multi-thread library?
> >Through synchronization?
> -- The operator interfaces are designed like a functional interface.
> Accordingly, the current implementation of all operators are read only. The
> implementation does not maintain any mutable properties during computations
> too. So they are perfectly suitable for multi-threaded operation.

Great!

> If you
> see any deviation to it please notify me.

Sure.
Sorry I did not have the time to look into the code yet.

> >
> > >> So even if we provide
> > >> the customization at the operator level we cannot avoid sharing.
> >
> > >We can, and we should.
> > >What we probably can't avoid sharing is the instance that represents the
> > >population of chromosomes.
> > *--* In a multi-threaded optimization the chromosome instances are shared
> > in case the same chromosome is chosen for crossover by the selection
> > process. I missed this point earlier.
> > ...
>
> Chromosomes can be shared (if they are read-only).
> --They are read-only.

And immutable?

> >
> > >> >  Mine is against using "ThreadLocalRandomSource"...
> > >> -- What is the wayout other than that. Please suggest.
> >
> > >I think I did.
> > *--* The factory based approach would be useful only when we can have
> > separate copies of operators for each set of operations.
>
> If we don't have separate copies in each thread, then the operator
> will not be multithreaded...
> -- If operators do not contain any mutable property then they are perfectly
> usable in a multi-threaded environment.

The problem is that they do: by necessity, the RNG instance is mutable.
You want to hide this fact through using "ThreadLocalRandomSource",
and I think that it should not be hidden.
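
A rough sketch of the alternative, assuming each unit of work owns its
operator and therefore its generator. It uses JDK classes only
("SplittableRandom" stands in for a Commons RNG generator); the
"PerTaskOperators" and "UniformShift" names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SplittableRandom;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class PerTaskOperators {
    private PerTaskOperators() {}

    /** Hypothetical operator that visibly holds its own (mutable) generator. */
    static final class UniformShift {
        private final SplittableRandom rng;

        UniformShift(SplittableRandom rng) {
            this.rng = rng;
        }

        /** Shifts the gene by a uniform deviate in [-0.5, 0.5). */
        double apply(double gene) {
            return gene + rng.nextDouble() - 0.5;
        }
    }

    /** Runs nTasks tasks; each task owns a separate operator instance. */
    static List<Double> run(long seed, int nTasks) {
        final SplittableRandom root = new SplittableRandom(seed);
        final List<Callable<Double>> tasks = new ArrayList<>();
        for (int t = 0; t < nTasks; t++) {
            // split() is called on the submitting thread: "root" is never shared.
            final UniformShift op = new UniformShift(root.split());
            tasks.add(() -> op.apply(0.0));
        }
        final ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            final List<Double> results = new ArrayList<>();
            for (Future<Double> f : pool.invokeAll(tasks)) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

No operator instance is ever used by two threads at once, so no
synchronization or thread-local look-up is needed, and a fixed seed gives
the same results whatever the thread scheduling.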

> > *--* I think we should not block the extension.
>
> >This would be going backwards to many things that have been done
> >to improve the robustness and reduce the bug counts of the Commons
> >Math codes.
> -- GA is different from other math functions. We may not impose the same
> principle on everything.

The principle stems from putting (actually needed) robustness above
(hypothetically needed) extensibility.
IIRC every usage of "protected" in Commons Math, on the expectation
that it might be useful for some (indefinite) use, was reverted to "private".

We should develop first with as few "public" API as possible; then if
the need arises, and your design is indeed extensible by construction,
it will just be a matter of changing "private" to "protected" in a later
release.

> > >Initially we discussed about having a light-weight library, for easier usage
> > >than alternative existing framework(s).
> > *--* We can always think of making the framework lightweight but it should
> > not cost extensibility.
>
> >There is no cost: We'll gladly merge every worthy extension into
> >the Commons component.
> -- I think we have a disconnect here. If the framework is not extensible
> how anyone would be able to use it in any new domain. Do you mean first the
> framework should be changed for any new domain and users should only use it
> out of box.

This is an open-source project.
Anyone can take the code, make whatever extension, use it for whatever
purpose (e.g. proving by a working example that it is needed), and submit
a patch so that everyone benefits in the next release.

> >
> > >> E.g. any developer should be able to extend the
> > >> IntegralChromosome class and define a child class which explicitly
> > >> specifies the range of integers to be used.
> >
> > >It does not look like this would need an extension, only configuration
> > >of the range.
> > *-- *I agree. But the question is should we block the extension.
>
> >Please find a valid use case. ;-)
> -- Recently I did an implementation of scheduling with commons-math 3.6. I
> have implemented the chromosome representing schedule by extending
> AbstractListChromosome. The mutation was also customized according to the
> requirement. However, I was able to use the existing OnePointCrossover
> operator. Do you think this kind of implementation would be possible if the
> framework does not support extensibility?

I'm lacking information, namely to understand why you could use
the "crossover" but not the "mutation".
Also isn't the chromosome in principle an abstract representation
of any solution independently of the domain (how good a solution
for the problem at hand being obtained through computing the
fitness of its associated phenotype)?

In the end, we should first wonder whether there is a design issue
that could be solved without resorting to using "protected" fields.
[My first impression about what you had to do, is that it points to a
shortcoming of the GA functionality in previous versions of CM
and the new design is an opportunity to fix that.]
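
One way such a case might be handled without inheritance is composition:
the library supplies the crossover, and the application plugs in its own
mutation as a function. The sketch below is only an illustration under
that assumption; "CompositionSketch" and its methods are hypothetical
names, and the crossover point is passed explicitly to keep the example
deterministic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.UnaryOperator;

final class CompositionSketch {
    private CompositionSketch() {}

    /** "Library" code: one-point crossover on any list-based representation. */
    static <T> List<T> onePointCrossover(List<T> first, List<T> second, int point) {
        final List<T> child = new ArrayList<>(first.subList(0, point));
        child.addAll(second.subList(point, second.size()));
        return Collections.unmodifiableList(child);
    }

    /** "Library" code: applies whatever mutation the caller plugs in. */
    static <T> List<T> offspring(List<T> first,
                                 List<T> second,
                                 int point,
                                 UnaryOperator<List<T>> mutation) {
        return mutation.apply(onePointCrossover(first, second, point));
    }
}
```

A scheduling application could then pass a lambda that, say, swaps two
slots of the schedule, without subclassing any library class.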

> > >> I have initially implemented
> > >> the Binary chromosome and the corresponding binary mutation following the
> > >> same pattern. However, restricting extension of concrete classes by private
> > >> constructor does not prevent users from extending the abstract parent
> > >> classes.
> >
> > >We should aim at coding the GA logic through (Java) interfaces, and not
> > >expose the "abstract" classes.
> > *-- *One of the primary reasons for me to contribute in Apache' GA library
> > is it's simplicity and extensibility.
>
> >"Extensibility" does not necessarily imply "inheritance"-based.
> -- Can you provide a solution to the above problem without an extensibility
> feature?

It depends on the scope of the library.
I'm pretty sure that whatever the new implementation which you are
working on, there are some problems which it won't be able to
solve even if it's inheritance-based.
Moreover it can be construed that if some user has to develop
an extension, he might rather turn to another software with that
functionality already built in.

>
> >In fact, we do want to *avoid* in order to more easily and more robustly
> >provide other advantages such as multi-threading.
> -- IMHO immutable operator design is the best choice for supporting
> multi-threading.

Agreed.
Immutability implies that all fields are "final" (hence "protected"
fields would be useless).
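
As an illustration of what that means in practice, here is a sketch of an
immutable chromosome whose integer range is a configuration parameter
rather than something fixed by a subclass. "IntRangeChromosome" is a
hypothetical class written for this example, not one from the PR.

```java
/** Hypothetical immutable chromosome with a configurable gene range. */
final class IntRangeChromosome {
    private final int[] genes;
    private final int min;
    private final int max;

    private IntRangeChromosome(int[] genes, int min, int max) {
        this.genes = genes;
        this.min = min;
        this.max = max;
    }

    /** Factory method: validates the genes and takes a defensive copy. */
    static IntRangeChromosome of(int min, int max, int... genes) {
        for (int g : genes) {
            if (g < min || g > max) {
                throw new IllegalArgumentException("Gene out of range: " + g);
            }
        }
        return new IntRangeChromosome(genes.clone(), min, max);
    }

    int length() {
        return genes.length;
    }

    int gene(int index) {
        return genes[index];
    }

    /** "Modification" returns a new instance; this one never changes. */
    IntRangeChromosome withGene(int index, int value) {
        final int[] copy = genes.clone();
        copy[index] = value;
        return of(min, max, copy);
    }
}
```

Since every field is "final" and the array never escapes, instances can be
shared freely among threads.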

> It is much easier to implement even for user extension.

Agreed.
Whether we allow some classes to be non-"final" is a much
easier discussion.  No problem in doing that if it imposes no
maintenance burden.

> Why don't we think of fixing the ThreadLocalRandomSource.

As said above, nothing to fix there.

>
> >> I would like to have a framework
> >> which should be always extensible for any problem domain with minor
> >> changes.
>
> >Any problem domain should indeed be amenable to be solved
> >by the library; I don't see how that should imply a design based
> >on inheritance.
> -- Do you have any alter design in mind. Kindly share the same.

I gave some hints in previous messages; I can't promise that it
would fly without actually trying it. ;-)
But I will do it once the code is in a branch which I can modify.

>
> >> The primary reason behind this is that application domains of GA
> >> are too diverse. It is not possible to implement everything in a library.
> >> We don't know all possible domain areas too. If we remove the extensibility
> >> from the framework it would be useless in lots of areas.
>
> >When that occurs, people are welcome to contribute back if
> >something they need is missing.
> -- I think we have a disconnect here too. If the framework is not
> extensible how users can use this in their problem domain. If this is not
> extensible then it would never be used. How can we get back the
> contribution?

I answered this above.

>
> >Your argument of "too much diversity" can be reversed, in that
> >it is unlikely that one library would attract everyone that needs a
> >genetic algorithm.
> -- Even if it cannot attract everyone with out of box features it should be
> extensible for those.

I don't agree with making things more complicated for us, now and
in the foreseeable future, in order to satisfy users who don't exist yet
(because the library does not exist yet).

Let's focus on making it work within a given scope, and then we can
think of improvements (that will be easy if the design is "structurally"
extensible, even if they are somehow "disabled" in the first release).

> >Better make a design that can handle a fraction of use cases,
> >and grow as needed.
> --There are already libraries which can solve most common use cases.
> Non-extensible nature would block the growth to a considerable extent.

Is there a misunderstanding about what is implied by "extensible"?
Question: Are all classes, in your current design, "immutable"?
If so, that's an excellent basis, and we should stop discussing the
meaning of "extensibility".

>
> >> >Extending the functionality, if necessary, should be contributed back here
> >> *-- *Sometimes the GA operators are very much specific to the domain and
> >> it's hard to generalise. In those scenarios contributing back to the
> >> library might not be possible.
>
> >In such a case, how likely will it also be that whatever general
> >framework this library has put in place, will also not be amenable
> >to that domain's specifics?
> -- Could you please frame this concern w.r.t. the scheduling example
> provided above.

?

>
> >There is always a scope from which design decisions must be taken.
> >If "multi-threading" is in the scope, then the design must avoid
> >inheritance (in public classes) in order to much more easily
> >ensure the correctness of applications.
> -- Immutable design can also take care of multi-threading.

My main point in the discussion is that all classes with "public" access
should be immutable, indeed.

>
> >> However, if a library cannot be extended for
> >> a new domain by users it becomes underutilised over time if not useless.
>
> >Sure but that is a hypothetical for the long-term.
> >However, if the library is buggy or slow, it will not be used at all.
> -- Is there any benchmark for speed/performance? GA is always infamous for
> resource consumption rather than time.

I'm not sure I understand what you mean here.

Gilles



Re: [MATH][GENETICS][PR-199] Decision on use and customization of RNG functionality for randomization

Posted by Avijit Basak <av...@gmail.com>.
Hi All

         Please see my comments below.

>As I've already indicated, "ThreadLocalRandomSource" is, IMHO, a
>sort of workaround for a multi-thread application that does not want
>to bother managing per-thread RNG instance(s).
-- I am not clear on this. ThreadLocalRandomSource maintains
an EnumMap<RandomSource, ThreadLocal<UniformRandomProvider>>. What is meant
by "does not want to bother managing per-thread RNG instance(s)"? Could
you please elaborate on this? If this is an issue in RNG, why don't we
think of fixing it or providing a different internal implementation?

>The library should not make that decision for the application since we
>can care for both usages: Every piece of the GA that needs a RNG can
>provide factory methods that either take a "RandomSource" argument
>or create a default one.
-- The library can always use a default option or provide an option for
customization at a global level, but it need not be at the operator
level (IMHO). I don't see much use for it.

>
> >> >2. Less/no flexibility (no user's choice of random source).
> >> -- Agreed.
> -- Do we really need this much flexibility here?

>My main concern is that IMO the RNG is a prominent part of a GA
>and it is not a good design to use "ThreadLocalRandomSource".
-- RNG is definitely a prominent part. However, if we have a sharing issue
with ThreadLocalRandomSource, we need to think of an alternate
implementation of it.
>How many is "too many instances"?
>The memory used by an operator is tiny compared to a chromosome,
>even less to a population of chromosome, or two populations of them
>(parents and offsprings).
--My concern is we are trying to provide a fix for a performance problem in
another library and that is going to consume additional memory.

>     So I think we have a design tradeoff here performance vs memory
> consumption. I am more worried about memory as that might restrict use of
> this library beyond a certain number of dimensions in some areas.

>I'm referring to separate copies for each thread.
>How many threads/virtual CPUs are common nowadays?
>> However,
>> creating deep copy would only be possible when we strictly restrict
>> extension of operators which I want to avoid.

>How to avoid deep copies in a multi-thread library?
>Through synchronization?
-- The operator interfaces are designed like functional interfaces.
Accordingly, the current implementations of all operators are read-only. The
implementations do not maintain any mutable properties during computations
either. So they are perfectly suitable for multi-threaded operation. If you
see any deviation from this, please notify me.

>
> >> So even if we provide
> >> the customization at the operator level we cannot avoid sharing.
>
> >We can, and we should.
> >What we probably can't avoid sharing is the instance that represents the
> >population of chromosomes.
> *--* In a multi-threaded optimization the chromosome instances are shared
> in case the same chromosome is chosen for crossover by the selection
> process. I missed this point earlier.
> ...

>Chromosomes can be shared (if they are read-only).
--They are read-only.

>
> >> >  Mine is against using "ThreadLocalRandomSource"...
> >> -- What is the wayout other than that. Please suggest.
>
> >I think I did.
> *--* The factory based approach would be useful only when we can have
> separate copies of operators for each set of operations.

>If we don't have separate copies in each thread, then the operator
>will not be multithreaded...
-- If operators do not contain any mutable property then they are perfectly
usable in a multi-threaded environment.

> *--* I think we should not block the extension.

>This would be going backwards to many things that have been done
>to improve the robustness and reduce the bug counts of the Commons
>Math codes.
-- GA is different from other math functions. We may not impose the same
principle on everything.

> >Initially we discussed about having a light-weight library, for easier usage
> >than alternative existing framework(s).
> *--* We can always think of making the framework lightweight but it should
> not cost extensibility.

>There is no cost: We'll gladly merge every worthy extension into
>the Commons component.
-- I think we have a disconnect here. If the framework is not extensible,
how would anyone be able to use it in a new domain? Do you mean that the
framework should first be changed for each new domain, and that users
should only use it out of the box?

>
> >> E.g. any developer should be able to extend the
> >> IntegralChromosome class and define a child class which explicitly
> >> specifies the range of integers to be used.
>
> >It does not look like this would need an extension, only configuration
> >of the range.
> *-- *I agree. But the question is should we block the extension.

>Please find a valid use case. ;-)
-- Recently I did an implementation of scheduling with commons-math 3.6. I
implemented the chromosome representing the schedule by extending
AbstractListChromosome. The mutation was also customized according to the
requirement. However, I was able to use the existing OnePointCrossover
operator. Do you think this kind of implementation would be possible if the
framework did not support extensibility?

>
> >> I have initially implemented
> >> the Binary chromosome and the corresponding binary mutation following the
> >> same pattern. However, restricting extension of concrete classes by private
> >> constructor does not prevent users from extending the abstract parent
> >> classes.
>
> >We should aim at coding the GA logic through (Java) interfaces, and not
> >expose the "abstract" classes.
> *-- *One of the primary reasons for me to contribute to Apache's GA library
> is its simplicity and extensibility.

>"Extensibility" does not necessarily imply "inheritance"-based.
-- Can you provide a solution to the above problem without an extensibility
feature?

>In fact, we do want to *avoid* in order to more easily and more robustly
>provide other advantages such as multi-threading.
-- IMHO immutable operator design is the best choice for supporting
multi-threading. It is much easier to implement, even for user extensions.
Why don't we think of fixing the ThreadLocalRandomSource?

>> I would like to have a framework
>> which should be always extensible for any problem domain with minor
>> changes.

>Any problem domain should indeed be amenable to be solved
>by the library; I don't see how that should imply a design based
>on inheritance.
-- Do you have any alternative design in mind? Kindly share it.

>> The primary reason behind this is that application domains of GA
>> are too diverse. It is not possible to implement everything in a library.
>> We don't know all possible domain areas too. If we remove the extensibility
>> from the framework it would be useless in lots of areas.

>When that occurs, people are welcome to contribute back if
>something they need is missing.
-- I think we have a disconnect here too. If the framework is not
extensible, how can users use it in their problem domain? If it is not
extensible, then it will never be used. How can we then get back the
contribution?

>Your argument of "too much diversity" can be reversed, in that
>it is unlikely that one library would attract everyone that needs a
>genetic algorithm.
-- Even if it cannot attract everyone with out of box features it should be
extensible for those.

>Better make a design that can handle a fraction of use cases,
>and grow as needed.
--There are already libraries which can solve most common use cases.
Non-extensible nature would block the growth to a considerable extent.

>> >Extending the functionality, if necessary, should be contributed back here
>> *-- *Sometimes the GA operators are very much specific to the domain and
>> it's hard to generalise. In those scenarios contributing back to the
>> library might not be possible.

>In such a case, how likely will it also be that whatever general
>framework this library has put in place, will also not be amenable
>to that domain's specifics?
-- Could you please frame this concern w.r.t. the scheduling example
provided above?

>There is always a scope from which design decisions must be taken.
>If "multi-threading" is in the scope, then the design must avoid
>inheritance (in public classes) in order to much more easily
>ensure the correctness of applications.
-- Immutable design can also take care of multi-threading.

>> However, if a library cannot be extended for
>> a new domain by users it becomes underutilised over time if not useless.

>Sure but that is a hypothetical for the long-term.
>However, if the library is buggy or slow, it will not be used at all.
-- Is there any benchmark for speed/performance? GA is always infamous for
resource consumption rather than time.


Thanks & Regards
--Avijit Basak

On Wed, 22 Dec 2021 at 20:32, Gilles Sadowski <gi...@gmail.com> wrote:

> Hello.
>
> On Wed, 22 Dec 2021 at 14:25, Avijit Basak <av...@gmail.com> wrote:
> >
> > Hi All
> >
> >         Please see my comments below.
> >
> > >> >Several problems with this approach (raised in previous messages IIRC):
> > >> >1. Potential performance loss in sharing the same RNG instance.
> > >> -- As per my understanding ThreadLocalRandomSource creates separate
> > >> instances of UniformRandomProvider for each thread. So I am not sure how a
> > >> UniformRandomProvider instance is being shared. Please correct me if I am
> > >> wrong.
> >
> > >Within a given thread there will be *one* RNG instance; that's what I meant
> > >by "shared".
> > >Of course you are right that that instance is not shared by multiple threads
> > >(which would be a bug).
> > >The performance loss is because it will be necessary to call
> > >  ThreadLocalRandomSource.current(RandomSource source)
> > >for each access to the RNG (since it would be a bug to store the returned
> > >value in e.g. an operator instance that would be shared among threads, as
> > >you suggest below).
> >
> > -- I tried to do a small test on it and here are the results. Output times
> > are in milliseconds. According to my understanding the performance loss is
> > mostly during creation of per thread instance of UniformRandomProvider.
> > --*CUT*--
> >     import org.apache.commons.rng.simple.RandomSource;
> >     import org.apache.commons.rng.simple.ThreadLocalRandomSource;
> >     import org.junit.jupiter.api.Test;
> >
> >     @Test
> >     void test() {
> >         // Iteration counts; one elapsed time (in ms) is printed for each.
> >         final int[] limits = {1, 1000, 10000, 100000, 1000000,
> >                               10000000, 100000000, 1000000000};
> >         for (int limit : limits) {
> >             // Time "limit" look-ups of the per-thread generator.
> >             long start = System.currentTimeMillis();
> >             for (int i = 0; i < limit; i++) {
> >                 ThreadLocalRandomSource.current(RandomSource.JDK);
> >             }
> >             System.out.println(System.currentTimeMillis() - start);
> >         }
> >     }
> > --*CUT*--
> > --*output*--
> > 363
> > 1
> > 2
> > 4
> > 6
> > 28
> > 244
> > 2423
> > --*output*--
>
> As I've already indicated, "ThreadLocalRandomSource" is, IMHO, a
> sort of workaround for a multi-thread application that does not want
> to bother managing per-thread RNG instance(s).
> The library should not make that decision for the application since we
> can care for both usages: Every piece of the GA that needs a RNG can
> provide factory methods that either take a "RandomSource" argument
> or create a default one.
>
> Note that your above custom benchmark is likely to mean nothing
> (please see e.g. "Commons RNG" on how to create JMH based
> benchmarks).
>
> >
> > >> >2. Less/no flexibility (no user's choice of random source).
> > >> -- Agreed.
> > -- Do we really need this much flexibility here?
>
> My main concern is that IMO the RNG is a prominent part of a GA
> and it is not a good design to use "ThreadLocalRandomSource".
>
> > >> >3. Error-prone (user can access/reuse the "UniformRandomProvider" instances).
> > >>
> > >> >Again: "ThreadLocalRandomSource" is an ad-hoc workaround for correct but
> > >> >"light" usage of random number generation in a multi-threaded application;
> > >> >GAs make "heavy" use of RNG, thus it does not seem outlandish that all the
> > >> >RNG "clients" (e.g. every "operator") create their own instances.
> > >
> > >
> > >> >IMHO, a more important discussion would be about the expectations in a
> > >> >multithreaded context: E.g. should an operator be shareable by different
> > >> >threads?  And if not, how does the API help application developers to avoid
> > >> >such pitfalls?
> > >> -- Once we implement multi-threading in GA, same crossover and mutation
> > >> operators will be re-used across multiple threads.
> >
> > >I would be wary to go on that path; better consider making (deep) copies.
> > >We can have multiple instances of an operator, all being configured in the
> > >same way but being different instances with no risk of a multithreading bug.
> >
> > -- I don't think this would be a good design choice just to support
> > customization of RNG functionality. This will lead to too many instances of
> > the same operators resulting in lots of unnecessary memory consumption. I
> > think we might face memory issues for higher dimensional problems. As
> > population size requirement also increases with increase of dimension this
> > might lead to a major issue and need a thought.
>
> How many is "too many instances"?
> The memory used by an operator is tiny compared to a chromosome,
> even less to a population of chromosome, or two populations of them
> (parents and offsprings).
>
> >     So I think we have a design tradeoff here performance vs memory
> > consumption. I am more worried about memory as that might restrict use of
> > this library beyond a certain number of dimensions in some areas.
>
> I'm referring to separate copies for each thread.
> How many threads/virtual CPUs are common nowadays?
>
> > However,
> > creating deep copy would only be possible when we strictly restrict
> > extension of operators which I want to avoid.
>
> How to avoid deep copies in a multi-thread library?
> Through synchronization?
>
> >
> > >> So even if we provide
> > >> the customization at the operator level we cannot avoid sharing.
> >
> > >We can, and we should.
> > >What we probably can't avoid sharing is the instance that represents the
> > >population of chromosomes.
> > *--* In a multi-threaded optimization the chromosome instances are shared
> > in case the same chromosome is chosen for crossover by the selection
> > process. I missed this point earlier.
> > ...
>
> Chromosomes can be shared (if they are read-only).
>
> >
> > >> >  Mine is against using "ThreadLocalRandomSource"...
> > >> -- What is the way out other than that? Please suggest.
> >
> > >I think I did.
> > *--* The factory based approach would be useful only when we can have
> > separate copies of operators for each set of operations.
>
> If we don't have separate copies in each thread, then the operator
> will not be multithreaded...
>
> > >Maybe it's time to create a dedicated branch for the GA functionality
> > >so that we can try out the different approaches.
> >
> >
> > >
> > > >> I think first we need to decide on whether we really need this
> > > >> customization and if yes then why. Then we can decide on alternate
> > > >> implementation options.
> > > >
> > > >> >As per the recent updates of the math-related code bases, the
> > > >> >public API should provide factory methods (constructors should
> > > >> >be private).
> > > >> -- private constructors will make public API classes non-extensible. This
> > > >> will severely restrict the extensibility of this framework which I want to
> > > >> avoid. I am not sure why we need to remove public constructors. It would be
> > > >> helpful if you could refer me to any relevant discussion thread.
> > >
> > > >  Allowing extensibility is a huge burden on library maintainers.  The
> > > >  library must have been designed to support it; hence, you should
> > > >  first describe what kind(s) of extensions (with usage examples) you
> > > >  have in mind.
> > > --The library should be extensible to support customization. Users
> should
> > > be able to customise or provide their own implementation of genetic
> > > operators for crossover and mutation. The chromosome classes should
> also
> > be
> > > open for extension.
> >
> > >I don't get why we should support extensions outside this library.
> > *--* I think we should not block the extension.
>
> This would be going backwards to many things that have been done
> to improve the robustness and reduce the bug counts of the Commons
> Math codes.
>
> >
> > >Initially we discussed about having a light-weight library, for easier
> > usage
> > >than alternative existing framework(s).
> > *--* We can always think of making the framework lightweight but it
> should
> > not cost extensibility.
>
> There is no cost: We'll gladly merge every worthy extension into
> the Commons component.
>
> >
> > >> E.g. any developer should be able to extend the
> > >> IntegralChromosome class and define a child class which explicitly
> > >> specifies the range of integers to be used.
> >
> > >It does not look like this would need an extension, only configuration
> > >of the range.
> > *-- *I agree. But the question is should we block the extension.
>
> Please find a valid use case. ;-)
>
> >
> > >> I have initially implemented
> > >> the Binary chromosome and the corresponding binary mutation following
> the
> > >> same pattern. However, restricting extension of concrete classes by
> > private
> > >> constructor does not prevent users from extending the abstract parent
> > >> classes.
> >
> > >We should aim at coding the GA logic through (Java) interfaces, and not
> > >expose the "abstract" classes.
> > *-- *One of the primary reasons for me to contribute to Apache's GA library
> > is its simplicity and extensibility.
>
> "Extensibility" does not necessarily imply an "inheritance"-based design.
> In fact, we do want to *avoid* inheritance in order to provide other
> advantages, such as multi-threading, more easily and more robustly.
>
> > I would like to have a framework
> > which should be always extensible for any problem domain with minor
> > changes.
>
> Any problem domain should indeed be amenable to be solved
> by the library; I don't see how that should imply a design based
> on inheritance.
>
> > The primary reason behind this is that application domains of GA
> > are too diverse. It is not possible to implement everything in a library.
> > We don't know all possible domain areas too. If we remove the
> extensibility
> > from the framework it would be useless in lots of areas.
>
> When that occurs, people are welcome to contribute back if
> something they need is missing.
> Your argument of "too much diversity" can be reversed, in that
> it is unlikely that one library would attract everyone that needs a
> genetic algorithm.
> Better make a design that can handle a fraction of use cases,
> and grow as needed.
>
> >
> > >Extending the functionality, if necessary, should be contributed back
> here
> > *-- *Sometimes the GA operators are very much specific to the domain and
> > it's hard to generalise. In those scenarios contributing back to the
> > library might not be possible.
>
> In such a case, how likely will it also be that whatever general
> framework this library has put in place, will also not be amenable
> to that domain's specifics?
> There is always a scope from which design decisions must be taken.
>
> If "multi-threading" is in the scope, then the design must avoid
> inheritance (in public classes) in order to much more easily
> ensure the correctness of applications.
>
> > However, if a library cannot be extended for
> > a new domain by users it becomes underutilised over time if not useless.
>
> Sure but that is a hypothetical for the long-term.
> However, if the library is buggy or slow, it will not be used at all.
>
> Regards,
> Gilles
>
> >>> [...]

-- 
Avijit Basak

Re: [MATH][GENETICS][PR-199] Decision on use and customization of RNG functionality for randomization

Posted by Gilles Sadowski <gi...@gmail.com>.
Hello.

On Wed, 22 Dec 2021 at 14:25, Avijit Basak <av...@gmail.com> wrote:
>
> Hi All
>
>         Please see my comments below.
>
> >> >Several problems with this approach (raised in previous messages IIRC):
> >> >1. Potential performance loss in sharing the same RNG instance.
> >> -- As per my understanding ThreadLocalRandomSource creates separate
> >> instances of UniformRandomProvider for each thread. So I am not sure how
> a
> >> UniformRandomProvider instance is being shared. Please correct me if I am
> >> wrong.
>
> >Within a given thread there will be *one* RNG instance; that's what I meant
> >by "shared".
> >Of course you are right that that instance is not shared by multiple
> threads
> >(which would be a bug).
> >The performance loss is because it will be necessary to call
> >  ThreadLocalRandomSource.current(RandomSource source)
> >for each access to the RNG (since it would be a bug to store the returned
> >value in e.g. an operator instance that would be shared among threads (as
> >you suggest below).
>
> -- I tried to do a small test on it and here are the results. Output times
> are in milliseconds. According to my understanding the performance loss is
> mostly during creation of per thread instance of UniformRandomProvider.
> --*CUT*--
>     // Assumed imports: JUnit 5 and the Commons RNG "simple" module.
>     import org.apache.commons.rng.simple.RandomSource;
>     import org.apache.commons.rng.simple.ThreadLocalRandomSource;
>     import org.junit.jupiter.api.Test;
>
>     @Test
>     void test() {
>         // Number of calls per timed run; the first run also pays for
>         // creating the per-thread UniformRandomProvider.
>         final int[] limits = {1, 1000, 10000, 100000, 1000000,
>                               10000000, 100000000, 1000000000};
>         for (int limit : limits) {
>             final long start = System.currentTimeMillis();
>             for (int i = 0; i < limit; i++) {
>                 ThreadLocalRandomSource.current(RandomSource.JDK);
>             }
>             // Elapsed time in milliseconds.
>             System.out.println(System.currentTimeMillis() - start);
>         }
>     }
> --*CUT*--
> --*output*--
> 363
> 1
> 2
> 4
> 6
> 28
> 244
> 2423
> --*output*--

As I've already indicated, "ThreadLocalRandomSource" is, IMHO, a
sort of workaround for a multi-threaded application that does not want
to bother managing per-thread RNG instance(s).
The library should not make that decision for the application since we
can cater for both usages: Every piece of the GA that needs an RNG can
provide factory methods that either take a "RandomSource" argument
or create a default one.
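
As a rough sketch of what such factory methods could look like (this is
not the API of the GA module): the class "BitFlipMutation" below is
hypothetical, and "java.util.SplittableRandom" is only a stand-in for a
generator created from a Commons RNG "RandomSource", so that the example
needs nothing but the JDK.

```java
import java.util.SplittableRandom;
import java.util.function.Supplier;

/** Hypothetical operator: flips each gene with a given probability. */
final class BitFlipMutation {
    /** Owned by this instance; never handed out, never shared. */
    private final SplittableRandom rng;

    private BitFlipMutation(SplittableRandom rng) {
        this.rng = rng;
    }

    /** Factory for callers who want to choose the source of randomness. */
    static BitFlipMutation create(Supplier<SplittableRandom> source) {
        return new BitFlipMutation(source.get());
    }

    /** Factory with a default source, for callers who do not care. */
    static BitFlipMutation create() {
        return create(SplittableRandom::new);
    }

    /** Returns a mutated copy; the argument is left untouched. */
    boolean[] mutate(boolean[] original, double rate) {
        final boolean[] result = original.clone();
        for (int i = 0; i < result.length; i++) {
            if (rng.nextDouble() < rate) {
                result[i] = !result[i];
            }
        }
        return result;
    }
}
```

Users who do not need the customization call "create()" and never see
the RNG; the others pass their own supplier.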

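Note that your custom benchmark above is unlikely to mean anything: there
is no warm-up, the value returned by "current" is discarded (so the JIT
compiler may be able to drop the call), and the first figure mostly
measures one-time initialization. Please see e.g. "Commons RNG" on how to
create JMH-based benchmarks.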
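
To illustrate the kind of comparison I have in mind, here is a JDK-only
sketch; it is not a substitute for a JMH benchmark and its timings should
not be trusted either. The "ThreadLocal" of "SplittableRandom" is an
assumed stand-in for "ThreadLocalRandomSource", and the class name is made
up. The point is only that each draw is consumed and that a warm-up
precedes the timed runs.

```java
import java.util.SplittableRandom;

final class RngAccessTiming {
    /** Stand-in for a per-thread RNG registry (assumption). */
    private static final ThreadLocal<SplittableRandom> PER_THREAD =
        ThreadLocal.withInitial(() -> new SplittableRandom(42L));

    /** Looks up the per-thread generator on every draw. */
    static long sumWithLookup(int draws) {
        long sum = 0;
        for (int i = 0; i < draws; i++) {
            sum += PER_THREAD.get().nextInt(100);
        }
        return sum; // returned so that the draws are actually used
    }

    /** Uses a generator that the caller owns. */
    static long sumWithOwnInstance(SplittableRandom rng, int draws) {
        long sum = 0;
        for (int i = 0; i < draws; i++) {
            sum += rng.nextInt(100);
        }
        return sum;
    }

    public static void main(String[] args) {
        final int draws = 1_000_000;
        long sink = 0;
        // Warm-up runs (not timed).
        for (int i = 0; i < 20; i++) {
            sink += sumWithLookup(draws);
            sink += sumWithOwnInstance(new SplittableRandom(i), draws);
        }
        final long t0 = System.nanoTime();
        sink += sumWithLookup(draws);
        final long t1 = System.nanoTime();
        sink += sumWithOwnInstance(new SplittableRandom(7L), draws);
        final long t2 = System.nanoTime();
        System.out.println("lookup per draw: " + (t1 - t0) / 1_000 + " us");
        System.out.println("own instance:    " + (t2 - t1) / 1_000 + " us");
        System.out.println("checksum: " + sink);
    }
}
```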

>
> >> >2. Less/no flexibility (no user's choice of random source).
> >> -- Agreed.
> -- Do we really need this much flexibility here?

My main concern is that IMO the RNG is a prominent part of a GA
and it is not a good design to use "ThreadLocalRandomSource".

> >> >3. Error-prone (user can access/reuse the "UniformRandomProvider"
> >> >instances).
> >>
> >> >Again: "ThreadLocalRandomSource" is an ad-hoc workaround for correct but
> >> >"light" usage of random number generation in a multi-threaded application;
> >> >GAs make "heavy" use of RNG, thus it does not seem outlandish that all the
> >> >RNG "clients" (e.g. every "operator") create their own instances.
> >
> >
> >> >IMHO, a more important discussion would be about the expectations in a
> >> >multithreaded context: E.g. should an operator be shareable by different
> >> >threads?  And if not, how does the API help application developers to avoid
> >> >such pitfalls?
> >> -- Once we implement multi-threading in GA, same crossover and mutation
> >> operators will be re-used across multiple threads.
>
> >I would be wary to go on that path; better consider making (deep) copies.
> >We can have multiple instances of an operator, all being configured in the
> >same way but being different instances with no risk of a multithreading bug.
>
> -- I don't think this would be a good design choice just to support
> customization of RNG functionality. This will lead to too many instances of
> the same operators resulting in lots of unnecessary memory consumption. I
> think we might face memory issues for higher dimensional problems. As
> population size requirement also increases with increase of dimension this
> might lead to a major issue and need a thought.

How many is "too many instances"?
The memory used by an operator is tiny compared to that of a chromosome,
and even more so compared to a population of chromosomes, or two
populations of them (parents and offspring).

>     So I think we have a design tradeoff here performance vs memory
> consumption. I am more worried about memory as that might restrict use of
> this library beyond a certain number of dimensions in some areas.

I'm referring to separate copies for each thread.
How many threads/virtual CPUs are common nowadays?

> However,
> creating deep copy would only be possible when we strictly restrict
> extension of operators which I want to avoid.

How would a multi-threaded library avoid deep copies?
Through synchronization?

>
> >> So even if we provide
> >> the customization at the operator level we cannot avoid sharing.
>
> >We can, and we should.
> >What we probably can't avoid sharing is the instance that represents the
> >population of chromosomes.
> *--* In a multi-threaded optimization the chromosome instances are shared
> in case the same chromosome is chosen for crossover by the selection
> process. I missed this point earlier.
> ...

Chromosomes can be shared (if they are read-only).
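
A minimal sketch of what "read-only" could mean here, assuming a
hypothetical "ImmutableBitChromosome" class (not part of the current
code): the genes are copied on the way in, never exposed as an array, and
every "modification" returns a new instance, so any number of threads can
read the same parent.

```java
/** Hypothetical read-only chromosome; safe to share among threads. */
final class ImmutableBitChromosome {
    private final boolean[] genes;

    private ImmutableBitChromosome(boolean[] genes) {
        this.genes = genes;
    }

    /** Factory method; copies the input so later changes do not leak in. */
    static ImmutableBitChromosome of(boolean... genes) {
        return new ImmutableBitChromosome(genes.clone());
    }

    int length() {
        return genes.length;
    }

    boolean gene(int index) {
        return genes[index];
    }

    /** Returns a new instance; this one is left untouched. */
    ImmutableBitChromosome withFlipped(int index) {
        final boolean[] copy = genes.clone();
        copy[index] = !copy[index];
        return new ImmutableBitChromosome(copy);
    }
}
```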

>
> >> >  Mine is against using "ThreadLocalRandomSource"...
> >> -- What is the way out other than that? Please suggest.
>
> >I think I did.
> *--* The factory based approach would be useful only when we can have
> separate copies of operators for each set of operations.

If we don't have separate copies in each thread, then the operator
will not be multithreaded...
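
For the sake of discussion, a sketch of "separate copies" using only the
JDK. All names are hypothetical and "SplittableRandom" again stands in for
a Commons RNG generator. It goes one step further than one copy per
thread: each task builds its own identically configured operator, so
nothing mutable is shared and the result does not depend on scheduling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SplittableRandom;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class PerTaskOperators {
    /** Hypothetical operator that owns its RNG; never shared. */
    static final class DigitMutation {
        private final SplittableRandom rng;

        DigitMutation(long seed) {
            this.rng = new SplittableRandom(seed);
        }

        int[] mutate(int[] parent, double rate) {
            final int[] child = parent.clone(); // the parent is only read
            for (int i = 0; i < child.length; i++) {
                if (rng.nextDouble() < rate) {
                    child[i] = rng.nextInt(10);
                }
            }
            return child;
        }
    }

    /** Runs one mutation per task, each task using its own operator. */
    static List<int[]> mutateInParallel(int[] parent, int tasks, double rate) {
        final ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            final List<Future<int[]>> futures = new ArrayList<>();
            for (int t = 0; t < tasks; t++) {
                final long seed = t; // same configuration, distinct seed
                final Callable<int[]> task =
                    () -> new DigitMutation(seed).mutate(parent, rate);
                futures.add(pool.submit(task));
            }
            final List<int[]> children = new ArrayList<>();
            for (Future<int[]> f : futures) {
                children.add(f.get());
            }
            return children;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Each operator here holds a single small RNG object, which is the memory
cost under discussion.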

> >Maybe it's time to create a dedicated branch for the GA functionality
> >so that we can try out the different approaches.
>
>
> >
> > >> I think first we need to decide on whether we really need this
> > >> customization and if yes then why. Then we can decide on alternate
> > >> implementation options.
> > >
> > >> >As per the recent updates of the math-related code bases, the
> > >> >public API should provide factory methods (constructors should
> > >> >be private).
> > >> -- private constructors will make public API classes non-extensible. This
> > >> will severely restrict the extensibility of this framework which I want to
> > >> avoid. I am not sure why we need to remove public constructors. It would be
> > >> helpful if you could refer me to any relevant discussion thread.
> >
> > >  Allowing extensibility is a huge burden on library maintainers.  The
> > >  library must have been designed to support it; hence, you should
> > >  first describe what kind(s) of extensions (with usage examples) you
> > >  have in mind.
> > --The library should be extensible to support customization. Users should
> > be able to customise or provide their own implementation of genetic
> > operators for crossover and mutation. The chromosome classes should also be
> > open for extension.
>
> >I don't get why we should support extensions outside this library.
> *--* I think we should not block the extension.

This would undo many of the changes that have been made to improve
the robustness and reduce the bug count of the Commons Math code.

>
> >Initially we discussed having a light-weight library, for easier usage
> >than alternative existing framework(s).
> *--* We can always think of making the framework lightweight but it should
> not cost extensibility.

There is no cost: We'll gladly merge every worthy extension into
the Commons component.

>
> >> E.g. any developer should be able to extend the
> >> IntegralChromosome class and define a child class which explicitly
> >> specifies the range of integers to be used.
>
> >It does not look like this would need an extension, only configuration
> >of the range.
> *-- *I agree. But the question is should we block the extension.

Please find a valid use case. ;-)
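
To make the "configuration, not extension" point concrete, here is a
sketch with a hypothetical "RangedIntChromosome" (it is not the
"IntegralChromosome" of the PR): the range is an argument of a factory
method, the constructor is private and the class is final.
"SplittableRandom" is a JDK stand-in for the RNG.

```java
import java.util.SplittableRandom;

/** Hypothetical chromosome whose genes all lie in [min, max). */
final class RangedIntChromosome {
    private final int[] genes;
    private final int min;
    private final int max;

    private RangedIntChromosome(int[] genes, int min, int max) {
        this.genes = genes;
        this.min = min;
        this.max = max;
    }

    /** Factory method: the range is an argument, not a subclass. */
    static RangedIntChromosome random(int length, int min, int max,
                                      SplittableRandom rng) {
        if (min >= max) {
            throw new IllegalArgumentException("min must be < max");
        }
        final int[] genes = new int[length];
        for (int i = 0; i < length; i++) {
            genes[i] = rng.nextInt(min, max); // max is exclusive
        }
        return new RangedIntChromosome(genes, min, max);
    }

    int length() {
        return genes.length;
    }

    int gene(int index) {
        return genes[index];
    }

    int min() {
        return min;
    }

    int max() {
        return max;
    }
}
```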

>
> >> I have initially implemented
> >> the Binary chromosome and the corresponding binary mutation following the
> >> same pattern. However, restricting extension of concrete classes by private
> >> constructor does not prevent users from extending the abstract parent
> >> classes.
>
> >We should aim at coding the GA logic through (Java) interfaces, and not
> >expose the "abstract" classes.
> *-- *One of the primary reasons for me to contribute to Apache's GA library
> is its simplicity and extensibility.

"Extensibility" does not necessarily imply an "inheritance"-based design.
In fact, we do want to *avoid* inheritance in order to provide other
advantages, such as multi-threading, more easily and more robustly.
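
A sketch of extensibility without inheritance, under the assumption that
the library exposes a small interface and final classes that compose it.
"MutationOperator", "Evolver" and "SwapMutationDemo" are made-up names,
and "SplittableRandom" stands in for the RNG. The application supplies a
domain-specific operator as a method reference; it subclasses nothing.

```java
import java.util.Arrays;
import java.util.SplittableRandom;

/** Hypothetical operator contract exposed by the library. */
interface MutationOperator<T> {
    T mutate(T original, SplittableRandom rng);
}

/** Hypothetical library class: final, composed from interfaces. */
final class Evolver<T> {
    private final MutationOperator<T> mutation;
    private final SplittableRandom rng;

    private Evolver(MutationOperator<T> mutation, SplittableRandom rng) {
        this.mutation = mutation;
        this.rng = rng;
    }

    static <T> Evolver<T> of(MutationOperator<T> mutation, long seed) {
        return new Evolver<>(mutation, new SplittableRandom(seed));
    }

    T step(T individual) {
        return mutation.mutate(individual, rng);
    }
}

/** Application code: a domain-specific operator for permutations. */
final class SwapMutationDemo {
    /** Swaps two randomly chosen positions in a copy of the tour. */
    static int[] swapTwo(int[] tour, SplittableRandom rng) {
        final int[] copy = tour.clone();
        final int i = rng.nextInt(copy.length);
        final int j = rng.nextInt(copy.length);
        final int tmp = copy[i];
        copy[i] = copy[j];
        copy[j] = tmp;
        return copy;
    }

    public static void main(String[] args) {
        final Evolver<int[]> evolver = Evolver.of(SwapMutationDemo::swapTwo, 3L);
        final int[] next = evolver.step(new int[] {0, 1, 2, 3, 4});
        System.out.println(Arrays.toString(next));
    }
}
```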

> I would like to have a framework
> which should be always extensible for any problem domain with minor
> changes.

Any problem domain should indeed be amenable to being solved
by the library; I don't see how that should imply a design based
on inheritance.

> The primary reason behind this is that application domains of GA
> are too diverse. It is not possible to implement everything in a library.
> We don't know all possible domain areas too. If we remove the extensibility
> from the framework it would be useless in lots of areas.

When that occurs, people are welcome to contribute back if
something they need is missing.
Your argument of "too much diversity" can be reversed, in that
it is unlikely that one library would attract everyone that needs a
genetic algorithm.
Better make a design that can handle a fraction of use cases,
and grow as needed.

>
> >Extending the functionality, if necessary, should be contributed back here
> *-- *Sometimes the GA operators are very much specific to the domain and
> it's hard to generalise. In those scenarios contributing back to the
> library might not be possible.

In such a case, how likely is it that whatever general framework
this library has put in place will not be amenable to that domain's
specifics either?
Design decisions must always be taken within some scope.

If "multi-threading" is in the scope, then the design must avoid
inheritance (in public classes) in order to much more easily
ensure the correctness of applications.

> However, if a library cannot be extended for
> a new domain by users it becomes underutilised over time if not useless.

Sure but that is a hypothetical for the long-term.
However, if the library is buggy or slow, it will not be used at all.

Regards,
Gilles

>>> [...]

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [MATH][GENETICS][PR-199] Decision on use and customization of RNG functionality for randomization

Posted by Avijit Basak <av...@gmail.com>.
Hi All

        Please see my *changed* comments below.

>> >  Mine is against using "ThreadLocalRandomSource"...
>> -- What is the way out other than that? Please suggest.

>I think I did.
>>*--* The factory based approach would be useful only when we can have
>>separate copies of operators for each set of operations.
*--* The factory-based approach can introduce a *custom* RNG, but it can
improve performance only when we have separate copies of the operators for
each set of operations, which might lead to *memory issues*, as explained in
my previous mail.


Thanks & Regards
--Avijit Basak

On Wed, 22 Dec 2021 at 18:54, Avijit Basak <av...@gmail.com> wrote:

> Hi All
>
>         Please see my comments below.
>
> >> >Several problems with this approach (raised in previous messages IIRC):
> >> >1. Potential performance loss in sharing the same RNG instance.
> >> -- As per my understanding ThreadLocalRandomSource creates separate
> >> instances of UniformRandomProvider for each thread. So I am not sure
> how a
> >> UniformRandomProvider instance is being shared. Please correct me if I
> am
> >> wrong.
>
> >Within a given thread there will be *one* RNG instance; that's what I
> meant
> >by "shared".
> >Of course you are right that that instance is not shared by multiple
> threads
> >(which would be a bug).
> >The performance loss is because it will be necessary to call
> >  ThreadLocalRandomSource.current(RandomSource source)
> >for each access to the RNG (since it would be a bug to store the returned
> >value in e.g. an operator instance that would be shared among threads (as
> >you suggest below).
>
> -- I tried to do a small test on it and here are the results. Output times
> are in milliseconds. According to my understanding the performance loss is
> mostly during creation of per thread instance of UniformRandomProvider.
> [...]
>
> >> >2. Less/no flexibility (no user's choice of random source).
> >> -- Agreed.
> -- Do we really need this much flexibility here?
> >> >3. Error-prone (user can access/reuse the "UniformRandomProvider"
> >> instances).
> >>
> >> >Again: "ThreadLocalRandomSource" is an ad-hoc workaround for correct but
> >> >"light" usage of random number generation in a multi-threaded application;
> >> >GAs make "heavy" use of RNG, thus it does not seem outlandish that all the
> >> >RNG "clients" (e.g. every "operator") create their own instances.
> >
> >
> >> >IMHO, a more important discussion would be about the expectations in a
> >> >multithreaded context: E.g. should an operator be shareable by
> different
> >> >threads?  And if not, how does the API help application developers to
> avoid
> >> >such pitfalls?
> >> -- Once we implement multi-threading in GA, same crossover and mutation
> >> operators will be re-used across multiple threads.
>
> >I would be wary to go on that path; better consider making (deep) copies.
> >We can have multiple instances of an operator, all being configured in the
> >same way but being different instances with no risk of a multithreading
> bug.
>
> -- I don't think this would be a good design choice just to support
> customization of RNG functionality. This will lead to too many instances of
> the same operators resulting in lots of unnecessary memory consumption. I
> think we might face memory issues for higher dimensional problems. As
> population size requirement also increases with increase of dimension this
> might lead to a major issue and need a thought.
>     So I think we have a design tradeoff here performance vs memory
> consumption. I am more worried about memory as that might restrict use of
> this library beyond a certain number of dimensions in some areas. However,
> creating deep copy would only be possible when we strictly restrict
> extension of operators which I want to avoid.
>
> >> So even if we provide
> >> the customization at the operator level we cannot avoid sharing.
>
> >We can, and we should.
> >What we probably can't avoid sharing is the instance that represents the
> >population of chromosomes.
> *--* In a multi-threaded optimization the chromosome instances are shared
> in case the same chromosome is chosen for crossover by the selection
> process. I missed this point earlier.
> ...
>
> >> >  Mine is against using "ThreadLocalRandomSource"...
> >> -- What is the way out other than that? Please suggest.
>
> >I think I did.
> *--* The factory based approach would be useful only when we can have
> separate copies of operators for each set of operations.
>
> >Maybe it's time to create a dedicated branch for the GA functionality
> >so that we can try out the different approaches.
>
>
> >
> > >> I think first we need to decide on whether we really need this
> > >> customization and if yes then why. Then we can decide on alternate
> > >> implementation options.
> > >
> > >> >As per the recent updates of the math-related code bases, the
> > >> >public API should provide factory methods (constructors should
> > >> >be private).
> > >> -- private constructors will make public API classes non-extensible.
> This
> > >> will severely restrict the extensibility of this framework which I
> want
> > to
> > >> avoid. I am not sure why we need to remove public constructors. It
> would
> > be
> > >> helpful if you could refer me to any relevant discussion thread.
> >
> > >  Allowing extensibility is a huge burden on library maintainers.  The
> > >  library must have been designed to support it; hence, you should
> > >  first describe what kind(s) of extensions (with usage examples) you
> > >  have in mind.
> > --The library should be extensible to support customization. Users should
> > be able to customise or provide their own implementation of genetic
> > operators for crossover and mutation. The chromosome classes should also
> be
> > open for extension.
>
> >I don't get why we should support extensions outside this library.
> *--* I think we should not block the extension.
>
> >Initially we discussed about having a light-weight library, for easier
> usage
> >than alternative existing framework(s).
> *--* We can always think of making the framework lightweight but it
> should not cost extensibility.
>
> >> E.g. any developer should be able to extend the
> >> IntegralChromosome class and define a child class which explicitly
> >> specifies the range of integers to be used.
>
> >It does not look like this would need an extension, only configuration
> >of the range.
> *-- *I agree. But the question is should we block the extension.
>
> >> I have initially implemented
> >> the Binary chromosome and the corresponding binary mutation following
> the
> >> same pattern. However, restricting extension of concrete classes by
> private
> >> constructor does not prevent users from extending the abstract parent
> >> classes.
>
> >We should aim at coding the GA logic through (Java) interfaces, and not
> >expose the "abstract" classes.
> *-- *One of the primary reasons for me to contribute to Apache's GA
> library is its simplicity and extensibility. I would like to have a
> framework which should be always extensible for any problem domain with
> minor changes. The primary reason behind this is that application domains
> of GA are too diverse. It is not possible to implement everything in a
> library. We don't know all possible domain areas too. If we remove the
> extensibility from the framework it would be useless in lots of areas.
>
> >Extending the functionality, if necessary, should be contributed back here
> *-- *Sometimes the GA operators are very much specific to the domain and
> it's hard to generalise. In those scenarios contributing back to the
> library might not be possible. However, if a library cannot be extended for
> a new domain by users it becomes underutilised over time if not useless.
>
>
> Thanks & Regards
> --Avijit Basak
>
> On Tue, 21 Dec 2021 at 22:05, Gilles Sadowski <gi...@gmail.com>
> wrote:
>
>> Hello.
>>
>> On Tue, 21 Dec 2021 at 16:21, Avijit Basak <av...@gmail.com> wrote:
>> >
>> > Hi All
>> >
>> >         Please see my comments. Sorry for the delayed response.
>> >
>> > >Several problems with this approach (raised in previous messages IIRC):
>> > >1. Potential performance loss in sharing the same RNG instance.
>> > -- As per my understanding ThreadLocalRandomSource creates separate
>> > instances of UniformRandomProvider for each thread. So I am not sure
>> how a
>> > UniformRandomProvider instance is being shared. Please correct me if I
>> am
>> > wrong.
>>
>> Within a given thread there will be *one* RNG instance; that's what I
>> meant
>> by "shared".
>> Of course you are right that that instance is not shared by multiple
>> threads
>> (which would be a bug).
>> The performance loss is because it will be necessary to call
>>   ThreadLocalRandomSource.current(RandomSource source)
>> for each access to the RNG (since it would be a bug to store the returned
>> value in e.g. an operator instance that would be shared among threads (as
>> you suggest below).
>>
>> > >2. Less/no flexibility (no user's choice of random source).
>> > -- Agreed.
>> > >3. Error-prone (user can access/reuse the "UniformRandomProvider"
>> > instances).
>> >
>> > >Again: "ThreadLocalRandomSource" is an ad-hoc workaround for correct but
>> > >"light" usage of random number generation in a multi-threaded application;
>> > >GAs make "heavy" use of RNG, thus it does not seem outlandish that all the
>> > >RNG "clients" (e.g. every "operator") create their own instances.
>> >
>> >
>> > >IMHO, a more important discussion would be about the expectations in a
>> > >multithreaded context: E.g. should an operator be shareable by
>> different
>> > >threads?  And if not, how does the API help application developers to
>> avoid
>> > >such pitfalls?
>> > -- Once we implement multi-threading in GA, same crossover and mutation
>> > operators will be re-used across multiple threads.
>>
>> I would be wary to go on that path; better consider making (deep) copies.
>> We can have multiple instances of an operator, all being configured in the
>> same way but being different instances with no risk of a multithreading
>> bug.
>>
>> > So even if we provide
>> > the customization at the operator level we cannot avoid sharing.
>>
>> We can, and we should.
>> What we probably can't avoid sharing is the instance that represents the
>> population of chromosomes.
>>
>> >
>> > >> My original implementation did not allow any customization of
>> > RandomSource
>> > >> instances. There was a thought in review for customization of
>> > RandomSource,
>> > >> so these options were considered. I don't think this would make any
>> > >> difference to algorithm functionality.
>> >
>> > >  Quite right.  But the customization can come at zero cost for the
>> users
>> > >  who don't need it. Admittedly it's a little more work on the part of
>> the
>> > >  developer(s) but it's a one off cost (and I'm fine working on that
>> part
>> > of
>> > >  the library once other, more important, things have been settled).
>> >
>> > >> Even earlier I used Math.random()
>> > >> which worked equally well. So my *vote* should be *against* this
>> > >> customization.
>> >
>> > >  Mine is against using "ThreadLocalRandomSource"...
>> > -- What is the wayout other than that. Please suggest.
>>
>> I think I did.
>> Maybe it's time to create a dedicated branch for the GA functionality
>> so that we can try out the different approaches.
>>
>> >
>> > >> I think first we need to decide on whether we really need this
>> > >> customization and if yes then why. Then we can decide on alternate
>> > >> implementation options.
>> > >
>> > >> >As per the recent updates of the math-related code bases, the
>> > >> >public API should provide factory methods (constructors should
>> > >> >be private).
>> > >> -- private constructors will make public API classes non-extensible.
>> This
>> > >> will severely restrict the extensibility of this framework which I
>> want
>> > to
>> > >> avoid. I am not sure why we need to remove public constructors. It
>> would
>> > be
>> > >> helpful if you could refer me to any relevant discussion thread.
>> >
>> > >  Allowing extensibility is a huge burden on library maintainers.  The
>> > >  library must have been designed to support it; hence, you should
>> > >  first describe what kind(s) of extensions (with usage examples) you
>> > >  have in mind.
>> > --The library should be extensible to support customization. Users
>> should
>> > be able to customise or provide their own implementation of genetic
>> > operators for crossover and mutation. The chromosome classes should
>> also be
>> > open for extension.
>>
>> I don't get why we should support extensions outside this library.
>> Initially we discussed about having a light-weight library, for easier
>> usage
>> than alternative existing framework(s).
>>
>> > E.g. any developer should be able to extend the
>> > IntegralChromosome class and define a child class which explicitly
>> > specifies the range of integers to be used.
>>
>> It does not look like this would need an extension, only configuration
>> of the range.
>>
>> > I have initially implemented
>> > the Binary chromosome and the corresponding binary mutation following
>> the
>> > same pattern. However, restricting extension of concrete classes by
>> private
>> > constructor does not prevent users from extending the abstract parent
>> > classes.
>>
>> We should aim at coding the GA logic through (Java) interfaces, and not
>> expose the "abstract" classes.
>> Extending the functionality, if necessary, should be contributed back
>> here.
>>
>> Regards,
>> Gilles
>>
>> >>> [...]
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>> For additional commands, e-mail: dev-help@commons.apache.org
>>
>>
>
> --
> Avijit Basak
>


-- 
Avijit Basak

Re: [MATH][GENETICS][PR-199] Decision on use and customization of RNG functionality for randomization

Posted by Avijit Basak <av...@gmail.com>.
Hi All

        Please see my comments below.

>> >Several problems with this approach (raised in previous messages IIRC):
>> >1. Potential performance loss in sharing the same RNG instance.
>> -- As per my understanding ThreadLocalRandomSource creates separate
>> instances of UniformRandomProvider for each thread. So I am not sure how
a
>> UniformRandomProvider instance is being shared. Please correct me if I am
>> wrong.

>Within a given thread there will be *one* RNG instance; that's what I meant
>by "shared".
>Of course you are right that that instance is not shared by multiple
threads
>(which would be a bug).
>The performance loss is because it will be necessary to call
>  ThreadLocalRandomSource.current(RandomSource source)
>for each access to the RNG (since it would be a bug to store the returned
>value in e.g. an operator instance that would be shared among threads (as
>you suggest below).

-- I ran a small test on this; the results are below. Output times
are in milliseconds. As I understand it, the performance loss occurs
mostly during creation of the per-thread instance of UniformRandomProvider.
--*CUT*--
    @Test
    void test() {
        // Number of calls to ThreadLocalRandomSource.current(...) per timed run.
        final int[] limits = {1, 1000, 10000, 100000, 1000000,
                              10000000, 100000000, 1000000000};
        for (int limit : limits) {
            long start = System.currentTimeMillis();
            for (int i = 0; i < limit; i++) {
                ThreadLocalRandomSource.current(RandomSource.JDK);
            }
            // Elapsed wall-clock time for this run, in milliseconds.
            System.out.println(System.currentTimeMillis() - start);
        }
    }
--*CUT*--
--*output*--
363
1
2
4
6
28
244
2423
--*output*--
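
To make the figures above easier to compare, here is a minimal sketch that
converts each reported time into an average cost per call. The class and
method names are hypothetical; the numbers are exactly those printed above.
Runs shorter than a few milliseconds are at the resolution limit of
System.currentTimeMillis(), so only the larger runs are meaningful. The
single-call first run is dominated by one-off setup, which presumably
includes class loading as well as creation of the per-thread instance.

```java
/** Hypothetical helper: average cost per call from the timings reported above. */
class PerCallCostSketch {
    /** Converts an elapsed time in milliseconds into nanoseconds per call. */
    static double nanosPerCall(long elapsedMillis, long calls) {
        return elapsedMillis * 1_000_000.0 / calls;
    }

    public static void main(String[] args) {
        // Run sizes and timings copied from the test and its output above.
        long[] calls = {1L, 1_000L, 10_000L, 100_000L, 1_000_000L,
                        10_000_000L, 100_000_000L, 1_000_000_000L};
        long[] millis = {363L, 1L, 2L, 4L, 6L, 28L, 244L, 2423L};
        for (int i = 0; i < calls.length; i++) {
            System.out.printf("%d calls: %.3f ns/call%n",
                              calls[i], nanosPerCall(millis[i], calls[i]));
        }
    }
}
```

For the largest run this works out to roughly 2.4 ns per call.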

>> >2. Less/no flexibility (no user's choice of random source).
>> -- Agreed.
-- Do we really need this much flexibility here?
>> >3. Error-prone (user can access/reuse the "UniformRandomProvider"
>> instances).
>>
>> >Again: "ThreadLocalRandomSource" is an ad-hoc workaround for correct but
>> >"light" usage of random number generation in a multi-threaded
application;
>> GAs
>> >make "heavy" use of RNG, thus it is does not seem outlandish that all
the
>> RNG
>> >"clients" (e.g. every "operator") creates their own instances.
>
>
>> >IMHO, a more important discussion would be about the expectations in a
>> >multithreaded context: E.g. should an operator be shareable by different
>> >threads?  And if not, how does the API help application developers to
avoid
>> >such pitfalls?
>> -- Once we implement multi-threading in GA, same crossover and mutation
>> operators will be re-used across multiple threads.

>I would be wary to go on that path; better consider making (deep) copies.
>We can have multiple instances of an operator, all being configured in the
>same way but being different instances with no risk of a multithreading
bug.

-- I don't think this would be a good design choice just to support
customization of RNG functionality. It will lead to too many instances of
the same operators, resulting in a lot of unnecessary memory consumption. I
think we might face memory issues for higher-dimensional problems: as the
required population size also grows with the number of dimensions, this
might become a major issue and needs some thought.
    So I think we have a design trade-off here: performance vs. memory
consumption. I am more worried about memory, as that might restrict use of
this library beyond a certain number of dimensions in some areas. Also,
creating deep copies would only be possible if we strictly restrict
extension of operators, which I want to avoid.

>> So even if we provide
>> the customization at the operator level we cannot avoid sharing.

>We can, and we should.
>What we probably can't avoid sharing is the instance that represents the
>population of chromosomes.
*--* In a multi-threaded optimization the chromosome instances are also
shared whenever the same chromosome is chosen for crossover by the selection
process. I missed this point earlier.
...

>> >  Mine is against using "ThreadLocalRandomSource"...
>> -- What is the wayout other than that. Please suggest.

>I think I did.
*--* The factory-based approach would be useful only if we can have
separate copies of operators for each set of operations.

>Maybe it's time to create a dedicated branch for the GA functionality
>so that we can try out the different approaches.


>
> >> I think first we need to decide on whether we really need this
> >> customization and if yes then why. Then we can decide on alternate
> >> implementation options.
> >
> >> >As per the recent updates of the math-related code bases, the
> >> >public API should provide factory methods (constructors should
> >> >be private).
> >> -- private constructors will make public API classes non-extensible.
This
> >> will severely restrict the extensibility of this framework which I want
> to
> >> avoid. I am not sure why we need to remove public constructors. It
would
> be
> >> helpful if you could refer me to any relevant discussion thread.
>
> >  Allowing extensibility is a huge burden on library maintainers.  The
> >  library must have been designed to support it; hence, you should
> >  first describe what kind(s) of extensions (with usage examples) you
> >  have in mind.
> --The library should be extensible to support customization. Users should
> be able to customise or provide their own implementation of genetic
> operators for crossover and mutation. The chromosome classes should also
be
> open for extension.

>I don't get why we should support extensions outside this library.
*--* I think we should not block the extension.

>Initially we discussed about having a light-weight library, for easier
usage
>than alternative existing framework(s).
*--* We can always aim to make the framework lightweight, but not at the
cost of extensibility.

>> E.g. any developer should be able to extend the
>> IntegralChromosome class and define a child class which explicitly
>> specifies the range of integers to be used.

>It does not look like this would need an extension, only configuration
>of the range.
*-- *I agree. But the question is whether we should block the extension.

>> I have initially implemented
>> the Binary chromosome and the corresponding binary mutation following the
>> same pattern. However, restricting extension of concrete classes by
private
>> constructor does not prevent users from extending the abstract parent
>> classes.

>We should aim at coding the GA logic through (Java) interfaces, and not
>expose the "abstract" classes.
*-- *One of my primary reasons for contributing to Apache's GA library
is its simplicity and extensibility. I would like to have a framework
that can always be extended to any problem domain with minor
changes. The main reason is that the application domains of GA
are too diverse: it is not possible to implement everything in a library,
and we don't know all possible domain areas either. If we remove
extensibility from the framework, it will be useless in many areas.

>Extending the functionality, if necessary, should be contributed back here
*-- *Sometimes the GA operators are very specific to the domain and
hard to generalise. In those scenarios contributing back to the
library might not be possible. However, if users cannot extend a library
for a new domain, it becomes underutilised over time, if not useless.


Thanks & Regards
--Avijit Basak

On Tue, 21 Dec 2021 at 22:05, Gilles Sadowski <gi...@gmail.com> wrote:

> Hello.
>
> Le mar. 21 déc. 2021 à 16:21, Avijit Basak <av...@gmail.com> a
> écrit :
> >
> > Hi All
> >
> >         Please see my comments. Sorry for the delayed response.
> >
> > >Several problems with this approach (raised in previous messages IIRC):
> > >1. Potential performance loss in sharing the same RNG instance.
> > -- As per my understanding ThreadLocalRandomSource creates separate
> > instances of UniformRandomProvider for each thread. So I am not sure how
> a
> > UniformRandomProvider instance is being shared. Please correct me if I am
> > wrong.
>
> Within a given thread there will be *one* RNG instance; that's what I meant
> by "shared".
> Of course you are right that that instance is not shared by multiple
> threads
> (which would be a bug).
> The performance loss is because it will be necessary to call
>   ThreadLocalRandomSource.current(RandomSource source)
> for each access to the RNG (since it would be a bug to store the returned
> value in e.g. an operator instance that would be shared among threads (as
> you suggest below).
>
> > >2. Less/no flexibility (no user's choice of random source).
> > -- Agreed.
> > >3. Error-prone (user can access/reuse the "UniformRandomProvider"
> > instances).
> >
> > >Again: "ThreadLocalRandomSource" is an ad-hoc workaround for correct but
> > >"light" usage of random number generation in a multi-threaded
> application;
> > GAs
> > >make "heavy" use of RNG, thus it is does not seem outlandish that all
> the
> > RNG
> > >"clients" (e.g. every "operator") creates their own instances.
> >
> >
> > >IMHO, a more important discussion would be about the expectations in a
> > >multithreaded context: E.g. should an operator be shareable by different
> > >threads?  And if not, how does the API help application developers to
> avoid
> > >such pitfalls?
> > -- Once we implement multi-threading in GA, same crossover and mutation
> > operators will be re-used across multiple threads.
>
> I would be wary to go on that path; better consider making (deep) copies.
> We can have multiple instances of an operator, all being configured in the
> same way but being different instances with no risk of a multithreading
> bug.
>
> > So even if we provide
> > the customization at the operator level we cannot avoid sharing.
>
> We can, and we should.
> What we probably can't avoid sharing is the instance that represents the
> population of chromosomes.
>
> >
> > >> My original implementation did not allow any customization of
> > RandomSource
> > >> instances. There was a thought in review for customization of
> > RandomSource,
> > >> so these options were considered. I don't think this would make any
> > >> difference to algorithm functionality.
> >
> > >  Quite right.  But the customization can come at zero cost for the
> users
> > >  who don't need it. Admittedly it's a little more work on the part of
> the
> > >  developer(s) but it's a one off cost (and I'm fine working on that
> part
> > of
> > >  the library once other, more important, things have been settled).
> >
> > >> Even earlier I used Math.random()
> > >> which worked equally well. So my *vote* should be *against* this
> > >> customization.
> >
> > >  Mine is against using "ThreadLocalRandomSource"...
> > -- What is the wayout other than that. Please suggest.
>
> I think I did.
> Maybe it's time to create a dedicated branch for the GA functionality
> so that we can try out the different approaches.
>
> >
> > >> I think first we need to decide on whether we really need this
> > >> customization and if yes then why. Then we can decide on alternate
> > >> implementation options.
> > >
> > >> >As per the recent updates of the math-related code bases, the
> > >> >public API should provide factory methods (constructors should
> > >> >be private).
> > >> -- private constructors will make public API classes non-extensible.
> This
> > >> will severely restrict the extensibility of this framework which I
> want
> > to
> > >> avoid. I am not sure why we need to remove public constructors. It
> would
> > be
> > >> helpful if you could refer me to any relevant discussion thread.
> >
> > >  Allowing extensibility is a huge burden on library maintainers.  The
> > >  library must have been designed to support it; hence, you should
> > >  first describe what kind(s) of extensions (with usage examples) you
> > >  have in mind.
> > --The library should be extensible to support customization. Users should
> > be able to customise or provide their own implementation of genetic
> > operators for crossover and mutation. The chromosome classes should also
> be
> > open for extension.
>
> I don't get why we should support extensions outside this library.
> Initially we discussed about having a light-weight library, for easier
> usage
> than alternative existing framework(s).
>
> > E.g. any developer should be able to extend the
> > IntegralChromosome class and define a child class which explicitly
> > specifies the range of integers to be used.
>
> It does not look like this would need an extension, only configuration
> of the range.
>
> > I have initially implemented
> > the Binary chromosome and the corresponding binary mutation following the
> > same pattern. However, restricting extension of concrete classes by
> private
> > constructor does not prevent users from extending the abstract parent
> > classes.
>
> We should aim at coding the GA logic through (Java) interfaces, and not
> expose the "abstract" classes.
> Extending the functionality, if necessary, should be contributed back here.
>
> Regards,
> Gilles
>
> >>> [...]
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>

-- 
Avijit Basak

Re: [MATH][GENETICS][PR-199] Decision on use and customization of RNG functionality for randomization

Posted by Gilles Sadowski <gi...@gmail.com>.
Hello.

Le mar. 21 déc. 2021 à 16:21, Avijit Basak <av...@gmail.com> a écrit :
>
> Hi All
>
>         Please see my comments. Sorry for the delayed response.
>
> >Several problems with this approach (raised in previous messages IIRC):
> >1. Potential performance loss in sharing the same RNG instance.
> -- As per my understanding ThreadLocalRandomSource creates separate
> instances of UniformRandomProvider for each thread. So I am not sure how a
> UniformRandomProvider instance is being shared. Please correct me if I am
> wrong.

Within a given thread there will be *one* RNG instance; that's what I meant
by "shared".
Of course you are right that that instance is not shared by multiple threads
(which would be a bug).
The performance loss arises because it will be necessary to call
  ThreadLocalRandomSource.current(RandomSource source)
for each access to the RNG, since it would be a bug to store the returned
value in e.g. an operator instance that would be shared among threads (as
you suggest below).
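
The difference between the two access patterns can be sketched as follows.
This is only an illustration under stated assumptions: to keep it
self-contained, java.util.SplittableRandom and a plain ThreadLocal stand in
for UniformRandomProvider and ThreadLocalRandomSource, and all class names
are hypothetical.

```java
import java.util.SplittableRandom;

/** Hypothetical sketch; JDK classes stand in for the Commons RNG types. */
class RngAccessSketch {
    /** Stand-in for ThreadLocalRandomSource: one generator per thread. */
    private static final ThreadLocal<SplittableRandom> PER_THREAD =
        ThreadLocal.withInitial(SplittableRandom::new);

    /** Shareable operator: looks up the per-thread generator on every call. */
    static final class SharedOperator {
        double draw() {
            // Caching this reference in a field would be a bug if the
            // operator instance were used by several threads.
            return PER_THREAD.get().nextDouble();
        }
    }

    /** Non-shared operator: owns its generator, so no lookup is needed. */
    static final class OwningOperator {
        private final SplittableRandom rng;

        OwningOperator(long seed) {
            rng = new SplittableRandom(seed);
        }

        double draw() {
            return rng.nextDouble();
        }
    }

    public static void main(String[] args) {
        System.out.println(new SharedOperator().draw());
        System.out.println(new OwningOperator(42L).draw());
    }
}
```

In the second variant the operator must not be shared among threads, which
is the constraint the API would have to make explicit.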

> >2. Less/no flexibility (no user's choice of random source).
> -- Agreed.
> >3. Error-prone (user can access/reuse the "UniformRandomProvider"
> instances).
>
> >Again: "ThreadLocalRandomSource" is an ad-hoc workaround for correct but
> >"light" usage of random number generation in a multi-threaded application;
> GAs
> >make "heavy" use of RNG, thus it is does not seem outlandish that all the
> RNG
> >"clients" (e.g. every "operator") creates their own instances.
>
>
> >IMHO, a more important discussion would be about the expectations in a
> >multithreaded context: E.g. should an operator be shareable by different
> >threads?  And if not, how does the API help application developers to avoid
> >such pitfalls?
> -- Once we implement multi-threading in GA, same crossover and mutation
> operators will be re-used across multiple threads.

I would be wary of going down that path; better to consider making (deep)
copies. We can have multiple instances of an operator, all configured in the
same way but being different instances, with no risk of a multithreading bug.
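
As a rough sketch of that idea (identically configured but distinct operator
instances, one per task), assuming a hypothetical bit-flip mutation and
using java.util.SplittableRandom as a stand-in for UniformRandomProvider:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SplittableRandom;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: one identically configured operator instance per task. */
class PerTaskOperatorSketch {
    /** A mutation operator that owns its generator and is never shared. */
    static final class BitFlipMutation {
        private final double rate;
        private final SplittableRandom rng; // stand-in for UniformRandomProvider

        BitFlipMutation(double rate, SplittableRandom rng) {
            this.rate = rate;
            this.rng = rng;
        }

        /** Returns a mutated copy; the input array is only read. */
        boolean[] mutate(boolean[] genes) {
            boolean[] out = genes.clone();
            for (int i = 0; i < out.length; i++) {
                if (rng.nextDouble() < rate) {
                    out[i] = !out[i];
                }
            }
            return out;
        }
    }

    /** Runs one mutation per task; each task gets its own operator instance. */
    static List<boolean[]> mutateInParallel(boolean[] genes, double rate,
                                            int tasks, long seed) {
        SplittableRandom root = new SplittableRandom(seed);
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<boolean[]>> futures = new ArrayList<>();
            for (int t = 0; t < tasks; t++) {
                // split() is called on the submitting thread only.
                BitFlipMutation op = new BitFlipMutation(rate, root.split());
                Callable<boolean[]> task = () -> op.mutate(genes);
                futures.add(pool.submit(task));
            }
            List<boolean[]> results = new ArrayList<>();
            for (Future<boolean[]> f : futures) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        boolean[] genes = {true, false, true, true};
        System.out.println(mutateInParallel(genes, 0.5, 8, 1234L).size());
    }
}
```

Each operator holds only its configuration and one small generator, so the
number of extra objects grows with the number of concurrent tasks rather
than with the population size.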

> So even if we provide
> the customization at the operator level we cannot avoid sharing.

We can, and we should.
What we probably can't avoid sharing is the instance that represents the
population of chromosomes.

>
> >> My original implementation did not allow any customization of
> RandomSource
> >> instances. There was a thought in review for customization of
> RandomSource,
> >> so these options were considered. I don't think this would make any
> >> difference to algorithm functionality.
>
> >  Quite right.  But the customization can come at zero cost for the users
> >  who don't need it. Admittedly it's a little more work on the part of the
> >  developer(s) but it's a one off cost (and I'm fine working on that part
> of
> >  the library once other, more important, things have been settled).
>
> >> Even earlier I used Math.random()
> >> which worked equally well. So my *vote* should be *against* this
> >> customization.
>
> >  Mine is against using "ThreadLocalRandomSource"...
> -- What is the wayout other than that. Please suggest.

I think I did.
Maybe it's time to create a dedicated branch for the GA functionality
so that we can try out the different approaches.

>
> >> I think first we need to decide on whether we really need this
> >> customization and if yes then why. Then we can decide on alternate
> >> implementation options.
> >
> >> >As per the recent updates of the math-related code bases, the
> >> >public API should provide factory methods (constructors should
> >> >be private).
> >> -- private constructors will make public API classes non-extensible. This
> >> will severely restrict the extensibility of this framework which I want
> to
> >> avoid. I am not sure why we need to remove public constructors. It would
> be
> >> helpful if you could refer me to any relevant discussion thread.
>
> >  Allowing extensibility is a huge burden on library maintainers.  The
> >  library must have been designed to support it; hence, you should
> >  first describe what kind(s) of extensions (with usage examples) you
> >  have in mind.
> --The library should be extensible to support customization. Users should
> be able to customise or provide their own implementation of genetic
> operators for crossover and mutation. The chromosome classes should also be
> open for extension.

I don't get why we should support extensions outside this library.
Initially we discussed having a lightweight library, for easier usage
than alternative existing framework(s).

> E.g. any developer should be able to extend the
> IntegralChromosome class and define a child class which explicitly
> specifies the range of integers to be used.

It does not look like this would need an extension, only configuration
of the range.
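
For instance, a minimal sketch of "configuration instead of extension"
could look like the following. This is not the library's IntegralChromosome;
the class, its factory method and its accessors are hypothetical, and
java.util.SplittableRandom stands in for UniformRandomProvider.

```java
import java.util.SplittableRandom;

/**
 * Hypothetical sketch: the integer range is passed to a factory method,
 * so no subclass is needed to restrict it.
 */
final class IntegerChromosomeSketch {
    private final int[] genes;
    private final int min; // inclusive
    private final int max; // exclusive

    /** Private constructor: instances are created through the factory only. */
    private IntegerChromosomeSketch(int[] genes, int min, int max) {
        this.genes = genes;
        this.min = min;
        this.max = max;
    }

    /** Creates a random chromosome whose genes all lie in [min, max). */
    static IntegerChromosomeSketch random(int length, int min, int max,
                                          SplittableRandom rng) {
        if (min >= max) {
            throw new IllegalArgumentException("min must be < max");
        }
        int[] g = new int[length];
        for (int i = 0; i < length; i++) {
            g[i] = rng.nextInt(min, max);
        }
        return new IntegerChromosomeSketch(g, min, max);
    }

    /** Returns a defensive copy of the genes. */
    int[] genes() {
        return genes.clone();
    }

    int min() {
        return min;
    }

    int max() {
        return max;
    }

    public static void main(String[] args) {
        IntegerChromosomeSketch c = random(10, 5, 10, new SplittableRandom(1L));
        System.out.println(c.genes().length);
    }
}
```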

> I have initially implemented
> the Binary chromosome and the corresponding binary mutation following the
> same pattern. However, restricting extension of concrete classes by private
> constructor does not prevent users from extending the abstract parent
> classes.

We should aim at coding the GA logic through (Java) interfaces, and not
expose the "abstract" classes.
Extending the functionality, if necessary, should be contributed back here.
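
To illustrate what coding through interfaces could mean for users, here is
a small sketch. The interface name and signature are assumptions, not the
API of the PR: the point is only that a domain-specific operator can be
supplied by implementing an interface, without subclassing any library class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: the library exposes an interface, users implement it. */
class InterfaceExtensionSketch {
    /** What the library would expose (name and signature are assumptions). */
    interface Mutation<C> {
        C mutate(C original);
    }

    /** Library-side code depends only on the interface. */
    static <C> List<C> mutateAll(List<C> population, Mutation<C> mutation) {
        List<C> next = new ArrayList<>(population.size());
        for (C c : population) {
            next.add(mutation.mutate(c));
        }
        return next;
    }

    public static void main(String[] args) {
        // User-side, domain-specific operator: reverse the gene string.
        Mutation<String> reverse = s -> new StringBuilder(s).reverse().toString();
        // Prints [cba, zyx]
        System.out.println(mutateAll(Arrays.asList("abc", "xyz"), reverse));
    }
}
```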

Regards,
Gilles

>>> [...]



Re: [MATH][GENETICS][PR-199] Decision on use and customization of RNG functionality for randomization

Posted by Avijit Basak <av...@gmail.com>.
Hi All

        Please see my comments. Sorry for the delayed response.

>Several problems with this approach (raised in previous messages IIRC):
>1. Potential performance loss in sharing the same RNG instance.
-- As per my understanding ThreadLocalRandomSource creates separate
instances of UniformRandomProvider for each thread. So I am not sure how a
UniformRandomProvider instance is being shared. Please correct me if I am
wrong.
>2. Less/no flexibility (no user's choice of random source).
-- Agreed.
>3. Error-prone (user can access/reuse the "UniformRandomProvider"
instances).

>Again: "ThreadLocalRandomSource" is an ad-hoc workaround for correct but
>"light" usage of random number generation in a multi-threaded application;
GAs
>make "heavy" use of RNG, thus it is does not seem outlandish that all the
RNG
>"clients" (e.g. every "operator") creates their own instances.


>IMHO, a more important discussion would be about the expectations in a
>multithreaded context: E.g. should an operator be shareable by different
>threads?  And if not, how does the API help application developers to avoid
>such pitfalls?
-- Once we implement multi-threading in GA, the same crossover and mutation
operators will be reused across multiple threads. So even if we provide
the customization at the operator level, we cannot avoid sharing.

>> My original implementation did not allow any customization of
RandomSource
>> instances. There was a thought in review for customization of
RandomSource,
>> so these options were considered. I don't think this would make any
>> difference to algorithm functionality.

>  Quite right.  But the customization can come at zero cost for the users
>  who don't need it. Admittedly it's a little more work on the part of the
>  developer(s) but it's a one off cost (and I'm fine working on that part
of
>  the library once other, more important, things have been settled).

>> Even earlier I used Math.random()
>> which worked equally well. So my *vote* should be *against* this
>> customization.

>  Mine is against using "ThreadLocalRandomSource"...
-- What is the way out other than that? Please suggest.

>> I think first we need to decide on whether we really need this
>> customization and if yes then why. Then we can decide on alternate
>> implementation options.
>
>> >As per the recent updates of the math-related code bases, the
>> >public API should provide factory methods (constructors should
>> >be private).
>> -- private constructors will make public API classes non-extensible. This
>> will severely restrict the extensibility of this framework which I want
to
>> avoid. I am not sure why we need to remove public constructors. It would
be
>> helpful if you could refer me to any relevant discussion thread.

>  Allowing extensibility is a huge burden on library maintainers.  The
>  library must have been designed to support it; hence, you should
>  first describe what kind(s) of extensions (with usage examples) you
>  have in mind.
-- The library should be extensible to support customization. Users should
be able to customise or provide their own implementation of genetic
operators for crossover and mutation. The chromosome classes should also be
open for extension. E.g. any developer should be able to extend the
IntegralChromosome class and define a child class which explicitly
specifies the range of integers to be used. I initially implemented
the Binary chromosome and the corresponding binary mutation following the
same pattern. However, restricting extension of concrete classes through
private constructors does not prevent users from extending the abstract
parent classes.


Thanks & Regards
--Avijit Basak


On Tue, 30 Nov 2021 at 19:20, Gilles Sadowski <gi...@gmail.com> wrote:

> Hi.
>
> Le mar. 30 nov. 2021 à 06:40, Avijit Basak <av...@gmail.com> a
> écrit :
> >
> > Hi All
> >
> >         Please see my comments:
> >
> > >The provider returned from ThreadLocalRandomSource.current(...) should
> > >only be used within a single method.
> > -- I missed the context of the thread in my previous mail. Sorry for the
> > previous communication. We can only cache the RandomSource's enum value
> and
> > reuse the same locally in other methods. According to the analysis, the
> > current implementation(In PR#199) with pre-configured RandomSource would
> > work correctly.
> > --CUT--
> > public final class RandomProviderManager {
> >     /** The default RandomSource for random number generation. **/
> >     private static RandomSource randomSource =
> > RandomSource.XO_RO_SHI_RO_128_PP;
> >     /**
> >      * constructs the singleton instance.
> >      */
> >     private RandomProviderManager() {}
> >     /**
> >      * Returns the (static) random generator.
> >      * @return the static random generator shared by GA implementation
> > classes
> >      */
> >     public static UniformRandomProvider getRandomProvider() {
> >         return
> > ThreadLocalRandomSource.current(RandomProviderManager.randomSource);
> >     }
> > }
> > --CUT--
> >
> > @Alex Herbert <al...@gmail.com>, kindly share if you see any
> > challenge to this.
>
> Several problems with this approach (raised in previous messages IIRC):
> 1. Potential performance loss in sharing the same RNG instance.
> 2. Less/no flexibility (no user's choice of random source).
> 3. Error-prone (user can access/reuse the "UniformRandomProvider"
> instances).
>
> Again: "ThreadLocalRandomSource" is an ad-hoc workaround for correct but
> "light" usage of random number generation in a multi-threaded application;
> GAs
> make "heavy" use of RNG, thus it is does not seem outlandish that all the
> RNG
> "clients" (e.g. every "operator") creates their own instances.
>
> IMHO, a more important discussion would be about the expectations in a
> multithreaded context: E.g. should an operator be shareable by different
> threads?  And if not, how does the API help application developers to avoid
> such pitfalls?
>
> > My original implementation did not allow any customization of
> RandomSource
> > instances. There was a thought in review for customization of
> RandomSource,
> > so these options were considered. I don't think this would make any
> > difference to algorithm functionality.
>
> Quite right.  But the customization can come at zero cost for the users
> who don't need it. Admittedly it's a little more work on the part of the
> developer(s) but it's a one off cost (and I'm fine working on that part of
> the library once other, more important, things have been settled).
>
> > Even earlier I used Math.random()
> > which worked equally well. So my *vote* should be *against* this
> > customization.
>
> Mine is against using "ThreadLocalRandomSource"...
>
> > I think first we need to decide on whether we really need this
> > customization and if yes then why. Then we can decide on alternate
> > implementation options.
> >
> > >As per the recent updates of the math-related code bases, the
> > >public API should provide factory methods (constructors should
> > >be private).
> > -- private constructors will make public API classes non-extensible. This
> > will severely restrict the extensibility of this framework which I want
> to
> > avoid. I am not sure why we need to remove public constructors. It would
> be
> > helpful if you could refer me to any relevant discussion thread.
>
> Allowing extensibility is a huge burden on library maintainers.  The
> library must have been designed to support it; hence, you should
> first describe what kind(s) of extensions (with usage examples) you
> have in mind.
>
> Gilles
>
> >
> >
> > Thanks & Regards
> > --Avijit Basak
> >
> >
> > On Mon, 29 Nov 2021 at 23:47, Gilles Sadowski <gi...@gmail.com>
> wrote:
> >
> > > Le lun. 29 nov. 2021 à 19:07, Alex Herbert <al...@gmail.com>
> a
> > > écrit :
> > > >
> > > > Note that your examples have incorrect usage of
> ThreadLocalRandomSource:
> > >
> > > The detailed explanation confirms what I hinted at previously: We
> > > should not use "ThreadLocalRandomSource" from within the library
> > > because we can easily do otherwise (and just as transparently for
> > > the user).
> > >
> > > Gilles
> > >
> > > > [...]
> > >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>

-- 
Avijit Basak

Re: [MATH][GENETICS][PR-199] Decision on use and customization of RNG functionality for randomization

Posted by Gilles Sadowski <gi...@gmail.com>.
Hi.

On Tue, 30 Nov 2021 at 06:40, Avijit Basak <av...@gmail.com> wrote:
>
> Hi All
>
>         Please see my comments:
>
> >The provider returned from ThreadLocalRandomSource.current(...) should
> >only be used within a single method.
> -- I missed the context of the thread in my previous mail. Sorry for the
> previous communication. We can only cache the RandomSource's enum value and
> reuse the same locally in other methods. According to the analysis, the
> current implementation(In PR#199) with pre-configured RandomSource would
> work correctly.
> --CUT--
> public final class RandomProviderManager {
>     /** The default RandomSource for random number generation. **/
>     private static RandomSource randomSource = RandomSource.XO_RO_SHI_RO_128_PP;
>     /**
>      * constructs the singleton instance.
>      */
>     private RandomProviderManager() {}
>     /**
>      * Returns the (static) random generator.
>      * @return the static random generator shared by GA implementation classes
>      */
>     public static UniformRandomProvider getRandomProvider() {
>         return ThreadLocalRandomSource.current(RandomProviderManager.randomSource);
>     }
> }
> --CUT--
>
> @Alex Herbert <al...@gmail.com>, kindly share if you see any
> challenge to this.

Several problems with this approach (raised in previous messages IIRC):
1. Potential performance loss in sharing the same RNG instance.
2. Less/no flexibility (no user's choice of random source).
3. Error-prone (user can access/reuse the "UniformRandomProvider" instances).

Again: "ThreadLocalRandomSource" is an ad-hoc workaround for correct but
"light" usage of random number generation in a multi-threaded application; GAs
make "heavy" use of RNG, thus it does not seem outlandish that all the RNG
"clients" (e.g. every "operator") create their own instances.

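As an illustration of an operator that creates its own generator, here is a
minimal sketch under stated assumptions: "java.util.SplittableRandom" stands
in for a Commons RNG "UniformRandomProvider", and "BitFlipMutation" is a
hypothetical operator, not a class from PR#199.

```java
import java.util.SplittableRandom;

/** Hypothetical operator that owns a dedicated generator (assumption, not PR#199 code). */
final class BitFlipMutation {
    /** Dedicated source of randomness; an instance must not be shared among threads. */
    private final SplittableRandom rng;

    private BitFlipMutation(SplittableRandom rng) {
        this.rng = rng;
    }

    /** Creates an operator with its own, arbitrarily seeded generator. */
    static BitFlipMutation create() {
        return new BitFlipMutation(new SplittableRandom());
    }

    /** Creates an operator with its own seeded generator (reproducible sequence). */
    static BitFlipMutation create(long seed) {
        return new BitFlipMutation(new SplittableRandom(seed));
    }

    /** Returns a copy of the genes where each bit is flipped with the given probability. */
    boolean[] mutate(boolean[] genes, double mutationRate) {
        final boolean[] result = genes.clone();
        for (int i = 0; i < result.length; i++) {
            if (rng.nextDouble() < mutationRate) {
                result[i] = !result[i];
            }
        }
        return result;
    }
}
```

Users who do not care about the generator call "create()"; those who need
reproducibility call "create(seed)". In both cases no generator is shared
between operators.
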
IMHO, a more important discussion would be about the expectations in a
multithreaded context: E.g. should an operator be shareable by different
threads?  And if not, how does the API help application developers to avoid
such pitfalls?

> My original implementation did not allow any customization of RandomSource
> instances. There was a thought in review for customization of RandomSource,
> so these options were considered. I don't think this would make any
> difference to algorithm functionality.

Quite right.  But the customization can come at zero cost for the users
who don't need it. Admittedly it's a little more work on the part of the
developer(s) but it's a one-off cost (and I'm fine working on that part of
the library once other, more important, things have been settled).

> Even earlier I used Math.random()
> which worked equally well. So my *vote* should be *against* this
> customization.

Mine is against using "ThreadLocalRandomSource"...

> I think first we need to decide on whether we really need this
> customization and if yes then why. Then we can decide on alternate
> implementation options.
>
> >As per the recent updates of the math-related code bases, the
> >public API should provide factory methods (constructors should
> >be private).
> -- private constructors will make public API classes non-extensible. This
> will severely restrict the extensibility of this framework which I want to
> avoid. I am not sure why we need to remove public constructors. It would be
> helpful if you could refer me to any relevant discussion thread.

Allowing extensibility is a huge burden on library maintainers.  The
library must have been designed to support it; hence, you should
first describe what kind(s) of extensions (with usage examples) you
have in mind.

Gilles

>
>
> Thanks & Regards
> --Avijit Basak
>
>
> On Mon, 29 Nov 2021 at 23:47, Gilles Sadowski <gi...@gmail.com> wrote:
>
> > On Mon, 29 Nov 2021 at 19:07, Alex Herbert <al...@gmail.com> wrote:
> > >
> > > Note that your examples have incorrect usage of ThreadLocalRandomSource:
> >
> > The detailed explanation confirms what I hinted at previously: We
> > should not use "ThreadLocalRandomSource" from within the library
> > because we can easily do otherwise (and just as transparently for
> > the user).
> >
> > Gilles
> >
> > > [...]
> >

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [MATH][GENETICS][PR-199] Decision on use and customization of RNG functionality for randomization

Posted by Avijit Basak <av...@gmail.com>.
Hi All

        Please see my comments:

>The provider returned from ThreadLocalRandomSource.current(...) should
>only be used within a single method.
-- I missed the context of the thread in my previous mail; sorry for that.
We can cache only the RandomSource enum value and reuse it locally in other
methods. According to this analysis, the current implementation (in PR#199)
with a pre-configured RandomSource would work correctly.
--CUT--
import org.apache.commons.rng.UniformRandomProvider;
import org.apache.commons.rng.simple.RandomSource;
import org.apache.commons.rng.simple.ThreadLocalRandomSource;

public final class RandomProviderManager {
    /** The default RandomSource for random number generation. */
    private static RandomSource randomSource = RandomSource.XO_RO_SHI_RO_128_PP;

    /** Private constructor: the class only exposes a static method. */
    private RandomProviderManager() {}

    /**
     * Returns the (static) random generator.
     * @return the static random generator shared by GA implementation classes
     */
    public static UniformRandomProvider getRandomProvider() {
        return ThreadLocalRandomSource.current(RandomProviderManager.randomSource);
    }
}
--CUT--

@Alex Herbert <al...@gmail.com>, kindly share if you see any
challenge to this.
My original implementation did not allow any customization of RandomSource
instances. There was a thought in review for customization of RandomSource,
so these options were considered. I don't think this would make any
difference to algorithm functionality. Even earlier I used Math.random()
which worked equally well. So my *vote* should be *against* this
customization.
I think first we need to decide on whether we really need this
customization and if yes then why. Then we can decide on alternate
implementation options.

>As per the recent updates of the math-related code bases, the
>public API should provide factory methods (constructors should
>be private).
-- private constructors will make public API classes non-extensible. This
will severely restrict the extensibility of this framework which I want to
avoid. I am not sure why we need to remove public constructors. It would be
helpful if you could refer me to any relevant discussion thread.


Thanks & Regards
--Avijit Basak


On Mon, 29 Nov 2021 at 23:47, Gilles Sadowski <gi...@gmail.com> wrote:

> On Mon, 29 Nov 2021 at 19:07, Alex Herbert <al...@gmail.com> wrote:
> >
> > Note that your examples have incorrect usage of ThreadLocalRandomSource:
>
> The detailed explanation confirms what I hinted at previously: We
> should not use "ThreadLocalRandomSource" from within the library
> because we can easily do otherwise (and just as transparently for
> the user).
>
> Gilles
>
> > [...]
>

-- 
Avijit Basak

Re: [MATH][GENETICS][PR-199] Decision on use and customization of RNG functionality for randomization

Posted by Gilles Sadowski <gi...@gmail.com>.
On Mon, 29 Nov 2021 at 19:07, Alex Herbert <al...@gmail.com> wrote:
>
> Note that your examples have incorrect usage of ThreadLocalRandomSource:

The detailed explanation confirms what I hinted at previously: We
should not use "ThreadLocalRandomSource" from within the library
because we can easily do otherwise (and just as transparently for
the user).

Gilles

> [...]


Re: [MATH][GENETICS][PR-199] Decision on use and customization of RNG functionality for randomization

Posted by Alex Herbert <al...@gmail.com>.
Note that your examples have incorrect usage of ThreadLocalRandomSource:

> >
> >     private IntegralValuedMutation(RandomSource rng) {
> >          provider = ThreadLocalRandomSource.current(rng);
> >     }

The provider returned from ThreadLocalRandomSource.current(...) should
only be used within a single method. It should never be cached. This
ensures thread safety.

AFAIK the JVM is free to use different threads for code execution but
all code within a single method must be on the same thread. So if you
cache the UniformRandomProvider then it could end up being executed on
a different thread. If you pass the cached value to multiple threads
then they will all execute with the same RNG and it will break.
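
To illustrate why the lookup has to happen inside the method, here is a
minimal sketch that uses only the JDK. It rests on assumptions:
"java.lang.ThreadLocal" and "java.util.SplittableRandom" stand in for
ThreadLocalRandomSource and UniformRandomProvider, and "PerThreadRng" is a
hypothetical name.

```java
import java.util.SplittableRandom;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical holder of one generator per thread (stand-in for ThreadLocalRandomSource). */
final class PerThreadRng {
    /** One lazily created generator per thread. */
    private static final ThreadLocal<SplittableRandom> CURRENT =
        ThreadLocal.withInitial(SplittableRandom::new);

    private PerThreadRng() {}

    /** Returns the generator of the calling thread; the result should not be stored in a field. */
    static SplittableRandom current() {
        return CURRENT.get();
    }

    /** Callable from any thread: the generator is looked up and used within the method. */
    static int nextInt() {
        return current().nextInt();
    }

    /** Returns the generator that a newly started thread obtains from current(). */
    static SplittableRandom currentOfNewThread() {
        final AtomicReference<SplittableRandom> seen = new AtomicReference<>();
        final Thread t = new Thread(() -> seen.set(current()));
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen.get();
    }
}
```

Each thread sees its own instance from "current()". An object that stores
that instance in a field and is later used from another thread would carry
the first thread's generator along with it.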

The generator returned from ThreadLocalRandomSource.current should
only be used locally. For example you can call this static method from
concurrent threads:

public static int nextInt(RandomSource rs) {
    return ThreadLocalRandomSource.current(rs).nextInt();
}

A unique random generator will be created and seeded for the current
thread if it has not yet been initialized. Otherwise the previously
initialized generator is returned.

This, however, must not be called from multiple concurrent threads with the same rng instance:

public static int nextInt(UniformRandomProvider rng) {
    return rng.nextInt();
}

Do not cache the UniformRandomProvider from ThreadLocalRandomSource.
You can cache the RandomSource enum value and then ensure all your
implementations use the enum to obtain the UniformRandomProvider when
it is required.
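
A minimal sketch of that pattern, under stated assumptions: the nested
"Source" enum is a hypothetical stand-in for RandomSource combined with
ThreadLocalRandomSource, "java.util.SplittableRandom" replaces
UniformRandomProvider, and "SwapMutation" is not a class from PR#199.

```java
import java.util.SplittableRandom;

/** Hypothetical operator that caches only an enum value, never a generator. */
final class SwapMutation {
    /** Hypothetical stand-in for RandomSource: identifies a generator type. */
    enum Source {
        DEFAULT;

        /** One generator per thread for this source. */
        private final ThreadLocal<SplittableRandom> perThread =
            ThreadLocal.withInitial(SplittableRandom::new);

        /** Stand-in for ThreadLocalRandomSource.current(source). */
        SplittableRandom current() {
            return perThread.get();
        }
    }

    /** Only the enum value is cached. */
    private final Source source;

    SwapMutation(Source source) {
        this.source = source;
    }

    /** Returns a copy of the genes with two randomly chosen positions swapped. */
    int[] mutate(int[] genes) {
        // The generator is obtained when required and kept in a local variable only.
        final SplittableRandom rng = source.current();
        final int[] result = genes.clone();
        if (result.length < 2) {
            return result;
        }
        final int i = rng.nextInt(result.length);
        final int j = rng.nextInt(result.length);
        final int tmp = result[i];
        result[i] = result[j];
        result[j] = tmp;
        return result;
    }
}
```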

The RandomSource enum is a factory for creating instances from the
Commons RNG library. If you wish to allow a user to provide their own
source of randomness, and it must be thread safe then you should
specify a factory interface:

public interface UniformRandomProviderFactory {
     UniformRandomProvider create();
}

Then develop your code such that any UniformRandomProvider requested from
the library is only ever used by a single thread at a time. For example:

@Test
void test() throws InterruptedException {
    // Wrong
    // UniformRandomProviderFactory factory =
    //     () -> ThreadLocalRandomSource.current(RandomSource.KISS);

    // Correct
    UniformRandomProviderFactory factory = () -> RandomSource.KISS.create();

    AtomicLong total = new AtomicLong();
    ExecutorService es = Executors.newCachedThreadPool();
    long start = System.currentTimeMillis();
    for (int i = 0; i < 100; i++) {
        // Instance will only be used on one thread
        UniformRandomProvider rng = factory.create();
        es.execute(() -> {
            long sum = 0;
            for (int j = 0; j < 10000000; j++) {
                sum += rng.nextInt();
            }
            total.addAndGet(sum);
        });
    }
    es.shutdown();
    es.awaitTermination(10, TimeUnit.SECONDS);
    System.out.printf("%d in %.3f sec\n", total.get(),
        (System.currentTimeMillis() - start) * 1e-3);
}

6973351531400 in 0.493 sec

This is a bit slower, but it is the correct usage of ThreadLocalRandomSource:

@Test
void test() throws InterruptedException {
    AtomicLong total = new AtomicLong();
    long start = System.currentTimeMillis();
    ExecutorService es = Executors.newCachedThreadPool();
    for (int i = 0; i < 100; i++) {
        es.execute(() -> {
            long sum = 0;
            for (int j = 0; j < 10000000; j++) {
                sum += ThreadLocalRandomSource.current(RandomSource.KISS).nextInt();
            }
            total.addAndGet(sum);
        });
    }
    es.shutdown();
    es.awaitTermination(10, TimeUnit.SECONDS);
    System.out.printf("%d in %.3f sec\n", total.get(),
        (System.currentTimeMillis() - start) * 1e-3);
}

-12038673722102 in 1.185 sec

Alex


Re: [MATH][GENETICS][PR-199] Decision on use and customization of RNG functionality for randomization

Posted by Gilles Sadowski <gi...@gmail.com>.
Hello.

On Mon, 29 Nov 2021 at 07:05, Avijit Basak <av...@gmail.com> wrote:
>
> Hi All
>
>        Here is a sample use of two options.
>
> *Option1*: Declaring factory interface in MutationPolicy, CrossoverPolicy
> and SelectionPolicy. A sample implementation has been shown here for
> MutationPolicy. Similar would be required for all other relevant interfaces
> and implemented classes.
>
> --CUT--
>
>  public interface MutationPolicy<P> {
>      Chromosome<P> mutate(Chromosome<P> original, double mutationRate);
>
>      interface Factory<P> {
>          /**
>           * Creates an instance with a dedicated source of randomness.
>           *
>           * @param rng RNG algorithm.
>           * @param seed Seed.
>           * @return an instance that must <em>not</em> be shared among threads.
>           */
>          MutationPolicy<P> create(RandomSource rng, Object... args);
>
>          default MutationPolicy<P> create(RandomSource rng) {
>              return create(rng, null);
>          }
>          default MutationPolicy<P> create() {
>              return create(RandomSource.SPLIT_MIX_64);
>          }
>      }
>  }
> //Implementation Class
> public class IntegralValuedMutation<P> implements MutationPolicy<P> {
>
>     private final UniformRandomProvider provider;
>
>     private IntegralValuedMutation(RandomSource rng) {
>          provider = ThreadLocalRandomSource.current(rng);
>     }
>     ...
>     ...
>     public static class MutationFactory<Q> implements Factory<Q> {
>         private static final MutationFactory instance = new MutationFactory<>();

Why a singleton?
[AFAICT, it is an unnecessary limitation, and such design choices should
be left to the application's developer.]

>         private MutationFactory() {}
>
>         @Override
>         public MutationPolicy<Q> create(RandomSource rng, Object... args) {
>             return new IntegralValuedMutation<>(args[0], args[1]);
>         }

Why is the "Object" type used here?

>         public static <Q> MutationFactory<Q> getInstance() {
>             return instance;
>         }

This is not part of my suggested API.

>     }
> //Usage
>         MutationPolicy<Integer> policy =
> IntegralValuedMutation.MutationFactory.<Integer>getInstance().create();
> --CUT--

This is not how I imagined an application developer (who is not
interested in the details of how the PRNG sequence is generated)
would call the library's API default factory for the given class.
Unless I'm mistaken, it should just be
---CUT---
MutationPolicy<Integer> policy = IntegralValuedMutation.getFactory().create();
---CUT---

Such "getFactory()" methods could be construed as a convenience
provided, or not, to the application's developer by the implementer of the
"Chromosome"-specific operator ("IntegralValuedMutation" in this case).

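One possible shape of such a "getFactory()" convenience is sketched below. It
is only an illustration under stated assumptions: "SketchPolicy" is a
simplified, hypothetical stand-in for MutationPolicy (an int array instead of
a Chromosome), a "long" seed replaces the RandomSource argument,
"java.util.SplittableRandom" replaces UniformRandomProvider, and passing the
operator's bounds as typed arguments of "getFactory" is a choice made for this
sketch, not something settled in this thread.

```java
import java.util.SplittableRandom;

/** Simplified, hypothetical stand-in for MutationPolicy. */
interface SketchPolicy {
    int[] mutate(int[] original, double mutationRate);

    /** Simplified, hypothetical stand-in for the proposed Factory interface. */
    interface Factory {
        /** Creates an instance with a dedicated, seeded source of randomness. */
        SketchPolicy create(long seed);

        /** Creates an instance with an arbitrary seed. */
        default SketchPolicy create() {
            return create(System.nanoTime());
        }
    }
}

/** Hypothetical implementation: private constructor, instances come from the factory. */
final class IntegerMutation implements SketchPolicy {
    private final SplittableRandom rng;
    private final int min;
    private final int max;

    private IntegerMutation(SplittableRandom rng, int min, int max) {
        this.rng = rng;
        this.min = min;
        this.max = max;
    }

    /** Returns a factory; allele bounds are typed arguments (min inclusive, max exclusive). */
    static SketchPolicy.Factory getFactory(int min, int max) {
        return seed -> new IntegerMutation(new SplittableRandom(seed), min, max);
    }

    /** Returns a copy where each gene is replaced, with the given probability, by a value in [min, max). */
    @Override
    public int[] mutate(int[] original, double mutationRate) {
        final int[] result = original.clone();
        for (int i = 0; i < result.length; i++) {
            if (rng.nextDouble() < mutationRate) {
                result[i] = rng.nextInt(min, max);
            }
        }
        return result;
    }
}
```

With these hypothetical names, the default usage reads
"IntegerMutation.getFactory(1, 10).create()", and a reproducible run reads
"IntegerMutation.getFactory(1, 10).create(seed)".
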
>
> Option2:  Optional constructor argument can also be used as an alternative
> solution.
> --CUT--
> public class IntegralValuedMutation<P> implements MutationPolicy<P> {
>     private final UniformRandomProvider provider;
>     public IntegralValuedMutation() {
>         provider = ThreadLocalRandomSource.current(RandomSource.DEFAULT);
> //DEFAULT is a chosen source.
>     }
>     public IntegralValuedMutation(RandomSource rng) {
>         provider = ThreadLocalRandomSource.current(rng);
>     }
>     ...
> }
> //Usages
> MutationPolicy<Integer> policy = new IntegralValuedMutation(rng);

As per the recent updates of the math-related code bases, the
public API should provide factory methods (constructors should
be private).

With the API suggested above, the equivalent usage would be
---CUT---
MutationPolicy<Integer> policy =
IntegralValuedMutation.getFactory().create(rng);
---CUT---

Regards,
Gilles

>
> Option2 looks to be much simpler regarding implementation and I would vote
> for the same if we decide to allow customization of RandomSource.
>
> Thanks & Regards
> --Avijit Basak
>
>
> On Mon, 22 Nov 2021 at 19:28, Gilles Sadowski <gi...@gmail.com> wrote:
>
> > [...]


Re: [MATH][GENETICS][PR-199] Decision on use and customization of RNG functionality for randomization

Posted by Avijit Basak <av...@gmail.com>.
Hi All

       Here is a sample use of the two options.

*Option1*: Declaring a factory interface in MutationPolicy, CrossoverPolicy
and SelectionPolicy. A sample implementation is shown here for
MutationPolicy. Similar changes would be required for all other relevant
interfaces and implementing classes.

--CUT--

 public interface MutationPolicy<P> {
     Chromosome<P> mutate(Chromosome<P> original, double mutationRate);

     interface Factory<P> {
         /**
          * Creates an instance with a dedicated source of randomness.
          *
          * @param rng RNG algorithm.
          * @param args Additional arguments.
          * @return an instance that must <em>not</em> be shared among threads.
          */
         MutationPolicy<P> create(RandomSource rng, Object... args);

         default MutationPolicy<P> create(RandomSource rng) {
             return create(rng, null);
         }
         default MutationPolicy<P> create() {
             return create(RandomSource.SPLIT_MIX_64);
         }
     }
 }
//Implementation Class
public class IntegralValuedMutation<P> implements MutationPolicy<P> {

    private final UniformRandomProvider provider;

    private IntegralValuedMutation(RandomSource rng) {
         provider = ThreadLocalRandomSource.current(rng);
    }
    ...
    ...
    public static class MutationFactory<Q> implements Factory<Q> {
        private static final MutationFactory instance = new MutationFactory<>();
        private MutationFactory() {}

        @Override
        public MutationPolicy<Q> create(RandomSource rng, Object... args) {
            return new IntegralValuedMutation<>(args[0], args[1]);
        }
        public static <Q> MutationFactory<Q> getInstance() {
            return instance;
        }
    }
//Usage
        MutationPolicy<Integer> policy =
            IntegralValuedMutation.MutationFactory.<Integer>getInstance().create();
--CUT--

*Option2*: An optional constructor argument can also be used as an
alternative solution.
--CUT--
public class IntegralValuedMutation<P> implements MutationPolicy<P> {
    private final UniformRandomProvider provider;
    public IntegralValuedMutation() {
        provider = ThreadLocalRandomSource.current(RandomSource.DEFAULT);
//DEFAULT is a chosen source.
    }
    public IntegralValuedMutation(RandomSource rng) {
        provider = ThreadLocalRandomSource.current(rng);
    }
    ...
}
//Usages
MutationPolicy<Integer> policy = new IntegralValuedMutation<>(rng);
--CUT--

Option2 looks to be much simpler regarding implementation and I would vote
for the same if we decide to allow customization of RandomSource.

Thanks & Regards
--Avijit Basak


On Mon, 22 Nov 2021 at 19:28, Gilles Sadowski <gi...@gmail.com> wrote:

> Hello.
>
> On Mon, 22 Nov 2021 at 13:49, Avijit Basak <av...@gmail.com> wrote:
> >
> > Hi All
> >
> >         I would like to request everyone to share their opinion regarding
> > use and customization of RNG functionality in the Genetic Algorithm
> > library.
> >         In current design RNG functionality has been used internally by the
> > RandomProviderManager class. This class encapsulates a predefined instance
> > of RandomSource and utilizes the same for all random number generation
> > requirements. This makes the API cleaner and easy to use for users.
> >         However, during the review an alternate thought has been proposed
> > related to customization of RandomSource by users. According to the new
> > proposal the users will be able to provide a RandomSource instance of their
> > choice to the crossover and mutation operators and other places like
> > ChromosomeRepresentationUtils. The drawback of this customization could be
> > increased complexity of the API.
>
> Please provide a usage example of both (showing that the alternative
> would actually increase the API complexity).
>
> Thanks,
> Gilles
>
> >         We need to decide here whether we really need this kind of
> > customization by users and if yes the method of doing so. Here two options
> > have been proposed.
> > *Option1:*
> > ---CUT---
> > public interface MutationPolicy<P> {
> >     Chromosome<P> mutate(Chromosome<P> original, double mutationRate);
> >
> >     interface Factory<P> {
> >         /**
> >          * Creates an instance with a dedicated source of randomness.
> >          *
> >          * @param rng RNG algorithm.
> >          * @param seed Seed.
> >          * @return an instance that must <em>not</em> be shared among
> > threads.
> >          */
> >         MutationPolicy<P> create(RandomSource rng, Object seed);
> >
> >         default MutationPolicy<P> create(RandomSource rng) {
> >             return create(rng, null);
> >         }
> >         default MutationPolicy<P> create() {
> >             return create(RandomSource.SPLIT_MIX_64);
> >         }
> >     }
> > }
> > ---CUT---
> >
> > *Option 2:*
> > Use of an optional constructor argument for all crossover and mutation
> > operators. Users will be providing a RandomSource instance of their choice
> > or use the default one configured while instantiating the operators.
> >
> > Thanks & Regards
> > -- Avijit Basak
>

-- 
Avijit Basak

Re: [MATH][GENETICS][PR-199] Decision on use and customization of RNG functionality for randomization

Posted by Gilles Sadowski <gi...@gmail.com>.
Hello.

On Mon, 22 Nov 2021 at 13:49, Avijit Basak <av...@gmail.com> wrote:
>
> Hi All
>
>         I would like to request everyone to share their opinion regarding
> use and customization of RNG functionality in the Genetic Algorithm
> library.
>         In current design RNG functionality has been used internally by the
> RandomProviderManager class. This class encapsulates a predefined instance
> of RandomSource and utilizes the same for all random number generation
> requirements. This makes the API cleaner and easy to use for users.
>         However, during the review an alternate thought has been proposed
> related to customization of RandomSource by users. According to the new
> proposal the users will be able to provide a RandomSource instance of their
> choice to the crossover and mutation operators and other places like
> ChromosomeRepresentationUtils. The drawback of this customization could be
> increased complexity of the API.

Please provide a usage example of both (showing that the alternative
would actually increase the API complexity).

Thanks,
Gilles

>         We need to decide here whether we really need this kind of
> customization by users and if yes the method of doing so. Here two options
> have been proposed.
> *Option1:*
> ---CUT---
> public interface MutationPolicy<P> {
>     Chromosome<P> mutate(Chromosome<P> original, double mutationRate);
>
>     interface Factory<P> {
>         /**
>          * Creates an instance with a dedicated source of randomness.
>          *
>          * @param rng RNG algorithm.
>          * @param seed Seed.
>          * @return an instance that must <em>not</em> be shared among
> threads.
>          */
>         MutationPolicy<P> create(RandomSource rng, Object seed);
>
>         default MutationPolicy<P> create(RandomSource rng) {
>             return create(rng, null);
>         }
>         default MutationPolicy<P> create() {
>             return create(RandomSource.SPLIT_MIX_64);
>         }
>     }
> }
> ---CUT---
>
> *Option 2:*
> Use of an optional constructor argument for all crossover and mutation
> operators. Users will be providing a RandomSource instance of their choice
> or use the default one configured while instantiating the operators.
>
> Thanks & Regards
> -- Avijit Basak
