You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@commons.apache.org by Avijit Basak <av...@gmail.com> on 2022/05/01 08:06:56 UTC

Re: [Math] Review of "genetic algorithm" module

Hi All

         Please find my responses.

>Currently; it prints everything on "stderr" (so that simple usage of
>the UNIX shell's "pipe" syntax is not possible).
-- This is the default behaviour of slf4j-simple.
It is possible to provide specific system properties as VM args to redirect
the log to desired location like stdout or file.
Kindly look into the documentation provided below:
https://www.slf4j.org/api/org/slf4j/impl/SimpleLogger.html

>
>> There was a discussion
>> earlier on this and the decision was to use console for example modules
for
>> log messages.
>
>I recall that we want a separation between the logger API (library
>dependency) and logger implementation ("examples" dependency).
>
>Is it a feature of "slf4j-simple" to only print to "stderr" or can it
>be configured?
>There is no issue on the "examples" module to depend on a
>full-featured logger (primary candidate would be "Log4j2").
--Already mentioned earlier.

>
>> Library users would add the implementation of slf4j according
>> to their need and provide required configurations to control the log
levels
>> and messages.
>
>We (developers) also want minimal flexibility for testing purposes...
>
>Also, IMO "examples" codes should make a difference
>between "logging" (optional information about internals)
>and "output" (mandatory result).
>For example, the TSP application result (itinerary and travel
>distance) could be saved in a file whose name is given as
>argument to a mandatory "--output" command-line option, as
>in module "commons-math-examples/examples-sofm/tsp".
--I have made the changes to accommodate the --output as input argument.

>
>> >
>> >There is still a mix between library code and application code (but
>> >this is to be discussed in MATH-1643.
>> -- Kindly update the JIRA with your suggestions.
>
>Sure, we'll come to that.  But I think we should first converge to
>a codebase that can be merged to "master" as a replacement of
>the "o.a.c.math4.genetics" package (to be removed).
>Do you confirm that all the unit tests in that package have an
>equivalent in the new "commons-math-ga" module?
-- I have accommodated mostly. But some are altered due to changes in
design. But it is good to have a review of the same.

>
>> >From browsing the library code, I'm tempted to believe that the
>> >dependency towards a logging framework is not necessary (or
>> >underused).
>> -- Do you mean dependency on the slf4j?
>
>Yes.
>
>>   I think that such a feature could be left to the application
>> >layer (per the "ConvergenceListener" registry).
>> -- ConvergenceListener is only to log the nature of convergence,
>> not internal details of all operators. Currently we have a very low
number
>> of log messages which can be increased later.
>
>"Increased later" for what purpose?
>Logging can be very useful (e.g. for debugging); however once
>this new implementation is trusted to behave correctly, I don't see
>that additional logging statements inside the library will be very
>useful; they would rather be needed by the application in order to
>e.g. display the content of the current "Population".  The "data" (to
>be eventually displayed) would be provided by the library through
>regular accessors.
>Another case in point is that the library cannot know what display
>format is suitable for the application (e.g. a GUI).
>
>If we really need (not clear as of the current codebase) to provide
>runtime access to other parts of the library (operators, etc.), why
>not generalise the "observer" pattern?
-- IMHO fine grain log statements are a better alternative than operator
level observer to convey internal behavior of operators for the specific
application.
Use of an observer for each operator is a bit unnecessary.

>
>>
>> >Likewise, the "PopulationStatisticsLogger" is not general enough to
>> >be worth being part of the library.
>> -- I would love to keep this simple implementation of convergence
listener.
>> This can provide a very basic overview on the nature of convergence.
>
>As a user, you'll be able to provide the feature to your application
>with less than 10 lines of code (by registering a "listener").
>Everyone else might want a slightly different output (contents and/or
>format) than what this class generates.
-- Users would always have the freedom to register their own listener.
PopulationStatisticsLogger provides only a very basic option to track the
population statistics during the convergence process.
This is never a very generic solution to meet the needs of all users.

>Yes.
>Not everybody follows strict rules, and certainly not all "Commons"
>components have the same rules (unfortunately), but since it was
>decided that this module belongs in "Commons Math", I'd like to
>minimize inconsistent coding styles (converging on what is used in
>non-"legacy" source files).
>
>> This would
>> initiate changes in almost all classes.
>
>Do you mean "all classes" in the GA module?
>Then so be it. ;-)
-- I shall start to work on this.

>> [...]
>> >In "AdaptiveGeneticAlgorithm":
>> >* There should be a single constructor (same remark as above).
>> -- Removed the constructor with default argument.
>>
>> >* Why the use of reflection ("isAssignableFrom")?
>> -- Replaced it by instanceof.
>
>Marginally better ;-) it still does not say why the statistics is disabled
>depending on the operator type...
--Statistics is required only for the adaptive probability generation.
In case application users choose to use a constant probability generation
scheme there is no use of this computation.
I shall document the same.

>
>
>
>(1)
>o.a.c.m.ga.GeneticAlgorithmTestPermutations
>(under "src/test")
>
>As per your comment in that class, it is a usage example.
>Given that its name does not end with "Test", it is not run by the
>test suite.  Please move it to the "examples" module.
-- This was taken from legacy GA. There is another example of binary
genotype called onemax. We already have examples in the "examples-ga"
module.
If we don't need these in test packages we can delete the same. But in that
case there won't be any end-to-end test case execution scenario during
build. Are we fine with that?

>
>(2)
>I'm missing a high-level doc that would enable a newbie to figure
>out what to implement in order to get going.
>E.g. what is the interplay between
> * genotype
> * allele
> * phenotype
> * decoder
> * fitness function
>?
>Several classes do not provide explanations (or links) about the
>concept which they represent.  For example, there is no doc about
>what a "RandomKeyDecoder" is, and the reason for using it (or not).
-- I need to work on that.

>
>(3)
>o.a.c.m.ga.utils.ChromosomeRepresentationUtils
>
>It seems to be a "mixed-bag" kind of class (that is being frowned
>upon nowadays).
>Its comment refers to "random" but some methods are not using
>any randomization.  Most methods are only used in unit tests.
>
-- Actually most methods are taken from the legacy GA application. In the
legacy library representation there was no separate concept for the decoder
and all representation utility methods were kept in the corresponding
Chromosome classes. ChromosomeRepresentationUtils is created as a
placeholder for those utility methods.

>(4)
>o.a.c.m.ga.RandomProviderManager
>
>As already discussed, this class should not be part of the public
>API, namely because the "getRandomProvider()" method returns
>an object that is not thread-safe.
>If used internally as "syntactic sugar", it should be located in a
>package named "internal"; however I'd tend to remove it
>altogether, and call "ThreadLocalRandomSource.current(...)"
>explicitly.
>
-- The class o.a.c.m.ga.RandomProviderManager has been removed.
As per our previous discussion I have implemented the customization of
RandomSource at each operator level.

>(5)
>Why does a "Chromosome" need an "identifier"?
>Method "getId()" is only used in "PopulationStatisticalSummaryImpl"
>that is an internal class, where it seems that the chromosome itself
>(rather than its "id") could serve as the map's key.
-- How can we generate a map's key from the chromosome itself? Please
suggest.

>
>(6)
>o.a.c.m.ga.chromsome.AbstractChromosome
>
>Field "fitness" is not "final", yet it could be: a "FitnessFunction"
>object (used in "evaluate() to compute that field) is passed to the
>constructor.  Is there a reason for the "lazy" evaluation?
>Dropping it would make the instance immutable (and "evaluate()"
>should be renamed to "getFitness()").
>
>Why should the "FitnessFunction" be stored in every chromosome?
>
-- I have modified the fitness as final and initialized the same in the
constructor.

>(7)
>Spurious "@since" tags: In the new code (in "commons-math-ga"
>module), none should refer to a version < 4.0.
>
-- Some files are taken unchanged from the previous release. I have kept
the same @since tag for those files.
Do you need any change here?

>(8)
>@SuppressWarnings("unchecked")
>
>By default, I'm a bit suspicious about having to resort to these
annotations,
>especially for the kind of algorithms we are trying to implement.
>What do you think of the alternative approach outlined in the ZIP file
>attached in MATH-1618:
>    https://issues.apache.org/jira/browse/MATH-1618
>?
-- This annotation is required because we have kept an option to use
different types of genotypes including primitive.
Because of that our base interfaces only declares phenotype not genotype.
This introduced a kind of hierarchy in all operators and chromosome classes
which required us to use the mentioned annotation.
Even with the proposed new architecture we may not be able to avoid the
same.
-- It will be good if you can share some more information about the newly
proposed architecture. The areas of current design which it can improve as
well as the underlying intention.

>
>(9)
>Naming of factory methods should be harmonized to match the convention
>adopted in components like [RNG] and [Numbers].
>E.g. instead of "newChromosome(...)", please use "of(...)" or "from(...)"
>for "value object", and "create(...)" otherwise.
>
-- I have renamed the same for Chromosome classes.
What about the nextGeneration() method of ListPopulation class. Renaming
this to create() or from() won't convey the purpose of it.

>(10)
>o.a.c.m.ga.chromosome.AbstractListChromosome
>
>Constructor is called with an argument that is a previously instantiated
>"representation".  If the latter is mutable, the caller will be able to
modify
>the underlying data structure of the newly created chromosome.  [The
>doc assumes immutability of the representation but this cannot be
>enforced, and mixed ownership can entail subtle bugs.]
-- I think this applies to both representation as well as generic parameter
type T. But I don't see any other option but to rely on the user.
If you have any suggestions kindly share.

>
>(11)
>Do we agree that, in a GA, the most time-consuming task is the fitness
>computation?  Hence IMO, it should be the focus of the multithreading
>tools (i.e. "ExecutorService"), probably keeping the other parts (namely
>the genetic operators) within a simple sequential loop (as in class
>"GeneticAlgorithmFactory" in MATH-1618).
-- Current implementation uses separate threads for applying crossover and
mutation operators for each pair of selected chromosomes.
I think this ensures better utilization of multi-core processors compared
to use of multi-threading only for the fitness calculation.

-- Some codes are checked in. But there is a conflict in the pull request.
So I shall create a new one and delete the old branch itself.


Thanks & Regards
--Avijit Basak


On Fri, 15 Apr 2022 at 03:03, Gilles Sadowski <gi...@gmail.com> wrote:

> Hello.
>
> > > > [...]
>
> (1)
> o.a.c.m.ga.GeneticAlgorithmTestPermutations
> (under "src/test")
>
> As per your comment in that class, it is a usage example.
> Given that its name does not end with "Test", it is not run by the
> test suite.  Please move it to the "examples" module.
>
> (2)
> I'm missing a high-level doc that would enable a newbie to figure
> out what to implement in order to get going.
> E.g. what is the interplay between
>  * genotype
>  * allele
>  * phenotype
>  * decoder
>  * fitness function
> ?
> Several classes do not provide explanations (or links) about the
> concept which they represent.  For example, there is no doc about
> what a "RandomKeyDecoder" is, and the reason for using it (or not).
>
> (3)
> o.a.c.m.ga.utils.ChromosomeRepresentationUtils
>
> It seems to be a "mixed-bag" kind of class (that is being frowned
> upon nowadays).
> Its comment refers to "random" but some methods are not using
> any randomization.  Most methods are only used in unit tests.
>
> (4)
> o.a.c.m.ga.RandomProviderManager
>
> As already discussed, this class should not be part of the public
> API, namely because the "getRandomProvider()" method returns
> an object that is not thread-safe.
> If used internally as "syntactic sugar", it should be located in a
> package named "internal"; however I'd tend to remove it
> altogether, and call "ThreadLocalRandomSource.current(...)"
> explicitly.
>
> (5)
> Why does a "Chromosome" need an "identifier"?
> Method "getId()" is only used in "PopulationStatisticalSummaryImpl"
> that is an internal class, where it seems that the chromosome itself
> (rather than its "id") could serve as the map's key.
>
> (6)
> o.a.c.m.ga.chromsome.AbstractChromosome
>
> Field "fitness" is not "final", yet it could be: a "FitnessFunction"
> object (used in "evaluate() to compute that field) is passed to the
> constructor.  Is there a reason for the "lazy" evaluation?
> Dropping it would make the instance immutable (and "evaluate()"
> should be renamed to "getFitness()").
>
> Why should the "FitnessFunction" be stored in every chromosome?
>
> (7)
> Spurious "@since" tags: In the new code (in "commons-math-ga"
> module), none should refer to a version < 4.0.
>
> (8)
> @SuppressWarnings("unchecked")
>
> By default, I'm a bit suspicious about having to resort to these
> annotations,
> especially for the kind of algorithms we are trying to implement.
> What do you think of the alternative approach outlined in the ZIP file
> attached in MATH-1618:
>     https://issues.apache.org/jira/browse/MATH-1618
> ?
>
> (9)
> Naming of factory methods should be harmonized to match the convention
> adopted in components like [RNG] and [Numbers].
> E.g. instead of "newChromosome(...)", please use "of(...)" or "from(...)"
> for "value object", and "create(...)" otherwise.
>
> (10)
> o.a.c.m.ga.chromosome.AbstractListChromosome
>
> Constructor is called with an argument that is a previously instantiated
> "representation".  If the latter is mutable, the caller will be able to
> modify
> the underlying data structure of the newly created chromosome.  [The
> doc assumes immutability of the representation but this cannot be
> enforced, and mixed ownership can entail subtle bugs.]
>
> (11)
> Do we agree that, in a GA, the most time-consuming task is the fitness
> computation?  Hence IMO, it should be the focus of the multithreading
> tools (i.e. "ExecutorService"), probably keeping the other parts (namely
> the genetic operators) within a simple sequential loop (as in class
> "GeneticAlgorithmFactory" in MATH-1618).
>
> To be continued...
>
> Regards,
> Gilles
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>

-- 
Avijit Basak

Re: [Math] Review of "genetic algorithm" module

Posted by Gilles Sadowski <gi...@gmail.com>.
Hello.

Le dim. 3 juil. 2022 à 08:39, Avijit Basak <av...@gmail.com> a écrit :
>
> Hi All
>
>             Please find my responses. Sorry for the delay.
>
> > Comments related to new model:
> >>
> >> 1) Hierarchy of GeneticOperator: In the proposed model the genetic
> >> operators are designed hierarchically implementing a common interface and
> >> operators are accepted as a list.
> >> This enables interchangeability of operators. Library users would be able
> >> to use crossover and mutation operators interchangeably.
> >> However, in genetic algorithms or other population based search
> algorithms
> >> the operators are used for broadly two purposes, exploration and
> >> exploitation.
> >> In GA crossover is used for exploitation and mutation is used for
> >> exploration. Keeping them in a common hierarchy and allowing
> >> interchangeable operation violates that purpose.
> >
> >I'm not sure that the semantics of "exploitation" and "exploration"
> >should drive the API design.
> >Saying it differently: We don't need to know how various operators
> >will be used in order to implement them; hence IMO, there is no
> >need to discriminate at the API level.
>
> -- Even if we want to use a common abstraction, algorithm implementation
> needs to use them in different ways.
> The new proposed implementation does not consider different operators
> separately. I have mentioned this in the summary section at the end.

Please provide a concrete example of what the proposed implementation
is unable to handle in principle.  In practice, this implementation indeed
provides a "GeneticOperator" abstraction; but I do not understand how it
limits a user's possibility to select any concrete class that provides any
form of either "crossover" or "mutation".

> >>
> >> 2) Chromosome Fitness: In the new design the chromosome fitness is
> >> maintained as a hashmap where the key is the chromosome itself.
> >> Are we going to retain the fitness value in chromosome too otherwise
> >> implementation of Comparable won't be possible?
> >
> >Sorry, I don't follow.
> >
> >> Assuming the chromosome representation would be used to calculate
> hashcode,
> >> it would be a very time consuming process depending on the length of
> >> chromosome.
> >
> >Is this assumption correct?
> >For what purpose do we need to compute a custom hash code?
>
> -- A standard practice is to provide a custom implementation of hashCode
> and equals method whenever an object is used as key to any hash based data
> structure like HashMap we are using here.
> If you think it is good to rely on the JVM specific implementation, I have
> no concerns.

You are right that this should be examined carefully, to avoid unintended
behaviour.
A good start would perhaps be to check that "Population" behaves
correctly, with a minimal API.

> >
> >> E.g. in binary chromosomes we have provision to allow chromosome length
> >> upto (2^31 - 1).
> >
> >That's a lot. ;-)
> >Do you have use cases where such long representations can be handled?
>
> -- One such use case could be tuning of neural networks. Once we introduce
> genetic programming there would be a lot more use cases.

A discussion in itself, primarily focused on ensuring that the implementation
does not put undue limitations on genotype representation IIUC.
Is it the case with my proposal?

> >
> >> Along with that the chromosome implements Comparable
> >> interface.
> >> Equality by Comparable interface would be decided by fitness
> >
> >Sure.
> >
> >> however
> >> equality by equals() method would depend on representation.
> >
> >Do we need a custom "equals"?
>
> -- Mentioned above.

Yes, but without indicating whether there is a problem in my proposal.
I don't say that there isn't, but this should be confirmed with e.g. unit
tests (or "integration" tests?).

We don't want to support any unnecessary API or behaviours.
In this particular case, do we need to keep track of whether 2 genotypes
share the exact same representation?

> >
> >> As chromosomes with different representations may also have the same
> >> fitness, this would produce a conflict.
> >
> >Please provide an example.
>
> -- Think of a problem where we need to find out the minimum value of a
> function using GA where the function is a square of sinusoid like mentioned
> below.
> y = Math.pow(sin(x), 2)
>          where -180 deg <= x <= 180 deg
> We may have the same value of y for different values of x due to the
> replicative nature of the function.
> Similar situations can arrive in real life optimization problems too.
> Even the MathFunctionOptimization example we have used belongs to this
> category of problem.

I fail to see the "conflict" being referred to above (I forgot about the context
of that remark).

> >>
> >> 3) The model does not consider anything related to adaptive approaches.
> >> Would it be a separate factory?
> >
> >What I've proposed is an alternative "skeleton" for the API.  Of course,
> >more classes will provide specific functionality.
> >An adaptive operator "is-a" specialized genetic operator (perhaps the
> >notion of "Adaptive" will be defined through an interface?).
>
> -- In "commons-math4-ga2" module separate ApplicationRate has been
> introduced to handle the adaptive nature of the algorithm.
> But this model always uses sorting as the default option which is
> unnecessary for simple genetic algorithms. Isn't it a performance
> bottleneck?

It could be, and could be checked with benchmarks.
My current impression, based on the one example that uses the new
implementation, is that a much more important time consumer is the
logger!
Moreover, isn't it the case that sorting of the population is almost always
needed, so that better fitted chromosomes can be selected with higher
probability?

> >
> >>
> >> 4) Comparison of chromosomes: In the current model two chromosomes are
> >> compared after decoding the genotype to phenotype using the internal
> >> decoder.
> >> How can we address this in the new model?
> >
> >As mentioned above: Do we really need to compare representations?
>
> -- This is a minor scenario and can be ignored as of now.

Well, as I suggested from the outset, we should not implement anything
not strictly required for a minimally useful GA functionality.
However, all "scenarios" are worth mentioning (e.g. in a JIRA report); they
help picturing a design that is as simple as possible without blocking future
functionality.  [We also explicitly discussed early on that we did not intend
to create a framework for research *about* GA.]

> [...]

> >> >> >(5)
> >> >> >Why does a "Chromosome" need an "identifier"?
> >> >> >Method "getId()" is only used in "PopulationStatisticalSummaryImpl"
> >> >> >that is an internal class, where it seems that the chromosome itself
> >> >> >(rather than its "id") could serve as the map's key.
> >> >> -- How can we generate a map's key from the chromosome itself? Please
> >> >> suggest.
> >> >
> >> >A class to be used as a key only needs to implement "equals" and
> >> >"hashCode".
> >> -- The current chromosome class implements Comparable interface which
> uses
> >> chromosome fitness for comparison. Use of both Comparable and equals()
> >> might introduce inconsistencies.
> >
> >An example?
>
> -- Inconsistency can appear in case we provide a custom implementation of
> equals and hashcode following the representation of chromosome or use the
> default implementation.
> Since Comparable uses the fitness value to compare and as described above
> two chromosomes with separate representations can have the same fitness
> value this might result in inconsistency.
> But in the new module chromosome does not implement Comparable, so there is
> no possibility of the same.

IIUC, this is another example of why we should avoid any functionality
(such as "Comparable" in this case) not strictly necessary within the
intended scope.

> [...]
> >>
> >> We still need to agree on my proposal...
> >> Would you try to follow up on that basis, i.e. add a minimal set of GA
> >> operators, in order to show that some of the "examples" can be run?
> >
> >I've added the "MathFunctionOptimizer2" in the "Standalone" example
> >application.  Please check that it produces the expected output.
> >[I had to slightly change the function being optimized, and your decoding
> >function. Why are you assuming that "Coordinates" cannot be negative?]
>
> -- The fitness function uses the square of each dimensional value of the
> coordinate. So negative signs won't impact the fitness value.

No, but the example function could be just slightly more general (i.e. allow
for an optimum with a negative coordinate) and in that case, the encoding
prevents finding it...

>>>> [...]

> >
> >Hoping we can make significant progress so as to include the new
> >GA functionality in the upcoming release of "Commons Math"...
>
>
> I would like to summarize a few points which I think could be a problem in
> the new model.
>
> 1) Separation of crossover and mutation operator at the API level and the
> corresponding usage of the GeneticOperator in the algorithm implementation:
> -- As mentioned earlier we would need a separation of GeneticOperators.
> The current implementation of the algorithm does not distinguish the
> crossover and mutation operator and treats them uniformly.

There could be public classes that embody the intrinsic difference between
a crossover and a mutation; the current implementation shows they are
not necessary.
What would they bring from a developer's POV?
From a user's POV, is there any loss in not having those public classes?

> This violates the basic concept of the algorithm itself and needs to be
> fixed.

The "Operators" (in package "gene/binary") utility class would define
factory methods for all the available operators (whose name univocally
relates to the kind of operation that it performs).

>
> 2) Use of adaptive probability:
> -- Currently the Mutation class keeps a constant probability value as well
> as a separate ApplicationRate was passed for each genetic operator.
> Since we are going to use the operator probability in an adaptive way we
> would need to pass the generated probability as an argument to the apply
> method of GeneticOperator.

Maybe I missed all the implications of the "adaptive" terminology; I've
implemented what I grokked from the "function" example.
You'd need to explicitly (re)state
 * how the "rate" is computed,
 * how it is used,
 * where the design fails to accommodate the functionality.

From your remark above, I seem to understand that the *same* adaptive
rate value (for a given generation) would determine two things:
 (1) the probability of whether the operator will be applied
 (2) how some specific operator will behave *internally*
I intentionally did not implement (2) since that would seem to imply that the
operator could "internally" decide to *not* do what it is meant to do (i.e. a
mutate any gene with some fixed probability).  In particular, I understood
"adaptivity" as changing the probability of how often the operator is applied,
not changing the effect of the operator.
Since the user can add as many operator instances as he wishes

>
> 3) Separation of logic for adaptive and Simple GA:
> -- We might want to separate two different strategies of GA i.e. simple GA
> and adaptive GA.
> As for simple GA with constant probability we can avoid computation like
> rank calculation.

Isn't rank also (and primarily) used to select (elitistic) parents?
[Performance of the current code can indeed be slightly improved for the
case where "elitism" is strictly zero.]

Otherwise, ranking is performed only when the "Population" contents is
provided to the user (using the "GenerationCallback"), in which case it is
probably expected that the list of chromosomes will need to be sorted
sooner or later (e.g. to display it in the most obviously sensible way).

For example, ranking is *not* performed for the "Tournament" selection.

>
> 4) OffspringGenerator following Strategy pattern:
> -- Can we have an implementation of OffspringGenerator following strategy
> pattern. We should have default implementation following your first
> proposal.
> But there can be an optional parameter in the API method, where library
> users would be able to provide their custom implementation.

Where do you see a reference to "OffspringGenerator"?

>
> If you need a separate mail chain for each of the review comments mentioned
> above kindly let me know.

Yes, better discuss each topic in its own thread.
Also, as noted above, some can be moved to JIRA.

Thanks,
Gilles

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [Math] Review of "genetic algorithm" module

Posted by Avijit Basak <av...@gmail.com>.
Hi All

            Please find my responses. Sorry for the delay.

> Comments related to new model:
>>
>> 1) Hierarchy of GeneticOperator: In the proposed model the genetic
>> operators are designed hierarchically implementing a common interface and
>> operators are accepted as a list.
>> This enables interchangeability of operators. Library users would be able
>> to use crossover and mutation operators interchangeably.
>> However, in genetic algorithms or other population based search
algorithms
>> the operators are used for broadly two purposes, exploration and
>> exploitation.
>> In GA crossover is used for exploitation and mutation is used for
>> exploration. Keeping them in a common hierarchy and allowing
>> interchangeable operation violates that purpose.
>
>I'm not sure that the semantics of "exploitation" and "exploration"
>should drive the API design.
>Saying it differently: We don't need to know how various operators
>will be used in order to implement them; hence IMO, there is no
>need to discriminate at the API level.

-- Even if we want to use a common abstraction, algorithm implementation
needs to use them in different ways.
The new proposed implementation does not consider different operators
separately. I have mentioned this in the summary section at the end.

>
>>
>> 2) Chromosome Fitness: In the new design the chromosome fitness is
>> maintained as a hashmap where the key is the chromosome itself.
>> Are we going to retain the fitness value in chromosome too otherwise
>> implementation of Comparable won't be possible?
>
>Sorry, I don't follow.
>
>> Assuming the chromosome representation would be used to calculate
hashcode,
>> it would be a very time consuming process depending on the length of
>> chromosome.
>
>Is this assumption correct?
>For what purpose do we need to compute a custom hash code?

-- A standard practice is to provide a custom implementation of hashCode
and equals method whenever an object is used as key to any hash based data
structure like HashMap we are using here.
If you think it is good to rely on the JVM specific implementation, I have
no concerns.

>
>> E.g. in binary chromosomes we have provision to allow chromosome length
>> upto (2^31 - 1).
>
>That's a lot. ;-)
>Do you have use cases where such long representations can be handled?

-- One such use case could be tuning of neural networks. Once we introduce
genetic programming there would be a lot more use cases.

>
>
>
>> Along with that the chromosome implements Comparable
>> interface.
>> Equality by Comparable interface would be decided by fitness
>
>Sure.
>
>> however
>> equality by equals() method would depend on representation.
>
>Do we need a custom "equals"?

-- Mentioned above.

>
>> As chromosomes with different representations may also have the same
>> fitness, this would produce a conflict.
>
>Please provide an example.

-- Think of a problem where we need to find out the minimum value of a
function using GA where the function is a square of sinusoid like mentioned
below.
y = Math.pow(sin(x), 2)
         where -180 deg <= x <= 180 deg
We may have the same value of y for different values of x due to the
replicative nature of the function.
Similar situations can arrive in real life optimization problems too.
Even the MathFunctionOptimization example we have used belongs to this
category of problem.

>
>>
>> 3) The model does not consider anything related to adaptive approaches.
>> Would it be a separate factory?
>
>What I've proposed is an alternative "skeleton" for the API.  Of course,
>more classes will provide specific functionality.
>An adaptive operator "is-a" specialized genetic operator (perhaps the
>notion of "Adaptive" will be defined through an interface?).

-- In "commons-math4-ga2" module separate ApplicationRate has been
introduced to handle the adaptive nature of the algorithm.
But this model always uses sorting as the default option which is
unnecessary for simple genetic algorithms. Isn't it a performance
bottleneck?

>
>>
>> 4) Comparison of chromosomes: In the current model two chromosomes are
>> compared after decoding the genotype to phenotype using the internal
>> decoder.
>> How can we address this in the new model?
>
>As mentioned above: Do we really need to compare representations?

-- This is a minor scenario and can be ignored as of now.

>
>>
>> 5) Chromosome String representation: Currently we use the toString()
method
>> to print the chromosome's phenotype. In the new model we would need to
>> avoid this approach as decoders won't be available to chromosomes.
>
>This seems like a minor issue (or perhaps no issue at all?) unless I'm
>missing something.

-- We can ignore this for now.

[...]
>> >Doing manually will be too tedious.
>> >If you are reasonably confident that you've ported everything
>> >valuable, I guess that we'll rely on coverage tools...
>> >The current log statements could also be construed as (totally)
>> >unnecessary, as they merely document statements which I would
>> >consider part of an "application", not the GA "library".
>> >I believe that we do not agree yet on where to draw the line (my
>> >proposal in MATH-1618 is related to that difference of opinion).
>> >
>> -- Probably I am missing something here. These log statements are more
>> useful for library users. If we remove all log statements how users would
>> be able to debug any issue.
>> What if there are issues in any part of library code or may be anything
>> fails due to some application related customization.
>> It may not be necessarily a library bug.
>
>I'm not clear about what is the intended purpose of logging.
>But I'd rather defer this discussion, and first focus on the actual GA
>functionality being implemented.

-- This can be postponed for now.

>>[...]
>> >AFAICT, the "PopulationStatisticsLogger" class satisfies *your* needs
>> >(as a user), but as developers we should only include functionality that
>> >is broadly useful.  I think I provided arguments that it is not the
case of
>> >that class.
>> -- So do you want this class to be removed?
>
>Short answer would be "yes".  But the rationale is that we should
>defer all functionality that just seems "nice-to-have" until the core
>API is stabilized.

-- I agree with you.

>
>>
>> >Great!
>> >
>> >> >> [...]
>> >
>> >Maybe I'm still missing something, but IMHO the distinction between
>> >adaptive vs not should not impact code at that level.
>> -- Actually I wanted to keep a provision where users can mix both
>> approaches. Maybe keep crossover probability as constant but mutation as
>> adaptive.
>
>AFAICT, this would follow from my proposal (when "AdaptiveMutation"
>and "AdaptiveCrossover" operators are implemented).

-- Addressed in the new implementation.

>
>>
>> [...]
>> >> >(3)
>> >> >o.a.c.m.ga.utils.ChromosomeRepresentationUtils
>> >> >
>> >> >It seems to be a "mixed-bag" kind of class (that is being frowned
>> >> >upon nowadays).
>> >> >Its comment refers to "random" but some methods are not using
>> >> >any randomization.  Most methods are only used in unit tests.
>> >> >
>> >> -- Actually most methods are taken from the legacy GA application. In
the
>> >> legacy library representation there was no separate concept for the
>> decoder
>> >> and all representation utility methods were kept in the corresponding
>> >> Chromosome classes. ChromosomeRepresentationUtils is created as a
>> >> placeholder for those utility methods.
>> >
>> >I'm not sure I follow what you mean.
>> >If these methods are only needed by the (legacy?) examples, they
>> >should be moved to the "ga-examples" module (where they can be
>> >removed later on without regards to BC).
>> -- These were adopted from the legacy model and should be useful in this
>> new model too.
>
>Anything not _surely_ useful (i.e. actually used by the new library)
>should be left out.  If necessary, code can be re-added later.
>
>>
>> > > > [...]
>> >
>> >OK.  [Is there a new PR?]
>> -- I shall create one.
>
>We still need to agree on my proposal...
>Would you try to follow up on that basis, i.e. add a minimal set of GA
>operators, in order to show that some of the "examples" can be run?

-- Implementation of the algorithm still needs to be corrected in
commons-math4-ga2 module as described in the summary.

>
>> >
>> >> >(5)
>> >> >Why does a "Chromosome" need an "identifier"?
>> >> >Method "getId()" is only used in "PopulationStatisticalSummaryImpl"
>> >> >that is an internal class, where it seems that the chromosome itself
>> >> >(rather than its "id") could serve as the map's key.
>> >> -- How can we generate a map's key from the chromosome itself? Please
>> >> suggest.
>> >
>> >A class to be used as a key only needs to implement "equals" and
>> >"hashCode".
>> -- The current chromosome class implements Comparable interface which
uses
>> chromosome fitness for comparison. Use of both Comparable and equals()
>> might introduce inconsistencies.
>
>An example?

-- Inconsistency can appear in case we provide a custom implementation of
equals and hashcode following the representation of chromosome or use the
default implementation.
Since Comparable uses the fitness value to compare and as described above
two chromosomes with separate representations can have the same fitness
value this might result in inconsistency.
But in the new module chromosome does not implement Comparable, so there is
no possibility of the same.

>
>> >
>> >> >
>> >> >(6)
>> >> >o.a.c.m.ga.chromsome.AbstractChromosome
>> >> >
>> >> >Field "fitness" is not "final", yet it could be: a "FitnessFunction"
>> >> >object (used in "evaluate() to compute that field) is passed to the
>> >> >constructor.  Is there a reason for the "lazy" evaluation?
>> >> >Dropping it would make the instance immutable (and "evaluate()"
>> >> >should be renamed to "getFitness()").
>> >> >
>> >> >Why should the "FitnessFunction" be stored in every chromosome?
>> >> >
>> >> -- I have modified the fitness as final and initialized the same in
the
>> >> constructor.
>> >
>> >Better, but did you check my proposal in MATH-1618, where
>> >Chromosome and fitness are decoupled, and their relationship
>> >is held within a "Population" instance?
>> --Mentioned earlier.
>
>I still don't know whether you agree that my proposal makes it
>simpler to express a GA.

-- I think there are few points where we are not aligned and those are
mentioned in the summary section.

>
>> >
>> >> [...]
>
>> >
>> >> >
>> >> >(9)
>> >> >Naming of factory methods should be harmonized to match the
convention
>> >> >adopted in components like [RNG] and [Numbers].
>> >> >E.g. instead of "newChromosome(...)", please use "of(...)" or
>> "from(...)"
>> >> >for "value object", and "create(...)" otherwise.
>> >> >
>> >> -- I have renamed the same for Chromosome classes.
>> >> What about the nextGeneration() method of ListPopulation class.
Renaming
>> >> this to create() or from() won't convey the purpose of it.
>> >
>> >I agree, and that's why the new "Population" class (in MATH-1618) does
>> >not provide a factory method (see also the "GeneticAlgorithmFactory"
>> >class).
>> -- We can avoid the same in the current model if we agree to use a
default
>> implementation of population and remove the Population interface
following
>> your new model.
>
>So, do we adopt that "new model"?
>Or do you still have objections?

-- Mentioned above.

>
>> >
>> >> >(10)
>> >> >o.a.c.m.ga.chromosome.AbstractListChromosome
>> >> >
>> >> >Constructor is called with an argument that is a previously
instantiated
>> >> >"representation".  If the latter is mutable, the caller will be able
to
>> >> modify
>> >> >the underlying data structure of the newly created chromosome.  [The
>> >> >doc assumes immutability of the representation but this cannot be
>> >> >enforced, and mixed ownership can entail subtle bugs.]
>> >> -- I think this applies to both representation as well as generic
>> parameter
>> >> type T. But I don't see any other option but to rely on the user.
>> >
>> >The Javadoc (at line 84) is misleading in its mention of "immutable".
>> >
>> >> If you have any suggestions kindly share.
>> >
>> >I may not understand all the implications, but I'd suggest that the
>> >"representation" be instantiated within the control of the library (e.g.
>> >through a "builder"/"factory").
>> -- Currently we have the ChromosomeRepresentationUtils for the same. Its
>> methods are designed to generate the representations.
>
>My suggestion is that this design can be improved (a.o. according to my
>above suggestion).

-- Sure.

>
>> >
>> >> >
>> >> >(11)
>> >> >Do we agree that, in a GA, the most time-consuming task is the
fitness
>> >> >computation?  Hence IMO, it should be the focus of the multithreading
>> >> >tools (i.e. "ExecutorService"), probably keeping the other parts
(namely
>> >> >the genetic operators) within a simple sequential loop (as in class
>> >> >"GeneticAlgorithmFactory" in MATH-1618).
>> >> -- Current implementation uses separate threads for applying crossover
>> and
>> >> mutation operators for each pair of selected chromosomes.
>> >> I think this ensures better utilization of multi-core processors
compared
>> >> to use of multi-threading only for the fitness calculation.
>> >
>> >I have the opposite intuition: Parallel application of the genetic
>> >operators would only provide marginal gains wrt the fitness
>> >computation.
>> >In any case, I think that it will be fairly easy to modify my proposed
>> >"OffspringGenerator" class to use an "ExecutorService" (if benchmarks
>> >show that a substantial gain could indeed be achieved).
>> >
>> >> -- Some codes are checked in. But there is a conflict in the pull
>> request.
>> >> So I shall create a new one and delete the old branch itself.
>> >
>> >IMHO, we could make more substantial progress if you could
>> >first point to issues with my proposal in MATH-1618.
>> --Mentioned earlier.
>
>Well, I don't know where we stand...
>
>
>
>
>
>
>
>
>Responding below to some of my own questions following commit
>   ddfd5bf859d04cc5da604b20021ceaba9de7def6
>in branch
>  feature__MATH-1563__genetic_algorithm
>
>Le mar. 24 mai 2022 à 01:54, Gilles Sadowski <gi...@gmail.com> a
écrit :
>>
>> Hello.
>>
>> Le mer. 18 mai 2022 à 16:34, Avijit Basak <av...@gmail.com> a
écrit :
>> >
>> > Hi All
>> >
>> >         Please find my comments below.
>> >
>> > Comments related to new model:
>> >
>> > 1) Hierarchy of GeneticOperator: In the proposed model the genetic
>> > operators are designed hierarchically implementing a common interface
and
>> > operators are accepted as a list.
>> > This enables interchangeability of operators. Library users would be
able
>> > to use crossover and mutation operators interchangeably.
>> > However, in genetic algorithms or other population based search
algorithms
>> > the operators are used for broadly two purposes, exploration and
>> > exploitation.
>> > In GA crossover is used for exploitation and mutation is used for
>> > exploration. Keeping them in a common hierarchy and allowing
>> > interchangeable operation violates that purpose.
>>
>> I'm not sure that the semantics of "exploitation" and "exploration"
>> should drive the API design.
>> Saying it differently: We don't need to know how various operators
>> will be used in order to implement them; hence IMO, there is no
>> need to discriminate at the API level.
>
>The "core" GA algorithm (see class "GeneticAlgorithmFactory") is
>oblivious to whether a genetic operator "is-a" crossover or mutation.

-- I have provided my comments related to this in the summary section at
the end.

>
>> >
>> > 2) Chromosome Fitness: In the new design the chromosome fitness is
>> > maintained as a hashmap where the key is the chromosome itself.
>> > Are we going to retain the fitness value in chromosome too otherwise
>> > implementation of Comparable won't be possible?
>>
>> Sorry, I don't follow.
>
>Implementing "Comparable" is not necessary.
-- Agreed

>
>> > Assuming the chromosome representation would be used to calculate
hashcode,
>> > it would be a very time consuming process depending on the length of
>> > chromosome.
>>
>> Is this assumption correct?
>> For what purpose do we need to compute a custom hash code?
>
>A custom "hashCode" method is not necessary.
>The only consequence seems that 2 different instances of a genotype that is
>logically the same (genotype-wise) could both be added to a population
while
>the same (in-memory) instance would only appear once in the hash map.

-- This would be fine with the proposed implementation.

>
>>
>> > E.g. in binary chromosomes we have provision to allow chromosome length
>> > upto (2^31 - 1).
>>
>> That's a lot. ;-)
>> Do you have use cases where such long representations can be handled?
>
>For now, I've use "BitSet" as the internal representation (but this could
>be changed if necessary, because it is not part of the public API).

-- Sure.

>
>> > Along with that the chromosome implements Comparable
>> > interface.
>> > Equality by Comparable interface would be decided by fitness
>>
>> Sure.
>>
>> > however
>> > equality by equals() method would depend on representation.
>>
>> Do we need a custom "equals"?
>
>We don't.

-- Agreed.

>
>> > As chromosomes with different representations may also have the same
>> > fitness, this would produce a conflict.
>>
>> Please provide an example.
>>
>> >
>> > 3) The model does not consider anything related to adaptive approaches.
>> > Would it be a separate factory?
>>
>> What I've proposed is an alternative "skeleton" for the API.  Of course,
>> more classes will provide specific functionality.
>> An adaptive operator "is-a" specialized genetic operator (perhaps the
>> notion of "Adaptive" will be defined through an interface?).
>
>We don't need anything that complex (IIUC).
>See interface "ApplicationRate" and its implementations in package "rate".

-- This is fine.

>
>> >
>> > 4) Comparison of chromosomes: In the current model two chromosomes are
>> > compared after decoding the genotype to phenotype using the internal
>> > decoder.
>> > How can we address this in the new model?
>>
>> As mentioned above: Do we really need to compare representations?
>
>Within the library, it does not seem necessary.

-- Agreed.

>
>> >
>> > 5) Chromosome String representation: Currently we use the toString()
method
>> > to print the chromosome's phenotype. In the new model we would need to
>> > avoid this approach as decoders won't be available to chromosomes.
>>
>> This seems like a minor issue (or perhaps no issue at all?) unless I'm
>> missing something.
>
>The GA does not need to print the phenotype.
>The decoder is user-defined, hence he can obviously apply it whenever
>he needs a printable version of the chromosomes.

-- Agreed.

>
>> >
>> > 6) However in the new model the generic type parameters and java.util
>> > packages are used in a more organized way. We can adopt the same in the
>> > existing model.
>>
>> I don't understand what you are proposing.
>
>I've used two generic parameters (one for genotype, one for phenotype).
>The code does not use any "@SuppressWarning" annotation.

-- Agreed.

>
>> >
>> > >If we need to document software (e.g. the "examples") produced
>> > >by a "Commons" component, it is preferable that we use and refer
>> > >to "Log4j2".
>> > >[The logger API is another matter since it does not impact how the
>> > >software is used (from a user's POV).]
>> > -- I shall make the changes.
>
>Logging statements are marginally useful in such a small piece of code.
>Tracking evolution can be implemented at the application level.
>See interface "GenerationCallback".

-- As long as users have some option to enable the same it is fine.

>
>>
>> Thanks.
>>
>> >
>> > > [...]
>> > >Doing manually will be too tedious.
>> > >If you are reasonably confident that you've ported everything
>> > >valuable, I guess that we'll rely on coverage tools...
>> > >The current log statements could also be construed as (totally)
>> > >unnecessary, as they merely document statements which I would
>> > >consider part of an "application", not the GA "library".
>> > >I believe that we do not agree yet on where to draw the line (my
>> > >proposal in MATH-1618 is related to that difference of opinion).
>> > >
>> > -- Probably I am missing something here. These log statements are more
>> > useful for library users. If we remove all log statements how users
would
>> > be able to debug any issue.
>> > What if there are issues in any part of library code or may be anything
>> > fails due to some application related customization.
>> > It may not be necessarily a library bug.
>>
>> I'm not clear about what is the intended purpose of logging.
>
>See previous response.
>
>> But I'd rather defer this discussion, and first focus on the actual GA
>> functionality being implemented.
>>
>> >
>> > >> >> >Likewise, the "PopulationStatisticsLogger" is not general
enough to
>> > >> >> >be worth being part of the library.
>> > >> >> -- I would love to keep this simple implementation of convergence
>> > >> >> listener. This can provide a very basic overview on the nature of
>> > >> >> convergence.
>> > >> >
>> > >> >As a user, you'll be able to provide the feature to your
application
>> > >> >with less than 10 lines of code (by registering a "listener").
>> > >> >Everyone else might want a slightly different output (contents
and/or
>> > >> >format) than what this class generates.
>> > >> -- Users would always have the freedom to register their own
listener.
>> > >> PopulationStatisticsLogger provides only a very basic option to
track the
>> > >> population statistics during the convergence process.
>> > >> This is never a very generic solution to meet the needs of all
users.
>> > >
>> > >AFAICT, the "PopulationStatisticsLogger" class satisfies *your* needs
>> > >(as a user), but as developers we should only include functionality
that
>> > >is broadly useful.  I think I provided arguments that it is not the
case of
>> > >that class.
>> > -- So do you want this class to be removed?
>>
>> Short answer would be "yes".  But the rationale is that we should
>> defer all functionality that just seems "nice-to-have" until the core
>> API is stabilized.
>
>Displaying statistics should be done at the application level.
>See "GenerationLogger" in class "MathFunctionOptimizer2" (adapted
>from yours).
>
>> >
>> > >Great!
>> > >
>> > >> >> [...]
>> > >
>> > >Maybe I'm still missing something, but IMHO the distinction between
>> > >adaptive vs not should not impact code at that level.
>> > -- Actually I wanted to keep a provision where users can mix both
>> > approaches. Maybe keep crossover probability as constant but mutation
as
>> > adaptive.
>>
>> AFAICT, this would follow from my proposal (when "AdaptiveMutation"
>> and "AdaptiveCrossover" operators are implemented).
>
>Such operators are not necessary (see previous comment, about
>"ApplicationRate").
>
>> >
>> > [...]
>> > >> >(3)
>> > >> >o.a.c.m.ga.utils.ChromosomeRepresentationUtils
>> > >> >
>> > >> >It seems to be a "mixed-bag" kind of class (that is being frowned
>> > >> >upon nowadays).
>> > >> >Its comment refers to "random" but some methods are not using
>> > >> >any randomization.  Most methods are only used in unit tests.
>> > >> >
>> > >> -- Actually most methods are taken from the legacy GA application.
In the
>> > >> legacy library representation there was no separate concept for the
>> > decoder
>> > >> and all representation utility methods were kept in the
corresponding
>> > >> Chromosome classes. ChromosomeRepresentationUtils is created as a
>> > >> placeholder for those utility methods.
>> > >
>> > >I'm not sure I follow what you mean.
>> > >If these methods are only needed by the (legacy?) examples, they
>> > >should be moved to the "ga-examples" module (where they can be
>> > >removed later on without regards to BC).
>> > -- These were adopted from the legacy model and should be useful in
this
>> > new model too.
>>
>> Anything not _surely_ useful (i.e. actually used by the new library)
>> should be left out.  If necessary, code can be re-added later.
>
>Representation(s) should be internal, together with the operators
>that manipulate them.
>
>> >
>> > > > > [...]
>> > >
>> > >OK.  [Is there a new PR?]
>> > -- I shall create one.
>>
>> We still need to agree on my proposal...
>> Would you try to follow up on that basis, i.e. add a minimal set of GA
>> operators, in order to show that some of the "examples" can be run?
>
>I've added the "MathFunctionOptimizer2" in the "Standalone" example
>application.  Please check that it produces the expected output.
>[I had to slightly change the function being optimized, and your decoding
>function. Why are you assuming that "Coordinates" cannot be negative?]

-- The fitness function uses the square of each dimensional value of the
coordinate. So negative signs won't impact the fitness value.
But we can always consider -ve.

>
>> > >
>> > >> >(5)
>> > >> >Why does a "Chromosome" need an "identifier"?
>> > >> >Method "getId()" is only used in "PopulationStatisticalSummaryImpl"
>> > >> >that is an internal class, where it seems that the chromosome
itself
>> > >> >(rather than its "id") could serve as the map's key.
>> > >> -- How can we generate a map's key from the chromosome itself?
Please
>> > >> suggest.
>> > >
>> > >A class to be used as a key only needs to implement "equals" and
>> > >"hashCode".
>> > -- The current chromosome class implements Comparable interface which
uses
>> > chromosome fitness for comparison. Use of both Comparable and equals()
>> > might introduce inconsistencies.
>
>In my proposal, neither is implemented.
>
>> An example?
>>
>> > >
>> > >> >
>> > >> >(6)
>> > >> >o.a.c.m.ga.chromsome.AbstractChromosome
>> > >> >
>> > >> >Field "fitness" is not "final", yet it could be: a
"FitnessFunction"
>> > >> >object (used in "evaluate() to compute that field) is passed to the
>> > >> >constructor.  Is there a reason for the "lazy" evaluation?
>> > >> >Dropping it would make the instance immutable (and "evaluate()"
>> > >> >should be renamed to "getFitness()").
>> > >> >
>> > >> >Why should the "FitnessFunction" be stored in every chromosome?
>> > >> >
>> > >> -- I have modified the fitness as final and initialized the same in
the
>> > >> constructor.
>> > >
>> > >Better, but did you check my proposal in MATH-1618, where
>> > >Chromosome and fitness are decoupled, and their relationship
>> > >is held within a "Population" instance?
>> > --Mentioned earlier.
>>
>> I still don't know whether you agree that my proposal makes it
>> simpler to express a GA.
>
>Please have a look at the commit mentioned above, and
> * for each design issue, post a new discussion thread,
> * for each bug, file a JIRA report.
>
>> > >
>> > >> [...]
>> > >
>> > >>
>> > >> >(8)
>> > >> >@SuppressWarnings("unchecked")
>> > >> >
>> > >> >By default, I'm a bit suspicious about having to resort to these
>> > >> annotations,
>> > >> >especially for the kind of algorithms we are trying to implement.
>> > >> >What do you think of the alternative approach outlined in the ZIP
file
>> > >> >attached in MATH-1618:
>> > >> >    https://issues.apache.org/jira/browse/MATH-1618
>> > >> >?
>> > >> -- This annotation is required because we have kept an option to use
>> > >> different types of genotypes including primitive.
>> > >> Because of that our base interfaces only declares phenotype not
genotype.
>> > >> This introduced a kind of hierarchy in all operators and chromosome
>> > classes
>> > >> which required us to use the mentioned annotation.
>> > >
>> > >I may again be missing something.
>> > >Could you please explain the case that makes these annotations
>> > >necessary.
>> > -- This has been only used to avoid the warning in the place of
typecasting.
>> > However, I can work to minimize this following your new model.
>>
>> "Minimize"?
>
>No unsafe cast seems necessary.

-- Agreed.

>
>> > >
>> > >> >
>> > >> >(9)
>> > >> >Naming of factory methods should be harmonized to match the
convention
>> > >> >adopted in components like [RNG] and [Numbers].
>> > >> >E.g. instead of "newChromosome(...)", please use "of(...)" or
>> > "from(...)"
>> > >> >for "value object", and "create(...)" otherwise.
>> > >> >
>> > >> -- I have renamed the same for Chromosome classes.
>> > >> What about the nextGeneration() method of ListPopulation class.
Renaming
>> > >> this to create() or from() won't convey the purpose of it.
>> > >
>> > >I agree, and that's why the new "Population" class (in MATH-1618) does
>> > >not provide a factory method (see also the "GeneticAlgorithmFactory"
>> > >class).
>> > -- We can avoid the same in the current model if we agree to use a
default
>> > implementation of population and remove the Population interface
following
>> > your new model.
>>
>> So, do we adopt that "new model"?
>> Or do you still have objections?
>
>As the one who investigates GA behaviours, please list the concrete
>problems with my proposal.

-- Mentioned at the end in the summary section.

>
>> > >
>> > >> >(10)
>> > >> >o.a.c.m.ga.chromosome.AbstractListChromosome
>> > >> >
>> > >> >Constructor is called with an argument that is a previously
instantiated
>> > >> >"representation".  If the latter is mutable, the caller will be
able to
>> > >> modify
>> > >> >the underlying data structure of the newly created chromosome.
 [The
>> > >> >doc assumes immutability of the representation but this cannot be
>> > >> >enforced, and mixed ownership can entail subtle bugs.]
>> > >> -- I think this applies to both representation as well as generic
>> > parameter
>> > >> type T. But I don't see any other option but to rely on the user.
>> > >
>> > >The Javadoc (at line 84) is misleading in its mention of "immutable".
>> > >
>> > >> If you have any suggestions kindly share.
>> > >
>> > >I may not understand all the implications, but I'd suggest that the
>> > >"representation" be instantiated within the control of the library
(e.g.
>> > >through a "builder"/"factory").
>> > -- Currently we have the ChromosomeRepresentationUtils for the same.
Its
>> > methods are designed to generate the representations.
>>
>> My suggestion is that this design can be improved (a.o. according to my
>> above suggestion).
>
>In my proposal, representations would be not be "public", and each
>would be declared and used within a dedicated package.
>See package "gene/binary".

-- This is fine.

>
>> > >
>> > >> >
>> > >> >(11)
>> > >> >Do we agree that, in a GA, the most time-consuming task is the
fitness
>> > >> >computation?  Hence IMO, it should be the focus of the
multithreading
>> > >> >tools (i.e. "ExecutorService"), probably keeping the other parts
(namely
>> > >> >the genetic operators) within a simple sequential loop (as in class
>> > >> >"GeneticAlgorithmFactory" in MATH-1618).
>> > >> -- Current implementation uses separate threads for applying
crossover
>> > and
>> > >> mutation operators for each pair of selected chromosomes.
>> > >> I think this ensures better utilization of multi-core processors
compared
>> > >> to use of multi-threading only for the fitness calculation.
>> > >
>> > >I have the opposite intuition: Parallel application of the genetic
>> > >operators would only provide marginal gains wrt the fitness
>> > >computation.
>> > >In any case, I think that it will be fairly easy to modify my proposed
>> > >"OffspringGenerator" class to use an "ExecutorService" (if benchmarks
>> > >show that a substantial gain could indeed be achieved).
>> > >
>> > >> -- Some codes are checked in. But there is a conflict in the pull
>> > request.
>> > >> So I shall create a new one and delete the old branch itself.
>> > >
>> > >IMHO, we could make more substantial progress if you could
>> > >first point to issues with my proposal in MATH-1618.
>> > --Mentioned earlier.
>>
>> Well, I don't know where we stand...
>
>Hoping we can make significant progress so as to include the new
>GA functionality in the upcoming release of "Commons Math"...


I would like to summarize a few points which I think could be a problem in
the new model.

1) Separation of crossover and mutation operator at the API level and the
corresponding usage of the GeneticOperator in the algorithm implementation:
-- As mentioned earlier we would need a separation of GeneticOperators.
The current implementation of the algorithm does not distinguish the
crossover and mutation operator and treats them uniformly.
This violates the basic concept of the algorithm itself and needs to be
fixed.

2) Use of adaptive probability:
-- Currently the Mutation class keeps a constant probability value as well
as a separate ApplicationRate was passed for each genetic operator.
Since we are going to use the operator probability in an adaptive way we
would need to pass the generated probability as an argument to the apply
method of GeneticOperator.

3) Separation of logic for adaptive and Simple GA:
-- We might want to separate two different strategies of GA i.e. simple GA
and adaptive GA.
As for simple GA with constant probability we can avoid computation like
rank calculation.

4) OffspringGenerator following Strategy pattern:
-- Can we have an implementation of OffspringGenerator following strategy
pattern. We should have default implementation following your first
proposal.
But there can be an optional parameter in the API method, where library
users would be able to provide their custom implementation.

If you need a separate mail chain for each of the review comments mentioned
above kindly let me know.

Please share further thoughts regarding this.


Thanks & Regards
--Avijit Basak


On Wed, 1 Jun 2022 at 04:53, Gilles Sadowski <gi...@gmail.com> wrote:

> Responding below to some of my own questions following commit
>    ddfd5bf859d04cc5da604b20021ceaba9de7def6
> in branch
>   feature__MATH-1563__genetic_algorithm
>
> Le mar. 24 mai 2022 à 01:54, Gilles Sadowski <gi...@gmail.com> a
> écrit :
> >
> > Hello.
> >
> > Le mer. 18 mai 2022 à 16:34, Avijit Basak <av...@gmail.com> a
> écrit :
> > >
> > > Hi All
> > >
> > >         Please find my comments below.
> > >
> > > Comments related to new model:
> > >
> > > 1) Hierarchy of GeneticOperator: In the proposed model the genetic
> > > operators are designed hierarchically implementing a common interface
> and
> > > operators are accepted as a list.
> > > This enables interchangeability of operators. Library users would be
> able
> > > to use crossover and mutation operators interchangeably.
> > > However, in genetic algorithms or other population based search
> algorithms
> > > the operators are used for broadly two purposes, exploration and
> > > exploitation.
> > > In GA crossover is used for exploitation and mutation is used for
> > > exploration. Keeping them in a common hierarchy and allowing
> > > interchangeable operation violates that purpose.
> >
> > I'm not sure that the semantics of "exploitation" and "exploration"
> > should drive the API design.
> > Saying it differently: We don't need to know how various operators
> > will be used in order to implement them; hence IMO, there is no
> > need to discriminate at the API level.
>
> The "core" GA algorithm (see class "GeneticAlgorithmFactory") is
> oblivious to whether a genetic operator "is-a" crossover or mutation.
>
> > >
> > > 2) Chromosome Fitness: In the new design the chromosome fitness is
> > > maintained as a hashmap where the key is the chromosome itself.
> > > Are we going to retain the fitness value in chromosome too otherwise
> > > implementation of Comparable won't be possible?
> >
> > Sorry, I don't follow.
>
> Implementing "Comparable" is not necessary.
>
> > > Assuming the chromosome representation would be used to calculate
> hashcode,
> > > it would be a very time consuming process depending on the length of
> > > chromosome.
> >
> > Is this assumption correct?
> > For what purpose do we need to compute a custom hash code?
>
> A custom "hashCode" method is not necessary.
> The only consequence seems that 2 different instances of a genotype that is
> logically the same (genotype-wise) could both be added to a population
> while
> the same (in-memory) instance would only appear once in the hash map.
>
> >
> > > E.g. in binary chromosomes we have provision to allow chromosome length
> > > upto (2^31 - 1).
> >
> > That's a lot. ;-)
> > Do you have use cases where such long representations can be handled?
>
> For now, I've use "BitSet" as the internal representation (but this could
> be changed if necessary, because it is not part of the public API).
>
> > > Along with that the chromosome implements Comparable
> > > interface.
> > > Equality by Comparable interface would be decided by fitness
> >
> > Sure.
> >
> > > however
> > > equality by equals() method would depend on representation.
> >
> > Do we need a custom "equals"?
>
> We don't.
>
> > > As chromosomes with different representations may also have the same
> > > fitness, this would produce a conflict.
> >
> > Please provide an example.
> >
> > >
> > > 3) The model does not consider anything related to adaptive approaches.
> > > Would it be a separate factory?
> >
> > What I've proposed is an alternative "skeleton" for the API.  Of course,
> > more classes will provide specific functionality.
> > An adaptive operator "is-a" specialized genetic operator (perhaps the
> > notion of "Adaptive" will be defined through an interface?).
>
> We don't need anything that complex (IIUC).
> See interface "ApplicationRate" and its implementations in package "rate".
>
> > >
> > > 4) Comparison of chromosomes: In the current model two chromosomes are
> > > compared after decoding the genotype to phenotype using the internal
> > > decoder.
> > > How can we address this in the new model?
> >
> > As mentioned above: Do we really need to compare representations?
>
> Within the library, it does not seem necessary.
>
> > >
> > > 5) Chromosome String representation: Currently we use the toString()
> method
> > > to print the chromosome's phenotype. In the new model we would need to
> > > avoid this approach as decoders won't be available to chromosomes.
> >
> > This seems like a minor issue (or perhaps no issue at all?) unless I'm
> > missing something.
>
> The GA does not need to print the phenotype.
> The decoder is user-defined, hence he can obviously apply it whenever
> he needs a printable version of the chromosomes.
>
> > >
> > > 6) However in the new model the generic type parameters and java.util
> > > packages are used in a more organized way. We can adopt the same in the
> > > existing model.
> >
> > I don't understand what you are proposing.
>
> I've used two generic parameters (one for genotype, one for phenotype).
> The code does not use any "@SuppressWarning" annotation.
>
> > >
> > > >If we need to document software (e.g. the "examples") produced
> > > >by a "Commons" component, it is preferable that we use and refer
> > > >to "Log4j2".
> > > >[The logger API is another matter since it does not impact how the
> > > >software is used (from a user's POV).]
> > > -- I shall make the changes.
>
> Logging statements are marginally useful in such a small piece of code.
> Tracking evolution can be implemented at the application level.
> See interface "GenerationCallback".
>
> >
> > Thanks.
> >
> > >
> > > > [...]
> > > >Doing manually will be too tedious.
> > > >If you are reasonably confident that you've ported everything
> > > >valuable, I guess that we'll rely on coverage tools...
> > > >The current log statements could also be construed as (totally)
> > > >unnecessary, as they merely document statements which I would
> > > >consider part of an "application", not the GA "library".
> > > >I believe that we do not agree yet on where to draw the line (my
> > > >proposal in MATH-1618 is related to that difference of opinion).
> > > >
> > > -- Probably I am missing something here. These log statements are more
> > > useful for library users. If we remove all log statements how users
> would
> > > be able to debug any issue.
> > > What if there are issues in any part of library code or may be anything
> > > fails due to some application related customization.
> > > It may not be necessarily a library bug.
> >
> > I'm not clear about what is the intended purpose of logging.
>
> See previous response.
>
> > But I'd rather defer this discussion, and first focus on the actual GA
> > functionality being implemented.
> >
> > >
> > > >> >> >Likewise, the "PopulationStatisticsLogger" is not general
> enough to
> > > >> >> >be worth being part of the library.
> > > >> >> -- I would love to keep this simple implementation of convergence
> > > >> >> listener. This can provide a very basic overview on the nature of
> > > >> >> convergence.
> > > >> >
> > > >> >As a user, you'll be able to provide the feature to your
> application
> > > >> >with less than 10 lines of code (by registering a "listener").
> > > >> >Everyone else might want a slightly different output (contents
> and/or
> > > >> >format) than what this class generates.
> > > >> -- Users would always have the freedom to register their own
> listener.
> > > >> PopulationStatisticsLogger provides only a very basic option to
> track the
> > > >> population statistics during the convergence process.
> > > >> This is never a very generic solution to meet the needs of all
> users.
> > > >
> > > >AFAICT, the "PopulationStatisticsLogger" class satisfies *your* needs
> > > >(as a user), but as developers we should only include functionality
> that
> > > >is broadly useful.  I think I provided arguments that it is not the
> case of
> > > >that class.
> > > -- So do you want this class to be removed?
> >
> > Short answer would be "yes".  But the rationale is that we should
> > defer all functionality that just seems "nice-to-have" until the core
> > API is stabilized.
>
> Displaying statistics should be done at the application level.
> See "GenerationLogger" in class "MathFunctionOptimizer2" (adapted
> from yours).
>
> > >
> > > >Great!
> > > >
> > > >> >> [...]
> > > >
> > > >Maybe I'm still missing something, but IMHO the distinction between
> > > >adaptive vs not should not impact code at that level.
> > > -- Actually I wanted to keep a provision where users can mix both
> > > approaches. Maybe keep crossover probability as constant but mutation
> as
> > > adaptive.
> >
> > AFAICT, this would follow from my proposal (when "AdaptiveMutation"
> > and "AdaptiveCrossover" operators are implemented).
>
> Such operators are not necessary (see previous comment, about
> "ApplicationRate").
>
> > >
> > > [...]
> > > >> >(3)
> > > >> >o.a.c.m.ga.utils.ChromosomeRepresentationUtils
> > > >> >
> > > >> >It seems to be a "mixed-bag" kind of class (that is being frowned
> > > >> >upon nowadays).
> > > >> >Its comment refers to "random" but some methods are not using
> > > >> >any randomization.  Most methods are only used in unit tests.
> > > >> >
> > > >> -- Actually most methods are taken from the legacy GA application.
> In the
> > > >> legacy library representation there was no separate concept for the
> > > decoder
> > > >> and all representation utility methods were kept in the
> corresponding
> > > >> Chromosome classes. ChromosomeRepresentationUtils is created as a
> > > >> placeholder for those utility methods.
> > > >
> > > >I'm not sure I follow what you mean.
> > > >If these methods are only needed by the (legacy?) examples, they
> > > >should be moved to the "ga-examples" module (where they can be
> > > >removed later on without regards to BC).
> > > -- These were adopted from the legacy model and should be useful in
> this
> > > new model too.
> >
> > Anything not _surely_ useful (i.e. actually used by the new library)
> > should be left out.  If necessary, code can be re-added later.
>
> Representation(s) should be internal, together with the operators
> that manipulate them.
>
> > >
> > > > > > [...]
> > > >
> > > >OK.  [Is there a new PR?]
> > > -- I shall create one.
> >
> > We still need to agree on my proposal...
> > Would you try to follow up on that basis, i.e. add a minimal set of GA
> > operators, in order to show that some of the "examples" can be run?
>
> I've added the "MathFunctionOptimizer2" in the "Standalone" example
> application.  Please check that it produces the expected output.
> [I had to slightly change the function being optimized, and your decoding
> function. Why are you assuming that "Coordinates" cannot be negative?]
>
> > > >
> > > >> >(5)
> > > >> >Why does a "Chromosome" need an "identifier"?
> > > >> >Method "getId()" is only used in "PopulationStatisticalSummaryImpl"
> > > >> >that is an internal class, where it seems that the chromosome
> itself
> > > >> >(rather than its "id") could serve as the map's key.
> > > >> -- How can we generate a map's key from the chromosome itself?
> Please
> > > >> suggest.
> > > >
> > > >A class to be used as a key only needs to implement "equals" and
> > > >"hashCode".
> > > -- The current chromosome class implements Comparable interface which
> uses
> > > chromosome fitness for comparison. Use of both Comparable and equals()
> > > might introduce inconsistencies.
>
> In my proposal, neither is implemented.
>
> > An example?
> >
> > > >
> > > >> >
> > > >> >(6)
> > > >> >o.a.c.m.ga.chromsome.AbstractChromosome
> > > >> >
> > > >> >Field "fitness" is not "final", yet it could be: a
> "FitnessFunction"
> > > >> >object (used in "evaluate() to compute that field) is passed to the
> > > >> >constructor.  Is there a reason for the "lazy" evaluation?
> > > >> >Dropping it would make the instance immutable (and "evaluate()"
> > > >> >should be renamed to "getFitness()").
> > > >> >
> > > >> >Why should the "FitnessFunction" be stored in every chromosome?
> > > >> >
> > > >> -- I have modified the fitness as final and initialized the same in
> the
> > > >> constructor.
> > > >
> > > >Better, but did you check my proposal in MATH-1618, where
> > > >Chromosome and fitness are decoupled, and their relationship
> > > >is held within a "Population" instance?
> > > --Mentioned earlier.
> >
> > I still don't know whether you agree that my proposal makes it
> > simpler to express a GA.
>
> Please have a look at the commit mentioned above, and
>  * for each design issue, post a new discussion thread,
>  * for each bug, file a JIRA report.
>
> > > >
> > > >> [...]
> > > >
> > > >>
> > > >> >(8)
> > > >> >@SuppressWarnings("unchecked")
> > > >> >
> > > >> >By default, I'm a bit suspicious about having to resort to these
> > > >> annotations,
> > > >> >especially for the kind of algorithms we are trying to implement.
> > > >> >What do you think of the alternative approach outlined in the ZIP
> file
> > > >> >attached in MATH-1618:
> > > >> >    https://issues.apache.org/jira/browse/MATH-1618
> > > >> >?
> > > >> -- This annotation is required because we have kept an option to use
> > > >> different types of genotypes including primitive.
> > > >> Because of that our base interfaces only declares phenotype not
> genotype.
> > > >> This introduced a kind of hierarchy in all operators and chromosome
> > > classes
> > > >> which required us to use the mentioned annotation.
> > > >
> > > >I may again be missing something.
> > > >Could you please explain the case that makes these annotations
> > > >necessary.
> > > -- This has been only used to avoid the warning in the place of
> typecasting.
> > > However, I can work to minimize this following your new model.
> >
> > "Minimize"?
>
> No unsafe cast seems necessary.
>
> > > >
> > > >> >
> > > >> >(9)
> > > >> >Naming of factory methods should be harmonized to match the
> convention
> > > >> >adopted in components like [RNG] and [Numbers].
> > > >> >E.g. instead of "newChromosome(...)", please use "of(...)" or
> > > "from(...)"
> > > >> >for "value object", and "create(...)" otherwise.
> > > >> >
> > > >> -- I have renamed the same for Chromosome classes.
> > > >> What about the nextGeneration() method of ListPopulation class.
> Renaming
> > > >> this to create() or from() won't convey the purpose of it.
> > > >
> > > >I agree, and that's why the new "Population" class (in MATH-1618) does
> > > >not provide a factory method (see also the "GeneticAlgorithmFactory"
> > > >class).
> > > -- We can avoid the same in the current model if we agree to use a
> default
> > > implementation of population and remove the Population interface
> following
> > > your new model.
> >
> > So, do we adopt that "new model"?
> > Or do you still have objections?
>
> As the one who investigates GA behaviours, please list the concrete
> problems with my proposal.
>
> > > >
> > > >> >(10)
> > > >> >o.a.c.m.ga.chromosome.AbstractListChromosome
> > > >> >
> > > >> >Constructor is called with an argument that is a previously
> instantiated
> > > >> >"representation".  If the latter is mutable, the caller will be
> able to
> > > >> modify
> > > >> >the underlying data structure of the newly created chromosome.
> [The
> > > >> >doc assumes immutability of the representation but this cannot be
> > > >> >enforced, and mixed ownership can entail subtle bugs.]
> > > >> -- I think this applies to both representation as well as generic
> > > parameter
> > > >> type T. But I don't see any other option but to rely on the user.
> > > >
> > > >The Javadoc (at line 84) is misleading in its mention of "immutable".
> > > >
> > > >> If you have any suggestions kindly share.
> > > >
> > > >I may not understand all the implications, but I'd suggest that the
> > > >"representation" be instantiated within the control of the library
> (e.g.
> > > >through a "builder"/"factory").
> > > -- Currently we have the ChromosomeRepresentationUtils for the same.
> Its
> > > methods are designed to generate the representations.
> >
> > My suggestion is that this design can be improved (a.o. according to my
> > above suggestion).
>
> In my proposal, representations would be not be "public", and each
> would be declared and used within a dedicated package.
> See package "gene/binary".
>
> > > >
> > > >> >
> > > >> >(11)
> > > >> >Do we agree that, in a GA, the most time-consuming task is the
> fitness
> > > >> >computation?  Hence IMO, it should be the focus of the
> multithreading
> > > >> >tools (i.e. "ExecutorService"), probably keeping the other parts
> (namely
> > > >> >the genetic operators) within a simple sequential loop (as in class
> > > >> >"GeneticAlgorithmFactory" in MATH-1618).
> > > >> -- Current implementation uses separate threads for applying
> crossover
> > > and
> > > >> mutation operators for each pair of selected chromosomes.
> > > >> I think this ensures better utilization of multi-core processors
> compared
> > > >> to use of multi-threading only for the fitness calculation.
> > > >
> > > >I have the opposite intuition: Parallel application of the genetic
> > > >operators would only provide marginal gains wrt the fitness
> > > >computation.
> > > >In any case, I think that it will be fairly easy to modify my proposed
> > > >"OffspringGenerator" class to use an "ExecutorService" (if benchmarks
> > > >show that a substantial gain could indeed be achieved).
> > > >
> > > >> -- Some codes are checked in. But there is a conflict in the pull
> > > request.
> > > >> So I shall create a new one and delete the old branch itself.
> > > >
> > > >IMHO, we could make more substantial progress if you could
> > > >first point to issues with my proposal in MATH-1618.
> > > --Mentioned earlier.
> >
> > Well, I don't know where we stand...
>
> Hoping we can make significant progress so as to include the new
> GA functionality in the upcoming release of "Commons Math"...
>
> Regards,
> Gilles
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>

Re: [Math] Review of "genetic algorithm" module

Posted by Gilles Sadowski <gi...@gmail.com>.
Responding below to some of my own questions following commit
   ddfd5bf859d04cc5da604b20021ceaba9de7def6
in branch
  feature__MATH-1563__genetic_algorithm

Le mar. 24 mai 2022 à 01:54, Gilles Sadowski <gi...@gmail.com> a écrit :
>
> Hello.
>
> Le mer. 18 mai 2022 à 16:34, Avijit Basak <av...@gmail.com> a écrit :
> >
> > Hi All
> >
> >         Please find my comments below.
> >
> > Comments related to new model:
> >
> > 1) Hierarchy of GeneticOperator: In the proposed model the genetic
> > operators are designed hierarchically implementing a common interface and
> > operators are accepted as a list.
> > This enables interchangeability of operators. Library users would be able
> > to use crossover and mutation operators interchangeably.
> > However, in genetic algorithms or other population based search algorithms
> > the operators are used for broadly two purposes, exploration and
> > exploitation.
> > In GA crossover is used for exploitation and mutation is used for
> > exploration. Keeping them in a common hierarchy and allowing
> > interchangeable operation violates that purpose.
>
> I'm not sure that the semantics of "exploitation" and "exploration"
> should drive the API design.
> Saying it differently: We don't need to know how various operators
> will be used in order to implement them; hence IMO, there is no
> need to discriminate at the API level.

The "core" GA algorithm (see class "GeneticAlgorithmFactory") is
oblivious to whether a genetic operator "is-a" crossover or mutation.

> >
> > 2) Chromosome Fitness: In the new design the chromosome fitness is
> > maintained as a hashmap where the key is the chromosome itself.
> > Are we going to retain the fitness value in chromosome too otherwise
> > implementation of Comparable won't be possible?
>
> Sorry, I don't follow.

Implementing "Comparable" is not necessary.

> > Assuming the chromosome representation would be used to calculate hashcode,
> > it would be a very time consuming process depending on the length of
> > chromosome.
>
> Is this assumption correct?
> For what purpose do we need to compute a custom hash code?

A custom "hashCode" method is not necessary.
The only consequence seems that 2 different instances of a genotype that is
logically the same (genotype-wise) could both be added to a population while
the same (in-memory) instance would only appear once in the hash map.

>
> > E.g. in binary chromosomes we have provision to allow chromosome length
> > upto (2^31 - 1).
>
> That's a lot. ;-)
> Do you have use cases where such long representations can be handled?

For now, I've use "BitSet" as the internal representation (but this could
be changed if necessary, because it is not part of the public API).

> > Along with that the chromosome implements Comparable
> > interface.
> > Equality by Comparable interface would be decided by fitness
>
> Sure.
>
> > however
> > equality by equals() method would depend on representation.
>
> Do we need a custom "equals"?

We don't.

> > As chromosomes with different representations may also have the same
> > fitness, this would produce a conflict.
>
> Please provide an example.
>
> >
> > 3) The model does not consider anything related to adaptive approaches.
> > Would it be a separate factory?
>
> What I've proposed is an alternative "skeleton" for the API.  Of course,
> more classes will provide specific functionality.
> An adaptive operator "is-a" specialized genetic operator (perhaps the
> notion of "Adaptive" will be defined through an interface?).

We don't need anything that complex (IIUC).
See interface "ApplicationRate" and its implementations in package "rate".

> >
> > 4) Comparison of chromosomes: In the current model two chromosomes are
> > compared after decoding the genotype to phenotype using the internal
> > decoder.
> > How can we address this in the new model?
>
> As mentioned above: Do we really need to compare representations?

Within the library, it does not seem necessary.

> >
> > 5) Chromosome String representation: Currently we use the toString() method
> > to print the chromosome's phenotype. In the new model we would need to
> > avoid this approach as decoders won't be available to chromosomes.
>
> This seems like a minor issue (or perhaps no issue at all?) unless I'm
> missing something.

The GA does not need to print the phenotype.
The decoder is user-defined, hence he can obviously apply it whenever
he needs a printable version of the chromosomes.

> >
> > 6) However in the new model the generic type parameters and java.util
> > packages are used in a more organized way. We can adopt the same in the
> > existing model.
>
> I don't understand what you are proposing.

I've used two generic parameters (one for genotype, one for phenotype).
The code does not use any "@SuppressWarning" annotation.

> >
> > >If we need to document software (e.g. the "examples") produced
> > >by a "Commons" component, it is preferable that we use and refer
> > >to "Log4j2".
> > >[The logger API is another matter since it does not impact how the
> > >software is used (from a user's POV).]
> > -- I shall make the changes.

Logging statements are marginally useful in such a small piece of code.
Tracking evolution can be implemented at the application level.
See interface "GenerationCallback".

>
> Thanks.
>
> >
> > > [...]
> > >Doing manually will be too tedious.
> > >If you are reasonably confident that you've ported everything
> > >valuable, I guess that we'll rely on coverage tools...
> > >The current log statements could also be construed as (totally)
> > >unnecessary, as they merely document statements which I would
> > >consider part of an "application", not the GA "library".
> > >I believe that we do not agree yet on where to draw the line (my
> > >proposal in MATH-1618 is related to that difference of opinion).
> > >
> > -- Probably I am missing something here. These log statements are more
> > useful for library users. If we remove all log statements how users would
> > be able to debug any issue.
> > What if there are issues in any part of library code or may be anything
> > fails due to some application related customization.
> > It may not be necessarily a library bug.
>
> I'm not clear about what is the intended purpose of logging.

See previous response.

> But I'd rather defer this discussion, and first focus on the actual GA
> functionality being implemented.
>
> >
> > >> >> >Likewise, the "PopulationStatisticsLogger" is not general enough to
> > >> >> >be worth being part of the library.
> > >> >> -- I would love to keep this simple implementation of convergence
> > >> >> listener. This can provide a very basic overview on the nature of
> > >> >> convergence.
> > >> >
> > >> >As a user, you'll be able to provide the feature to your application
> > >> >with less than 10 lines of code (by registering a "listener").
> > >> >Everyone else might want a slightly different output (contents and/or
> > >> >format) than what this class generates.
> > >> -- Users would always have the freedom to register their own listener.
> > >> PopulationStatisticsLogger provides only a very basic option to track the
> > >> population statistics during the convergence process.
> > >> This is never a very generic solution to meet the needs of all users.
> > >
> > >AFAICT, the "PopulationStatisticsLogger" class satisfies *your* needs
> > >(as a user), but as developers we should only include functionality that
> > >is broadly useful.  I think I provided arguments that it is not the case of
> > >that class.
> > -- So do you want this class to be removed?
>
> Short answer would be "yes".  But the rationale is that we should
> defer all functionality that just seems "nice-to-have" until the core
> API is stabilized.

Displaying statistics should be done at the application level.
See "GenerationLogger" in class "MathFunctionOptimizer2" (adapted
from yours).

> >
> > >Great!
> > >
> > >> >> [...]
> > >
> > >Maybe I'm still missing something, but IMHO the distinction between
> > >adaptive vs not should not impact code at that level.
> > -- Actually I wanted to keep a provision where users can mix both
> > approaches. Maybe keep crossover probability as constant but mutation as
> > adaptive.
>
> AFAICT, this would follow from my proposal (when "AdaptiveMutation"
> and "AdaptiveCrossover" operators are implemented).

Such operators are not necessary (see previous comment, about
"ApplicationRate").

> >
> > [...]
> > >> >(3)
> > >> >o.a.c.m.ga.utils.ChromosomeRepresentationUtils
> > >> >
> > >> >It seems to be a "mixed-bag" kind of class (that is being frowned
> > >> >upon nowadays).
> > >> >Its comment refers to "random" but some methods are not using
> > >> >any randomization.  Most methods are only used in unit tests.
> > >> >
> > >> -- Actually most methods are taken from the legacy GA application. In the
> > >> legacy library representation there was no separate concept for the
> > decoder
> > >> and all representation utility methods were kept in the corresponding
> > >> Chromosome classes. ChromosomeRepresentationUtils is created as a
> > >> placeholder for those utility methods.
> > >
> > >I'm not sure I follow what you mean.
> > >If these methods are only needed by the (legacy?) examples, they
> > >should be moved to the "ga-examples" module (where they can be
> > >removed later on without regards to BC).
> > -- These were adopted from the legacy model and should be useful in this
> > new model too.
>
> Anything not _surely_ useful (i.e. actually used by the new library)
> should be left out.  If necessary, code can be re-added later.

Representation(s) should be internal, together with the operators
that manipulate them.

> >
> > > > > [...]
> > >
> > >OK.  [Is there a new PR?]
> > -- I shall create one.
>
> We still need to agree on my proposal...
> Would you try to follow up on that basis, i.e. add a minimal set of GA
> operators, in order to show that some of the "examples" can be run?

I've added the "MathFunctionOptimizer2" in the "Standalone" example
application.  Please check that it produces the expected output.
[I had to slightly change the function being optimized, and your decoding
function. Why are you assuming that "Coordinates" cannot be negative?]

> > >
> > >> >(5)
> > >> >Why does a "Chromosome" need an "identifier"?
> > >> >Method "getId()" is only used in "PopulationStatisticalSummaryImpl"
> > >> >that is an internal class, where it seems that the chromosome itself
> > >> >(rather than its "id") could serve as the map's key.
> > >> -- How can we generate a map's key from the chromosome itself? Please
> > >> suggest.
> > >
> > >A class to be used as a key only needs to implement "equals" and
> > >"hashCode".
> > -- The current chromosome class implements Comparable interface which uses
> > chromosome fitness for comparison. Use of both Comparable and equals()
> > might introduce inconsistencies.

In my proposal, neither is implemented.

> An example?
>
> > >
> > >> >
> > >> >(6)
> > >> >o.a.c.m.ga.chromsome.AbstractChromosome
> > >> >
> > >> >Field "fitness" is not "final", yet it could be: a "FitnessFunction"
> > >> >object (used in "evaluate() to compute that field) is passed to the
> > >> >constructor.  Is there a reason for the "lazy" evaluation?
> > >> >Dropping it would make the instance immutable (and "evaluate()"
> > >> >should be renamed to "getFitness()").
> > >> >
> > >> >Why should the "FitnessFunction" be stored in every chromosome?
> > >> >
> > >> -- I have modified the fitness as final and initialized the same in the
> > >> constructor.
> > >
> > >Better, but did you check my proposal in MATH-1618, where
> > >Chromosome and fitness are decoupled, and their relationship
> > >is held within a "Population" instance?
> > --Mentioned earlier.
>
> I still don't know whether you agree that my proposal makes it
> simpler to express a GA.

Please have a look at the commit mentioned above, and
 * for each design issue, post a new discussion thread,
 * for each bug, file a JIRA report.

> > >
> > >> [...]
> > >
> > >>
> > >> >(8)
> > >> >@SuppressWarnings("unchecked")
> > >> >
> > >> >By default, I'm a bit suspicious about having to resort to these
> > >> annotations,
> > >> >especially for the kind of algorithms we are trying to implement.
> > >> >What do you think of the alternative approach outlined in the ZIP file
> > >> >attached in MATH-1618:
> > >> >    https://issues.apache.org/jira/browse/MATH-1618
> > >> >?
> > >> -- This annotation is required because we have kept an option to use
> > >> different types of genotypes including primitive.
> > >> Because of that our base interfaces only declares phenotype not genotype.
> > >> This introduced a kind of hierarchy in all operators and chromosome
> > classes
> > >> which required us to use the mentioned annotation.
> > >
> > >I may again be missing something.
> > >Could you please explain the case that makes these annotations
> > >necessary.
> > -- This has been only used to avoid the warning in the place of typecasting.
> > However, I can work to minimize this following your new model.
>
> "Minimize"?

No unsafe cast seems necessary.

> > >
> > >> >
> > >> >(9)
> > >> >Naming of factory methods should be harmonized to match the convention
> > >> >adopted in components like [RNG] and [Numbers].
> > >> >E.g. instead of "newChromosome(...)", please use "of(...)" or
> > "from(...)"
> > >> >for "value object", and "create(...)" otherwise.
> > >> >
> > >> -- I have renamed the same for Chromosome classes.
> > >> What about the nextGeneration() method of ListPopulation class. Renaming
> > >> this to create() or from() won't convey the purpose of it.
> > >
> > >I agree, and that's why the new "Population" class (in MATH-1618) does
> > >not provide a factory method (see also the "GeneticAlgorithmFactory"
> > >class).
> > -- We can avoid the same in the current model if we agree to use a default
> > implementation of population and remove the Population interface following
> > your new model.
>
> So, do we adopt that "new model"?
> Or do you still have objections?

As the one who investigates GA behaviours, please list the concrete
problems with my proposal.

> > >
> > >> >(10)
> > >> >o.a.c.m.ga.chromosome.AbstractListChromosome
> > >> >
> > >> >Constructor is called with an argument that is a previously instantiated
> > >> >"representation".  If the latter is mutable, the caller will be able to
> > >> modify
> > >> >the underlying data structure of the newly created chromosome.  [The
> > >> >doc assumes immutability of the representation but this cannot be
> > >> >enforced, and mixed ownership can entail subtle bugs.]
> > >> -- I think this applies to both representation as well as generic
> > parameter
> > >> type T. But I don't see any other option but to rely on the user.
> > >
> > >The Javadoc (at line 84) is misleading in its mention of "immutable".
> > >
> > >> If you have any suggestions kindly share.
> > >
> > >I may not understand all the implications, but I'd suggest that the
> > >"representation" be instantiated within the control of the library (e.g.
> > >through a "builder"/"factory").
> > -- Currently we have the ChromosomeRepresentationUtils for the same. Its
> > methods are designed to generate the representations.
>
> My suggestion is that this design can be improved (a.o. according to my
> above suggestion).

In my proposal, representations would be not be "public", and each
would be declared and used within a dedicated package.
See package "gene/binary".

> > >
> > >> >
> > >> >(11)
> > >> >Do we agree that, in a GA, the most time-consuming task is the fitness
> > >> >computation?  Hence IMO, it should be the focus of the multithreading
> > >> >tools (i.e. "ExecutorService"), probably keeping the other parts (namely
> > >> >the genetic operators) within a simple sequential loop (as in class
> > >> >"GeneticAlgorithmFactory" in MATH-1618).
> > >> -- Current implementation uses separate threads for applying crossover
> > and
> > >> mutation operators for each pair of selected chromosomes.
> > >> I think this ensures better utilization of multi-core processors compared
> > >> to use of multi-threading only for the fitness calculation.
> > >
> > >I have the opposite intuition: Parallel application of the genetic
> > >operators would only provide marginal gains wrt the fitness
> > >computation.
> > >In any case, I think that it will be fairly easy to modify my proposed
> > >"OffspringGenerator" class to use an "ExecutorService" (if benchmarks
> > >show that a substantial gain could indeed be achieved).
> > >
> > >> -- Some codes are checked in. But there is a conflict in the pull
> > request.
> > >> So I shall create a new one and delete the old branch itself.
> > >
> > >IMHO, we could make more substantial progress if you could
> > >first point to issues with my proposal in MATH-1618.
> > --Mentioned earlier.
>
> Well, I don't know where we stand...

Hoping we can make significant progress so as to include the new
GA functionality in the upcoming release of "Commons Math"...

Regards,
Gilles

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [Math] Review of "genetic algorithm" module

Posted by Gilles Sadowski <gi...@gmail.com>.
Hello.

Le mer. 18 mai 2022 à 16:34, Avijit Basak <av...@gmail.com> a écrit :
>
> Hi All
>
>         Please find my comments below.
>
> Comments related to new model:
>
> 1) Hierarchy of GeneticOperator: In the proposed model the genetic
> operators are designed hierarchically implementing a common interface and
> operators are accepted as a list.
> This enables interchangeability of operators. Library users would be able
> to use crossover and mutation operators interchangeably.
> However, in genetic algorithms or other population based search algorithms
> the operators are used for broadly two purposes, exploration and
> exploitation.
> In GA crossover is used for exploitation and mutation is used for
> exploration. Keeping them in a common hierarchy and allowing
> interchangeable operation violates that purpose.

I'm not sure that the semantics of "exploitation" and "exploration"
should drive the API design.
Saying it differently: We don't need to know how various operators
will be used in order to implement them; hence IMO, there is no
need to discriminate at the API level.

>
> 2) Chromosome Fitness: In the new design the chromosome fitness is
> maintained as a hashmap where the key is the chromosome itself.
> Are we going to retain the fitness value in chromosome too otherwise
> implementation of Comparable won't be possible?

Sorry, I don't follow.

> Assuming the chromosome representation would be used to calculate hashcode,
> it would be a very time consuming process depending on the length of
> chromosome.

Is this assumption correct?
For what purpose do we need to compute a custom hash code?

> E.g. in binary chromosomes we have provision to allow chromosome length
> upto (2^31 - 1).

That's a lot. ;-)
Do you have use cases where such long representations can be handled?

> Along with that the chromosome implements Comparable
> interface.
> Equality by Comparable interface would be decided by fitness

Sure.

> however
> equality by equals() method would depend on representation.

Do we need a custom "equals"?

> As chromosomes with different representations may also have the same
> fitness, this would produce a conflict.

Please provide an example.

>
> 3) The model does not consider anything related to adaptive approaches.
> Would it be a separate factory?

What I've proposed is an alternative "skeleton" for the API.  Of course,
more classes will provide specific functionality.
An adaptive operator "is-a" specialized genetic operator (perhaps the
notion of "Adaptive" will be defined through an interface?).

>
> 4) Comparison of chromosomes: In the current model two chromosomes are
> compared after decoding the genotype to phenotype using the internal
> decoder.
> How can we address this in the new model?

As mentioned above: Do we really need to compare representations?

>
> 5) Chromosome String representation: Currently we use the toString() method
> to print the chromosome's phenotype. In the new model we would need to
> avoid this approach as decoders won't be available to chromosomes.

This seems like a minor issue (or perhaps no issue at all?) unless I'm
missing something.

>
> 6) However in the new model the generic type parameters and java.util
> packages are used in a more organized way. We can adopt the same in the
> existing model.

I don't understand what you are proposing.

>
> >If we need to document software (e.g. the "examples") produced
> >by a "Commons" component, it is preferable that we use and refer
> >to "Log4j2".
> >[The logger API is another matter since it does not impact how the
> >software is used (from a user's POV).]
> -- I shall make the changes.

Thanks.

>
> > [...]
> >Doing manually will be too tedious.
> >If you are reasonably confident that you've ported everything
> >valuable, I guess that we'll rely on coverage tools...
> >The current log statements could also be construed as (totally)
> >unnecessary, as they merely document statements which I would
> >consider part of an "application", not the GA "library".
> >I believe that we do not agree yet on where to draw the line (my
> >proposal in MATH-1618 is related to that difference of opinion).
> >
> -- Probably I am missing something here. These log statements are more
> useful for library users. If we remove all log statements how users would
> be able to debug any issue.
> What if there are issues in any part of library code or may be anything
> fails due to some application related customization.
> It may not be necessarily a library bug.

I'm not clear about what is the intended purpose of logging.
But I'd rather defer this discussion, and first focus on the actual GA
functionality being implemented.

>
> >> >> >Likewise, the "PopulationStatisticsLogger" is not general enough to
> >> >> >be worth being part of the library.
> >> >> -- I would love to keep this simple implementation of convergence
> >> >> listener. This can provide a very basic overview on the nature of
> >> >> convergence.
> >> >
> >> >As a user, you'll be able to provide the feature to your application
> >> >with less than 10 lines of code (by registering a "listener").
> >> >Everyone else might want a slightly different output (contents and/or
> >> >format) than what this class generates.
> >> -- Users would always have the freedom to register their own listener.
> >> PopulationStatisticsLogger provides only a very basic option to track the
> >> population statistics during the convergence process.
> >> This is never a very generic solution to meet the needs of all users.
> >
> >AFAICT, the "PopulationStatisticsLogger" class satisfies *your* needs
> >(as a user), but as developers we should only include functionality that
> >is broadly useful.  I think I provided arguments that it is not the case of
> >that class.
> -- So do you want this class to be removed?

Short answer would be "yes".  But the rationale is that we should
defer all functionality that just seems "nice-to-have" until the core
API is stabilized.

>
> >Great!
> >
> >> >> [...]
> >
> >Maybe I'm still missing something, but IMHO the distinction between
> >adaptive vs not should not impact code at that level.
> -- Actually I wanted to keep a provision where users can mix both
> approaches. Maybe keep crossover probability as constant but mutation as
> adaptive.

AFAICT, this would follow from my proposal (when "AdaptiveMutation"
and "AdaptiveCrossover" operators are implemented).

>
> [...]
> >> >(3)
> >> >o.a.c.m.ga.utils.ChromosomeRepresentationUtils
> >> >
> >> >It seems to be a "mixed-bag" kind of class (that is being frowned
> >> >upon nowadays).
> >> >Its comment refers to "random" but some methods are not using
> >> >any randomization.  Most methods are only used in unit tests.
> >> >
> >> -- Actually most methods are taken from the legacy GA application. In the
> >> legacy library representation there was no separate concept for the
> decoder
> >> and all representation utility methods were kept in the corresponding
> >> Chromosome classes. ChromosomeRepresentationUtils is created as a
> >> placeholder for those utility methods.
> >
> >I'm not sure I follow what you mean.
> >If these methods are only needed by the (legacy?) examples, they
> >should be moved to the "ga-examples" module (where they can be
> >removed later on without regards to BC).
> -- These were adopted from the legacy model and should be useful in this
> new model too.

Anything not _surely_ useful (i.e. actually used by the new library)
should be left out.  If necessary, code can be re-added later.

>
> > > > [...]
> >
> >OK.  [Is there a new PR?]
> -- I shall create one.

We still need to agree on my proposal...
Would you try to follow up on that basis, i.e. add a minimal set of GA
operators, in order to show that some of the "examples" can be run?

> >
> >> >(5)
> >> >Why does a "Chromosome" need an "identifier"?
> >> >Method "getId()" is only used in "PopulationStatisticalSummaryImpl"
> >> >that is an internal class, where it seems that the chromosome itself
> >> >(rather than its "id") could serve as the map's key.
> >> -- How can we generate a map's key from the chromosome itself? Please
> >> suggest.
> >
> >A class to be used as a key only needs to implement "equals" and
> >"hashCode".
> -- The current chromosome class implements Comparable interface which uses
> chromosome fitness for comparison. Use of both Comparable and equals()
> might introduce inconsistencies.

An example?

> >
> >> >
> >> >(6)
> >> >o.a.c.m.ga.chromsome.AbstractChromosome
> >> >
> >> >Field "fitness" is not "final", yet it could be: a "FitnessFunction"
> >> >object (used in "evaluate() to compute that field) is passed to the
> >> >constructor.  Is there a reason for the "lazy" evaluation?
> >> >Dropping it would make the instance immutable (and "evaluate()"
> >> >should be renamed to "getFitness()").
> >> >
> >> >Why should the "FitnessFunction" be stored in every chromosome?
> >> >
> >> -- I have modified the fitness as final and initialized the same in the
> >> constructor.
> >
> >Better, but did you check my proposal in MATH-1618, where
> >Chromosome and fitness are decoupled, and their relationship
> >is held within a "Population" instance?
> --Mentioned earlier.

I still don't know whether you agree that my proposal makes it
simpler to express a GA.

> >
> >> [...]
> >
> >>
> >> >(8)
> >> >@SuppressWarnings("unchecked")
> >> >
> >> >By default, I'm a bit suspicious about having to resort to these
> >> annotations,
> >> >especially for the kind of algorithms we are trying to implement.
> >> >What do you think of the alternative approach outlined in the ZIP file
> >> >attached in MATH-1618:
> >> >    https://issues.apache.org/jira/browse/MATH-1618
> >> >?
> >> -- This annotation is required because we have kept an option to use
> >> different types of genotypes including primitive.
> >> Because of that our base interfaces only declares phenotype not genotype.
> >> This introduced a kind of hierarchy in all operators and chromosome
> classes
> >> which required us to use the mentioned annotation.
> >
> >I may again be missing something.
> >Could you please explain the case that makes these annotations
> >necessary.
> -- This has been only used to avoid the warning in the place of typecasting.
> However, I can work to minimize this following your new model.

"Minimize"?

> >
> >> >
> >> >(9)
> >> >Naming of factory methods should be harmonized to match the convention
> >> >adopted in components like [RNG] and [Numbers].
> >> >E.g. instead of "newChromosome(...)", please use "of(...)" or
> "from(...)"
> >> >for "value object", and "create(...)" otherwise.
> >> >
> >> -- I have renamed the same for Chromosome classes.
> >> What about the nextGeneration() method of ListPopulation class. Renaming
> >> this to create() or from() won't convey the purpose of it.
> >
> >I agree, and that's why the new "Population" class (in MATH-1618) does
> >not provide a factory method (see also the "GeneticAlgorithmFactory"
> >class).
> -- We can avoid the same in the current model if we agree to use a default
> implementation of population and remove the Population interface following
> your new model.

So, do we adopt that "new model"?
Or do you still have objections?

> >
> >> >(10)
> >> >o.a.c.m.ga.chromosome.AbstractListChromosome
> >> >
> >> >Constructor is called with an argument that is a previously instantiated
> >> >"representation".  If the latter is mutable, the caller will be able to
> >> modify
> >> >the underlying data structure of the newly created chromosome.  [The
> >> >doc assumes immutability of the representation but this cannot be
> >> >enforced, and mixed ownership can entail subtle bugs.]
> >> -- I think this applies to both representation as well as generic
> parameter
> >> type T. But I don't see any other option but to rely on the user.
> >
> >The Javadoc (at line 84) is misleading in its mention of "immutable".
> >
> >> If you have any suggestions kindly share.
> >
> >I may not understand all the implications, but I'd suggest that the
> >"representation" be instantiated within the control of the library (e.g.
> >through a "builder"/"factory").
> -- Currently we have the ChromosomeRepresentationUtils for the same. Its
> methods are designed to generate the representations.

My suggestion is that this design can be improved (a.o. according to my
above suggestion).

> >
> >> >
> >> >(11)
> >> >Do we agree that, in a GA, the most time-consuming task is the fitness
> >> >computation?  Hence IMO, it should be the focus of the multithreading
> >> >tools (i.e. "ExecutorService"), probably keeping the other parts (namely
> >> >the genetic operators) within a simple sequential loop (as in class
> >> >"GeneticAlgorithmFactory" in MATH-1618).
> >> -- Current implementation uses separate threads for applying crossover
> and
> >> mutation operators for each pair of selected chromosomes.
> >> I think this ensures better utilization of multi-core processors compared
> >> to use of multi-threading only for the fitness calculation.
> >
> >I have the opposite intuition: Parallel application of the genetic
> >operators would only provide marginal gains wrt the fitness
> >computation.
> >In any case, I think that it will be fairly easy to modify my proposed
> >"OffspringGenerator" class to use an "ExecutorService" (if benchmarks
> >show that a substantial gain could indeed be achieved).
> >
> >> -- Some codes are checked in. But there is a conflict in the pull
> request.
> >> So I shall create a new one and delete the old branch itself.
> >
> >IMHO, we could make more substantial progress if you could
> >first point to issues with my proposal in MATH-1618.
> --Mentioned earlier.

Well, I don't know where we stand...

Regards,
Gilles

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [Math] Review of "genetic algorithm" module

Posted by Avijit Basak <av...@gmail.com>.
Hi All

        Please find my comments below.

Comments related to new model:

1) Hierarchy of GeneticOperator: In the proposed model the genetic
operators are designed hierarchically implementing a common interface and
operators are accepted as a list.
This enables interchangeability of operators. Library users would be able
to use crossover and mutation operators interchangeably.
However, in genetic algorithms or other population based search algorithms
the operators are used for broadly two purposes, exploration and
exploitation.
In GA crossover is used for exploitation and mutation is used for
exploration. Keeping them in a common hierarchy and allowing
interchangeable operation violates that purpose.

2) Chromosome Fitness: In the new design the chromosome fitness is
maintained as a hashmap where the key is the chromosome itself.
Are we going to retain the fitness value in chromosome too otherwise
implementation of Comparable won't be possible?
Assuming the chromosome representation would be used to calculate hashcode,
it would be a very time consuming process depending on the length of
chromosome.
E.g. in binary chromosomes we have provision to allow chromosome length
upto (2^31 - 1). Along with that the chromosome implements Comparable
interface.
Equality by Comparable interface would be decided by fitness however
equality by equals() method would depend on representation.
As chromosomes with different representations may also have the same
fitness, this would produce a conflict.

3) The model does not consider anything related to adaptive approaches.
Would it be a separate factory?

4) Comparison of chromosomes: In the current model two chromosomes are
compared after decoding the genotype to phenotype using the internal
decoder.
How can we address this in the new model?

5) Chromosome String representation: Currently we use the toString() method
to print the chromosome's phenotype. In the new model we would need to
avoid this approach as decoders won't be available to chromosomes.

6) However in the new model the generic type parameters and java.util
packages are used in a more organized way. We can adopt the same in the
existing model.



>If we need to document software (e.g. the "examples") produced
>by a "Commons" component, it is preferable that we use and refer
>to "Log4j2".
>[The logger API is another matter since it does not impact how the
>software is used (from a user's POV).]
-- I shall make the changes.

>
>> >
>> >> There was a discussion
>> >> earlier on this and the decision was to use console for example
modules
>> >> log messages.
>> >
>> >I recall that we want a separation between the logger API (library
>> >dependency) and logger implementation ("examples" dependency).
>> >
>> >Is it a feature of "slf4j-simple" to only print to "stderr" or can it
>> >be configured?
>> >There is no issue on the "examples" module to depend on a
>> >full-featured logger (primary candidate would be "Log4j2").
>> --Already mentioned earlier.
>
>We should change SimpleLogger -> Log4j2 equivalent.
--I shall do this.

[...]
>Doing manually will be too tedious.
>If you are reasonably confident that you've ported everything
>valuable, I guess that we'll rely on coverage tools...
>The current log statements could also be construed as (totally)
>unnecessary, as they merely document statements which I would
>consider part of an "application", not the GA "library".
>I believe that we do not agree yet on where to draw the line (my
>proposal in MATH-1618 is related to that difference of opinion).
>
-- Probably I am missing something here. These log statements are more
useful for library users. If we remove all log statements how users would
be able to debug any issue.
What if there are issues in any part of library code or may be anything
fails due to some application related customization.
It may not be necessarily a library bug.

>> >> >Likewise, the "PopulationStatisticsLogger" is not general enough to
>> >> >be worth being part of the library.
>> >> -- I would love to keep this simple implementation of convergence
>> >> listener. This can provide a very basic overview on the nature of
>> >> convergence.
>> >
>> >As a user, you'll be able to provide the feature to your application
>> >with less than 10 lines of code (by registering a "listener").
>> >Everyone else might want a slightly different output (contents and/or
>> >format) than what this class generates.
>> -- Users would always have the freedom to register their own listener.
>> PopulationStatisticsLogger provides only a very basic option to track the
>> population statistics during the convergence process.
>> This is never a very generic solution to meet the needs of all users.
>
>AFAICT, the "PopulationStatisticsLogger" class satisfies *your* needs
>(as a user), but as developers we should only include functionality that
>is broadly useful.  I think I provided arguments that it is not the case of
>that class.
-- So do you want this class to be removed?

>Great!
>
>> >> [...]
>> >> >In "AdaptiveGeneticAlgorithm":
>> >> >* There should be a single constructor (same remark as above).
>> >> -- Removed the constructor with default argument.
>> >>
>> >> >* Why the use of reflection ("isAssignableFrom")?
>> >> -- Replaced it by instanceof.
>> >
>> >Marginally better ;-) it still does not say why the statistics is
disabled
>> >depending on the operator type...
>> --Statistics is required only for the adaptive probability generation.
>> In case application users choose to use a constant probability generation
>> scheme there is no use of this computation.
>> I shall document the same.
>
>Maybe I'm still missing something, but IMHO the distinction between
>adaptive vs not should not impact code at that level.
-- Actually I wanted to keep a provision where users can mix both
approaches. Maybe keep crossover probability as constant but mutation as
adaptive.

[...]
>> >(3)
>> >o.a.c.m.ga.utils.ChromosomeRepresentationUtils
>> >
>> >It seems to be a "mixed-bag" kind of class (that is being frowned
>> >upon nowadays).
>> >Its comment refers to "random" but some methods are not using
>> >any randomization.  Most methods are only used in unit tests.
>> >
>> -- Actually most methods are taken from the legacy GA application. In the
>> legacy library representation there was no separate concept for the
decoder
>> and all representation utility methods were kept in the corresponding
>> Chromosome classes. ChromosomeRepresentationUtils is created as a
>> placeholder for those utility methods.
>
>I'm not sure I follow what you mean.
>If these methods are only needed by the (legacy?) examples, they
>should be moved to the "ga-examples" module (where they can be
>removed later on without regards to BC).
-- These were adopted from the legacy model and should be useful in this
new model too.

>
>> >(4)
>> >o.a.c.m.ga.RandomProviderManager
>> >
>> >As already discussed, this class should not be part of the public
>> >API, namely because the "getRandomProvider()" method returns
>> >an object that is not thread-safe.
>> >If used internally as "syntactic sugar", it should be located in a
>> >package named "internal"; however I'd tend to remove it
>> >altogether, and call "ThreadLocalRandomSource.current(...)"
>> >explicitly.
>> >
>> -- The class o.a.c.m.ga.RandomProviderManager has been removed.
>> As per our previous discussion I have implemented the customization of
>> RandomSource at each operator level.
>
>OK.  [Is there a new PR?]
-- I shall create one.

>
>> >(5)
>> >Why does a "Chromosome" need an "identifier"?
>> >Method "getId()" is only used in "PopulationStatisticalSummaryImpl"
>> >that is an internal class, where it seems that the chromosome itself
>> >(rather than its "id") could serve as the map's key.
>> -- How can we generate a map's key from the chromosome itself? Please
>> suggest.
>
>A class to be used as a key only needs to implement "equals" and
>"hashCode".
-- The current chromosome class implements Comparable interface which uses
chromosome fitness for comparison. Use of both Comparable and equals()
might introduce inconsistencies.

>
>> >
>> >(6)
>> >o.a.c.m.ga.chromsome.AbstractChromosome
>> >
>> >Field "fitness" is not "final", yet it could be: a "FitnessFunction"
>> >object (used in "evaluate() to compute that field) is passed to the
>> >constructor.  Is there a reason for the "lazy" evaluation?
>> >Dropping it would make the instance immutable (and "evaluate()"
>> >should be renamed to "getFitness()").
>> >
>> >Why should the "FitnessFunction" be stored in every chromosome?
>> >
>> -- I have modified the fitness as final and initialized the same in the
>> constructor.
>
>Better, but did you check my proposal in MATH-1618, where
>Chromosome and fitness are decoupled, and their relationship
>is held within a "Population" instance?
--Mentioned earlier.

>
>> >(7)
>> >Spurious "@since" tags: In the new code (in "commons-math-ga"
>> >module), none should refer to a version < 4.0.
>> >
>> -- Some files are taken unchanged from the previous release. I have kept
>> the same @since tag for those files.
>> Do you need any change here?
>
>The old and new files are in different packages; "@since" tags
>thus make no sense IMO.
--I shall change it.

>
>>
>> >(8)
>> >@SuppressWarnings("unchecked")
>> >
>> >By default, I'm a bit suspicious about having to resort to these
>> annotations,
>> >especially for the kind of algorithms we are trying to implement.
>> >What do you think of the alternative approach outlined in the ZIP file
>> >attached in MATH-1618:
>> >    https://issues.apache.org/jira/browse/MATH-1618
>> >?
>> -- This annotation is required because we have kept an option to use
>> different types of genotypes including primitive.
>> Because of that our base interfaces only declares phenotype not genotype.
>> This introduced a kind of hierarchy in all operators and chromosome
classes
>> which required us to use the mentioned annotation.
>
>I may again be missing something.
>Could you please explain the case that makes these annotations
>necessary.
-- This has been only used to avoid the warning in the place of typecasting.
However, I can work to minimize this following your new model.

>
>> Even with the proposed new architecture we may not be able to avoid the
>> same.
>
>The classes which I've added do not use the annotation...
>
>> -- It will be good if you can share some more information about the newly
>> proposed architecture. The areas of current design which it can improve
as
>> well as the underlying intention.
>
>As noted in the comment on the JIRA page, the main intention is
>maximal decoupling of functionalities that make up a GA (population,
>fitness, selection, operator) and that seems achieved with the provided
>classes.
>
>> >
>> >(9)
>> >Naming of factory methods should be harmonized to match the convention
>> >adopted in components like [RNG] and [Numbers].
>> >E.g. instead of "newChromosome(...)", please use "of(...)" or
"from(...)"
>> >for "value object", and "create(...)" otherwise.
>> >
>> -- I have renamed the same for Chromosome classes.
>> What about the nextGeneration() method of ListPopulation class. Renaming
>> this to create() or from() won't convey the purpose of it.
>
>I agree, and that's why the new "Population" class (in MATH-1618) does
>not provide a factory method (see also the "GeneticAlgorithmFactory"
>class).
-- We can avoid the same in the current model if we agree to use a default
implementation of population and remove the Population interface following
your new model.

>
>> >(10)
>> >o.a.c.m.ga.chromosome.AbstractListChromosome
>> >
>> >Constructor is called with an argument that is a previously instantiated
>> >"representation".  If the latter is mutable, the caller will be able to
>> modify
>> >the underlying data structure of the newly created chromosome.  [The
>> >doc assumes immutability of the representation but this cannot be
>> >enforced, and mixed ownership can entail subtle bugs.]
>> -- I think this applies to both representation as well as generic
parameter
>> type T. But I don't see any other option but to rely on the user.
>
>The Javadoc (at line 84) is misleading in its mention of "immutable".
>
>> If you have any suggestions kindly share.
>
>I may not understand all the implications, but I'd suggest that the
>"representation" be instantiated within the control of the library (e.g.
>through a "builder"/"factory").
-- Currently we have the ChromosomeRepresentationUtils for the same. Its
methods are designed to generate the representations.

>
>> >
>> >(11)
>> >Do we agree that, in a GA, the most time-consuming task is the fitness
>> >computation?  Hence IMO, it should be the focus of the multithreading
>> >tools (i.e. "ExecutorService"), probably keeping the other parts (namely
>> >the genetic operators) within a simple sequential loop (as in class
>> >"GeneticAlgorithmFactory" in MATH-1618).
>> -- Current implementation uses separate threads for applying crossover
and
>> mutation operators for each pair of selected chromosomes.
>> I think this ensures better utilization of multi-core processors compared
>> to use of multi-threading only for the fitness calculation.
>
>I have the opposite intuition: Parallel application of the genetic
>operators would only provide marginal gains wrt the fitness
>computation.
>In any case, I think that it will be fairly easy to modify my proposed
>"OffspringGenerator" class to use an "ExecutorService" (if benchmarks
>show that a substantial gain could indeed be achieved).
>
>> -- Some codes are checked in. But there is a conflict in the pull
request.
>> So I shall create a new one and delete the old branch itself.
>
>IMHO, we could make more substantial progress if you could
>first point to issues with my proposal in MATH-1618.
--Mentioned earlier.


Thanks & Regards
--Avijit Basak


On Mon, 2 May 2022 at 03:49, Gilles Sadowski <gi...@gmail.com> wrote:

> Hello.
>
> Le dim. 1 mai 2022 à 10:07, Avijit Basak <av...@gmail.com> a écrit
> :
> >
> > Hi All
> >
> >          Please find my responses.
> >
> > >Currently; it prints everything on "stderr" (so that simple usage of
> > >the UNIX shell's "pipe" syntax is not possible).
> > -- This is the default behaviour of slf4j-simple.
> > It is possible to provide specific system properties as VM args to
> redirect
> > the log to desired location like stdout or file.
> > Kindly look into the documentation provided below:
> > https://www.slf4j.org/api/org/slf4j/impl/SimpleLogger.html
>
> If we need to document software (e.g. the "examples") produced
> by a "Commons" component, it is preferable that we use and refer
> to "Log4j2".
> [The logger API is another matter since it does not impact how the
> software is used (from a user's POV).]
>
> > >
> > >> There was a discussion
> > >> earlier on this and the decision was to use console for example
> modules
> > >> log messages.
> > >
> > >I recall that we want a separation between the logger API (library
> > >dependency) and logger implementation ("examples" dependency).
> > >
> > >Is it a feature of "slf4j-simple" to only print to "stderr" or can it
> > >be configured?
> > >There is no issue on the "examples" module to depend on a
> > >full-featured logger (primary candidate would be "Log4j2").
> > --Already mentioned earlier.
>
> We should change SimpleLogger -> Log4j2 equivalent.
>
> > >
> > >> [...]
> > >
> > >Also, IMO "examples" codes should make a difference
> > >between "logging" (optional information about internals)
> > >and "output" (mandatory result).
> > >For example, the TSP application result (itinerary and travel
> > >distance) could be saved in a file whose name is given as
> > >argument to a mandatory "--output" command-line option, as
> > >in module "commons-math-examples/examples-sofm/tsp".
> > --I have made the changes to accommodate the --output as input argument.
>
> Thanks.
>
> > >
> > >> >
> > >> >There is still a mix between library code and application code (but
> > >> >this is to be discussed in MATH-1643.
> > >> -- Kindly update the JIRA with your suggestions.
> > >
> > >Sure, we'll come to that.  But I think we should first converge to
> > >a codebase that can be merged to "master" as a replacement of
> > >the "o.a.c.math4.genetics" package (to be removed).
> > >Do you confirm that all the unit tests in that package have an
> > >equivalent in the new "commons-math-ga" module?
> > -- I have accommodated mostly. But some are altered due to changes in
> > design. But it is good to have a review of the same.
>
> Doing manually will be too tedious.
> If you are reasonably confident that you've ported everything
> valuable, I guess that we'll rely on coverage tools...
>
> > >
> > >> >From browsing the library code, I'm tempted to believe that the
> > >> >dependency towards a logging framework is not necessary (or
> > >> >underused).
> > >> -- Do you mean dependency on the slf4j?
> > >
> > >Yes.
> > >
> > >>   I think that such a feature could be left to the application
> > >> >layer (per the "ConvergenceListener" registry).
> > >> -- ConvergenceListener is only to log the nature of convergence,
> > >> not internal details of all operators. Currently we have a very low
> > >> number of log messages which can be increased later.
> > >
> > >"Increased later" for what purpose?
> > >Logging can be very useful (e.g. for debugging); however once
> > >this new implementation is trusted to behave correctly, I don't see
> > >that additional logging statements inside the library will be very
> > >useful; they would rather be needed by the application in order to
> > >e.g. display the content of the current "Population".  The "data" (to
> > >be eventually displayed) would be provided by the library through
> > >regular accessors.
> > >Another case in point is that the library cannot know what display
> > >format is suitable for the application (e.g. a GUI).
> > >
> > >If we really need (not clear as of the current codebase) to provide
> > >runtime access to other parts of the library (operators, etc.), why
> > >not generalise the "observer" pattern?
> > -- IMHO fine grain log statements are a better alternative than operator
> > level observer to convey internal behavior of operators for the specific
> > application.
> > Use of an observer for each operator is a bit unnecessary.
>
> The current log statements could also be construed as (totally)
> unnecessary, as they merely document statements which I would
> consider part of an "application", not the GA "library".
> I believe that we do not agree yet on where to draw the line (my
> proposal in MATH-1618 is related to that difference of opinion).
>
> > >> >Likewise, the "PopulationStatisticsLogger" is not general enough to
> > >> >be worth being part of the library.
> > >> -- I would love to keep this simple implementation of convergence
> > >> listener. This can provide a very basic overview on the nature of
> > >> convergence.
> > >
> > >As a user, you'll be able to provide the feature to your application
> > >with less than 10 lines of code (by registering a "listener").
> > >Everyone else might want a slightly different output (contents and/or
> > >format) than what this class generates.
> > -- Users would always have the freedom to register their own listener.
> > PopulationStatisticsLogger provides only a very basic option to track the
> > population statistics during the convergence process.
> > This is never a very generic solution to meet the needs of all users.
>
> AFAICT, the "PopulationStatisticsLogger" class satisfies *your* needs
> (as a user), but as developers we should only include functionality that
> is broadly useful.  I think I provided arguments that it is not the case of
> that class.
>
> > >Yes.
> > >Not everybody follows strict rules, and certainly not all "Commons"
> > >components have the same rules (unfortunately), but since it was
> > >decided that this module belongs in "Commons Math", I'd like to
> > >minimize inconsistent coding styles (converging on what is used in
> > >non-"legacy" source files).
> > >
> > >> This would
> > >> initiate changes in almost all classes.
> > >
> > >Do you mean "all classes" in the GA module?
> > >Then so be it. ;-)
> > -- I shall start to work on this.
>
> Great!
>
> > >> [...]
> > >> >In "AdaptiveGeneticAlgorithm":
> > >> >* There should be a single constructor (same remark as above).
> > >> -- Removed the constructor with default argument.
> > >>
> > >> >* Why the use of reflection ("isAssignableFrom")?
> > >> -- Replaced it by instanceof.
> > >
> > >Marginally better ;-) it still does not say why the statistics is
> disabled
> > >depending on the operator type...
> > --Statistics is required only for the adaptive probability generation.
> > In case application users choose to use a constant probability generation
> > scheme there is no use of this computation.
> > I shall document the same.
>
> Maybe I'm still missing something, but IMHO the distinction between
> adaptive vs not should not impact code at that level.
>
> > >
> > >(1)
> > >o.a.c.m.ga.GeneticAlgorithmTestPermutations
> > >(under "src/test")
> > >
> > >As per your comment in that class, it is a usage example.
> > >Given that its name does not end with "Test", it is not run by the
> > >test suite.  Please move it to the "examples" module.
> > -- This was taken from legacy GA. There is another example of binary
> > genotype called onemax. We already have examples in the "examples-ga"
> > module.
> > If we don't need these in test packages we can delete the same. But in
> that
> > case there won't be any end-to-end test case execution scenario during
> > build. Are we fine with that?
>
> I don't know.  The underlying question is whether this class is executed
> during the execution of the unit tests; if not, it doesn't make any sense
> there.  If yes, it should be documented.
> We should strive for maximum coverage (through unit tests) but "dummy"
> unit tests that don't actually test anything (and whose only purpose is to
> "mechanically" increase coverage) should be avoided IMO.
>
> > >
> > >(2)
> > >I'm missing a high-level doc that would enable a newbie to figure
> > >out what to implement in order to get going.
> > >E.g. what is the interplay between
> > > * genotype
> > > * allele
> > > * phenotype
> > > * decoder
> > > * fitness function
> > >?
> > >Several classes do not provide explanations (or links) about the
> > >concept which they represent.  For example, there is no doc about
> > >what a "RandomKeyDecoder" is, and the reason for using it (or not).
> > -- I need to work on that.
>
> That would be very useful; thanks.
>
> > >
> > >(3)
> > >o.a.c.m.ga.utils.ChromosomeRepresentationUtils
> > >
> > >It seems to be a "mixed-bag" kind of class (that is being frowned
> > >upon nowadays).
> > >Its comment refers to "random" but some methods are not using
> > >any randomization.  Most methods are only used in unit tests.
> > >
> > -- Actually most methods are taken from the legacy GA application. In the
> > legacy library representation there was no separate concept for the
> decoder
> > and all representation utility methods were kept in the corresponding
> > Chromosome classes. ChromosomeRepresentationUtils is created as a
> > placeholder for those utility methods.
>
> I'm not sure I follow what you mean.
> If these methods are only needed by the (legacy?) examples, they
> should be moved to the "ga-examples" module (where they can be
> removed later on without regards to BC).
>
> > >(4)
> > >o.a.c.m.ga.RandomProviderManager
> > >
> > >As already discussed, this class should not be part of the public
> > >API, namely because the "getRandomProvider()" method returns
> > >an object that is not thread-safe.
> > >If used internally as "syntactic sugar", it should be located in a
> > >package named "internal"; however I'd tend to remove it
> > >altogether, and call "ThreadLocalRandomSource.current(...)"
> > >explicitly.
> > >
> > -- The class o.a.c.m.ga.RandomProviderManager has been removed.
> > As per our previous discussion I have implemented the customization of
> > RandomSource at each operator level.
>
> OK.  [Is there a new PR?]
>
> > >(5)
> > >Why does a "Chromosome" need an "identifier"?
> > >Method "getId()" is only used in "PopulationStatisticalSummaryImpl"
> > >that is an internal class, where it seems that the chromosome itself
> > >(rather than its "id") could serve as the map's key.
> > -- How can we generate a map's key from the chromosome itself? Please
> > suggest.
>
> A class to be used as a key only needs to implement "equals" and
> "hashCode".
>
> > >
> > >(6)
> > >o.a.c.m.ga.chromsome.AbstractChromosome
> > >
> > >Field "fitness" is not "final", yet it could be: a "FitnessFunction"
> > >object (used in "evaluate() to compute that field) is passed to the
> > >constructor.  Is there a reason for the "lazy" evaluation?
> > >Dropping it would make the instance immutable (and "evaluate()"
> > >should be renamed to "getFitness()").
> > >
> > >Why should the "FitnessFunction" be stored in every chromosome?
> > >
> > -- I have modified the fitness as final and initialized the same in the
> > constructor.
>
> Better, but did you check my proposal in MATH-1618, where
> Chromosome and fitness are decoupled, and their relationship
> is held within a "Population" instance?
>
> > >(7)
> > >Spurious "@since" tags: In the new code (in "commons-math-ga"
> > >module), none should refer to a version < 4.0.
> > >
> > -- Some files are taken unchanged from the previous release. I have kept
> > the same @since tag for those files.
> > Do you need any change here?
>
> The old and new files are in different packages; "@since" tags
> thus make no sense IMO.
>
> >
> > >(8)
> > >@SuppressWarnings("unchecked")
> > >
> > >By default, I'm a bit suspicious about having to resort to these
> > annotations,
> > >especially for the kind of algorithms we are trying to implement.
> > >What do you think of the alternative approach outlined in the ZIP file
> > >attached in MATH-1618:
> > >    https://issues.apache.org/jira/browse/MATH-1618
> > >?
> > -- This annotation is required because we have kept an option to use
> > different types of genotypes including primitive.
> > Because of that our base interfaces only declares phenotype not genotype.
> > This introduced a kind of hierarchy in all operators and chromosome
> classes
> > which required us to use the mentioned annotation.
>
> I may again be missing something.
> Could you please explain the case that makes these annotations
> necessary.
>
> > Even with the proposed new architecture we may not be able to avoid the
> > same.
>
> The classes which I've added do not use the annotation...
>
> > -- It will be good if you can share some more information about the newly
> > proposed architecture. The areas of current design which it can improve
> as
> > well as the underlying intention.
>
> As noted in the comment on the JIRA page, the main intention is
> maximal decoupling of functionalities that make up a GA (population,
> fitness, selection, operator) and that seems achieved with the provided
> classes.
>
> > >
> > >(9)
> > >Naming of factory methods should be harmonized to match the convention
> > >adopted in components like [RNG] and [Numbers].
> > >E.g. instead of "newChromosome(...)", please use "of(...)" or
> "from(...)"
> > >for "value object", and "create(...)" otherwise.
> > >
> > -- I have renamed the same for Chromosome classes.
> > What about the nextGeneration() method of ListPopulation class. Renaming
> > this to create() or from() won't convey the purpose of it.
>
> I agree, and that's why the new "Population" class (in MATH-1618) does
> not provide a factory method (see also the "GeneticAlgorithmFactory"
> class).
>
> > >(10)
> > >o.a.c.m.ga.chromosome.AbstractListChromosome
> > >
> > >Constructor is called with an argument that is a previously instantiated
> > >"representation".  If the latter is mutable, the caller will be able to
> > modify
> > >the underlying data structure of the newly created chromosome.  [The
> > >doc assumes immutability of the representation but this cannot be
> > >enforced, and mixed ownership can entail subtle bugs.]
> > -- I think this applies to both representation as well as generic
> parameter
> > type T. But I don't see any other option but to rely on the user.
>
> The Javadoc (at line 84) is misleading in its mention of "immutable".
>
> > If you have any suggestions kindly share.
>
> I may not understand all the implications, but I'd suggest that the
> "representation" be instantiated within the control of the library (e.g.
> through a "builder"/"factory").
>
> > >
> > >(11)
> > >Do we agree that, in a GA, the most time-consuming task is the fitness
> > >computation?  Hence IMO, it should be the focus of the multithreading
> > >tools (i.e. "ExecutorService"), probably keeping the other parts (namely
> > >the genetic operators) within a simple sequential loop (as in class
> > >"GeneticAlgorithmFactory" in MATH-1618).
> > -- Current implementation uses separate threads for applying crossover
> and
> > mutation operators for each pair of selected chromosomes.
> > I think this ensures better utilization of multi-core processors compared
> > to use of multi-threading only for the fitness calculation.
>
> I have the opposite intuition: Parallel application of the genetic
> operators would only provide marginal gains wrt the fitness
> computation.
> In any case, I think that it will be fairly easy to modify my proposed
> "OffspringGenerator" class to use an "ExecutorService" (if benchmarks
> show that a substantial gain could indeed be achieved).
>
> > -- Some codes are checked in. But there is a conflict in the pull
> request.
> > So I shall create a new one and delete the old branch itself.
>
> IMHO, we could make more substantial progress if you could
> first point to issues with my proposal in MATH-1618.
>
> Thanks,
> Gilles
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>

-- 
Avijit Basak

Re: [Math] Review of "genetic algorithm" module

Posted by Gilles Sadowski <gi...@gmail.com>.
Hello.

Le dim. 1 mai 2022 à 10:07, Avijit Basak <av...@gmail.com> a écrit :
>
> Hi All
>
>          Please find my responses.
>
> >Currently; it prints everything on "stderr" (so that simple usage of
> >the UNIX shell's "pipe" syntax is not possible).
> -- This is the default behaviour of slf4j-simple.
> It is possible to provide specific system properties as VM args to redirect
> the log to desired location like stdout or file.
> Kindly look into the documentation provided below:
> https://www.slf4j.org/api/org/slf4j/impl/SimpleLogger.html

If we need to document software (e.g. the "examples") produced
by a "Commons" component, it is preferable that we use and refer
to "Log4j2".
[The logger API is another matter since it does not impact how the
software is used (from a user's POV).]

> >
> >> There was a discussion
> >> earlier on this and the decision was to use console for example modules
> >> log messages.
> >
> >I recall that we want a separation between the logger API (library
> >dependency) and logger implementation ("examples" dependency).
> >
> >Is it a feature of "slf4j-simple" to only print to "stderr" or can it
> >be configured?
> >There is no issue on the "examples" module to depend on a
> >full-featured logger (primary candidate would be "Log4j2").
> --Already mentioned earlier.

We should change SimpleLogger -> Log4j2 equivalent.

> >
> >> [...]
> >
> >Also, IMO "examples" codes should make a difference
> >between "logging" (optional information about internals)
> >and "output" (mandatory result).
> >For example, the TSP application result (itinerary and travel
> >distance) could be saved in a file whose name is given as
> >argument to a mandatory "--output" command-line option, as
> >in module "commons-math-examples/examples-sofm/tsp".
> --I have made the changes to accommodate the --output as input argument.

Thanks.

> >
> >> >
> >> >There is still a mix between library code and application code (but
> >> >this is to be discussed in MATH-1643.
> >> -- Kindly update the JIRA with your suggestions.
> >
> >Sure, we'll come to that.  But I think we should first converge to
> >a codebase that can be merged to "master" as a replacement of
> >the "o.a.c.math4.genetics" package (to be removed).
> >Do you confirm that all the unit tests in that package have an
> >equivalent in the new "commons-math-ga" module?
> -- I have accommodated mostly. But some are altered due to changes in
> design. But it is good to have a review of the same.

Doing manually will be too tedious.
If you are reasonably confident that you've ported everything
valuable, I guess that we'll rely on coverage tools...

> >
> >> >From browsing the library code, I'm tempted to believe that the
> >> >dependency towards a logging framework is not necessary (or
> >> >underused).
> >> -- Do you mean dependency on the slf4j?
> >
> >Yes.
> >
> >>   I think that such a feature could be left to the application
> >> >layer (per the "ConvergenceListener" registry).
> >> -- ConvergenceListener is only to log the nature of convergence,
> >> not internal details of all operators. Currently we have a very low
> >> number of log messages which can be increased later.
> >
> >"Increased later" for what purpose?
> >Logging can be very useful (e.g. for debugging); however once
> >this new implementation is trusted to behave correctly, I don't see
> >that additional logging statements inside the library will be very
> >useful; they would rather be needed by the application in order to
> >e.g. display the content of the current "Population".  The "data" (to
> >be eventually displayed) would be provided by the library through
> >regular accessors.
> >Another case in point is that the library cannot know what display
> >format is suitable for the application (e.g. a GUI).
> >
> >If we really need (not clear as of the current codebase) to provide
> >runtime access to other parts of the library (operators, etc.), why
> >not generalise the "observer" pattern?
> -- IMHO fine grain log statements are a better alternative than operator
> level observer to convey internal behavior of operators for the specific
> application.
> Use of an observer for each operator is a bit unnecessary.

The current log statements could also be construed as (totally)
unnecessary, as they merely document statements which I would
consider part of an "application", not the GA "library".
I believe that we do not agree yet on where to draw the line (my
proposal in MATH-1618 is related to that difference of opinion).

> >> >Likewise, the "PopulationStatisticsLogger" is not general enough to
> >> >be worth being part of the library.
> >> -- I would love to keep this simple implementation of convergence
> >> listener. This can provide a very basic overview on the nature of
> >> convergence.
> >
> >As a user, you'll be able to provide the feature to your application
> >with less than 10 lines of code (by registering a "listener").
> >Everyone else might want a slightly different output (contents and/or
> >format) than what this class generates.
> -- Users would always have the freedom to register their own listener.
> PopulationStatisticsLogger provides only a very basic option to track the
> population statistics during the convergence process.
> This is never a very generic solution to meet the needs of all users.

AFAICT, the "PopulationStatisticsLogger" class satisfies *your* needs
(as a user), but as developers we should only include functionality that
is broadly useful.  I think I provided arguments that it is not the case of
that class.

> >Yes.
> >Not everybody follows strict rules, and certainly not all "Commons"
> >components have the same rules (unfortunately), but since it was
> >decided that this module belongs in "Commons Math", I'd like to
> >minimize inconsistent coding styles (converging on what is used in
> >non-"legacy" source files).
> >
> >> This would
> >> initiate changes in almost all classes.
> >
> >Do you mean "all classes" in the GA module?
> >Then so be it. ;-)
> -- I shall start to work on this.

Great!

> >> [...]
> >> >In "AdaptiveGeneticAlgorithm":
> >> >* There should be a single constructor (same remark as above).
> >> -- Removed the constructor with default argument.
> >>
> >> >* Why the use of reflection ("isAssignableFrom")?
> >> -- Replaced it by instanceof.
> >
> >Marginally better ;-) it still does not say why the statistics is disabled
> >depending on the operator type...
> --Statistics is required only for the adaptive probability generation.
> In case application users choose to use a constant probability generation
> scheme there is no use of this computation.
> I shall document the same.

Maybe I'm still missing something, but IMHO the distinction between
adaptive vs not should not impact code at that level.

> >
> >(1)
> >o.a.c.m.ga.GeneticAlgorithmTestPermutations
> >(under "src/test")
> >
> >As per your comment in that class, it is a usage example.
> >Given that its name does not end with "Test", it is not run by the
> >test suite.  Please move it to the "examples" module.
> -- This was taken from legacy GA. There is another example of binary
> genotype called onemax. We already have examples in the "examples-ga"
> module.
> If we don't need these in test packages we can delete the same. But in that
> case there won't be any end-to-end test case execution scenario during
> build. Are we fine with that?

I don't know.  The underlying question is whether this class is executed
during the execution of the unit tests; if not, it doesn't make any sense
there.  If yes, it should be documented.
We should strive for maximum coverage (through unit tests) but "dummy"
unit tests that don't actually test anything (and whose only purpose is to
"mechanically" increase coverage) should be avoided IMO.

> >
> >(2)
> >I'm missing a high-level doc that would enable a newbie to figure
> >out what to implement in order to get going.
> >E.g. what is the interplay between
> > * genotype
> > * allele
> > * phenotype
> > * decoder
> > * fitness function
> >?
> >Several classes do not provide explanations (or links) about the
> >concept which they represent.  For example, there is no doc about
> >what a "RandomKeyDecoder" is, and the reason for using it (or not).
> -- I need to work on that.

That would be very useful; thanks.

> >
> >(3)
> >o.a.c.m.ga.utils.ChromosomeRepresentationUtils
> >
> >It seems to be a "mixed-bag" kind of class (that is being frowned
> >upon nowadays).
> >Its comment refers to "random" but some methods are not using
> >any randomization.  Most methods are only used in unit tests.
> >
> -- Actually most methods are taken from the legacy GA application. In the
> legacy library representation there was no separate concept for the decoder
> and all representation utility methods were kept in the corresponding
> Chromosome classes. ChromosomeRepresentationUtils is created as a
> placeholder for those utility methods.

I'm not sure I follow what you mean.
If these methods are only needed by the (legacy?) examples, they
should be moved to the "ga-examples" module (where they can be
removed later on without regards to BC).

> >(4)
> >o.a.c.m.ga.RandomProviderManager
> >
> >As already discussed, this class should not be part of the public
> >API, namely because the "getRandomProvider()" method returns
> >an object that is not thread-safe.
> >If used internally as "syntactic sugar", it should be located in a
> >package named "internal"; however I'd tend to remove it
> >altogether, and call "ThreadLocalRandomSource.current(...)"
> >explicitly.
> >
> -- The class o.a.c.m.ga.RandomProviderManager has been removed.
> As per our previous discussion I have implemented the customization of
> RandomSource at each operator level.

OK.  [Is there a new PR?]

> >(5)
> >Why does a "Chromosome" need an "identifier"?
> >Method "getId()" is only used in "PopulationStatisticalSummaryImpl"
> >that is an internal class, where it seems that the chromosome itself
> >(rather than its "id") could serve as the map's key.
> -- How can we generate a map's key from the chromosome itself? Please
> suggest.

A class to be used as a key only needs to implement "equals" and
"hashCode".

> >
> >(6)
> >o.a.c.m.ga.chromsome.AbstractChromosome
> >
> >Field "fitness" is not "final", yet it could be: a "FitnessFunction"
> >object (used in "evaluate() to compute that field) is passed to the
> >constructor.  Is there a reason for the "lazy" evaluation?
> >Dropping it would make the instance immutable (and "evaluate()"
> >should be renamed to "getFitness()").
> >
> >Why should the "FitnessFunction" be stored in every chromosome?
> >
> -- I have modified the fitness as final and initialized the same in the
> constructor.

Better, but did you check my proposal in MATH-1618, where
Chromosome and fitness are decoupled, and their relationship
is held within a "Population" instance?

> >(7)
> >Spurious "@since" tags: In the new code (in "commons-math-ga"
> >module), none should refer to a version < 4.0.
> >
> -- Some files are taken unchanged from the previous release. I have kept
> the same @since tag for those files.
> Do you need any change here?

The old and new files are in different packages; "@since" tags
thus make no sense IMO.

>
> >(8)
> >@SuppressWarnings("unchecked")
> >
> >By default, I'm a bit suspicious about having to resort to these
> annotations,
> >especially for the kind of algorithms we are trying to implement.
> >What do you think of the alternative approach outlined in the ZIP file
> >attached in MATH-1618:
> >    https://issues.apache.org/jira/browse/MATH-1618
> >?
> -- This annotation is required because we have kept an option to use
> different types of genotypes including primitive.
> Because of that our base interfaces only declares phenotype not genotype.
> This introduced a kind of hierarchy in all operators and chromosome classes
> which required us to use the mentioned annotation.

I may again be missing something.
Could you please explain the case that makes these annotations
necessary.

> Even with the proposed new architecture we may not be able to avoid the
> same.

The classes which I've added do not use the annotation...

> -- It will be good if you can share some more information about the newly
> proposed architecture. The areas of current design which it can improve as
> well as the underlying intention.

As noted in the comment on the JIRA page, the main intention is
maximal decoupling of functionalities that make up a GA (population,
fitness, selection, operator) and that seems achieved with the provided
classes.

> >
> >(9)
> >Naming of factory methods should be harmonized to match the convention
> >adopted in components like [RNG] and [Numbers].
> >E.g. instead of "newChromosome(...)", please use "of(...)" or "from(...)"
> >for "value object", and "create(...)" otherwise.
> >
> -- I have renamed the same for Chromosome classes.
> What about the nextGeneration() method of ListPopulation class. Renaming
> this to create() or from() won't convey the purpose of it.

I agree, and that's why the new "Population" class (in MATH-1618) does
not provide a factory method (see also the "GeneticAlgorithmFactory"
class).

> >(10)
> >o.a.c.m.ga.chromosome.AbstractListChromosome
> >
> >Constructor is called with an argument that is a previously instantiated
> >"representation".  If the latter is mutable, the caller will be able to
> modify
> >the underlying data structure of the newly created chromosome.  [The
> >doc assumes immutability of the representation but this cannot be
> >enforced, and mixed ownership can entail subtle bugs.]
> -- I think this applies to both representation as well as generic parameter
> type T. But I don't see any other option but to rely on the user.

The Javadoc (at line 84) is misleading in its mention of "immutable".

> If you have any suggestions kindly share.

I may not understand all the implications, but I'd suggest that the
"representation" be instantiated within the control of the library (e.g.
through a "builder"/"factory").

> >
> >(11)
> >Do we agree that, in a GA, the most time-consuming task is the fitness
> >computation?  Hence IMO, it should be the focus of the multithreading
> >tools (i.e. "ExecutorService"), probably keeping the other parts (namely
> >the genetic operators) within a simple sequential loop (as in class
> >"GeneticAlgorithmFactory" in MATH-1618).
> -- Current implementation uses separate threads for applying crossover and
> mutation operators for each pair of selected chromosomes.
> I think this ensures better utilization of multi-core processors compared
> to use of multi-threading only for the fitness calculation.

I have the opposite intuition: Parallel application of the genetic
operators would only provide marginal gains wrt the fitness
computation.
In any case, I think that it will be fairly easy to modify my proposed
"OffspringGenerator" class to use an "ExecutorService" (if benchmarks
show that a substantial gain could indeed be achieved).

> -- Some codes are checked in. But there is a conflict in the pull request.
> So I shall create a new one and delete the old branch itself.

IMHO, we could make more substantial progress if you could
first point to issues with my proposal in MATH-1618.

Thanks,
Gilles

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org