You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@commons.apache.org by Avijit Basak <av...@gmail.com> on 2021/10/18 07:35:22 UTC

Re: [MATH][GENETICS] Review of PR #197

Hi All

        I have created PR#197 as mentioned earlier. Kindly let me know if
there is any concern or comments.
        I have created another *PR#199* consisting of the changes with
adaptive probability generations. Kindly review the same. The build has
failed due to a spot bug issue as mentioned below.

[ERROR] Medium: Public static
org.apache.commons.math4.ga.listener.ConvergenceListenerRegistry.getInstance()
may expose internal representation by returning
ConvergenceListenerRegistry.INSTANCE
[org.apache.commons.math4.ga.listener.ConvergenceListenerRegistry] At
ConvergenceListenerRegistry.java:[line 89] MS_EXPOSE_REP

        However, the same code in my previous PR(#197) was
built successfully. I am not sure how to resolve this issue. Any help will
be appreciated.

Thanks & Regards
--Avijit Basak

On Mon, 27 Sept 2021 at 18:21, Avijit Basak <av...@gmail.com> wrote:

> Hi All
>
>          I have created the *PR #197* consisting of changes for JIRA
> MATH-1563, Task MATH-1618. This is the primary work to standardize the
> design of GA module. The build has passed. I would like to request a review
> of the PR. Once the primary design is standardized I can check in further
> changes like introduction of adaptive model and data structure change for
> Binary chromosomes.
>          Kindly let me know for any concerns or queries.
>
> Thanks & Regards
> -- Avijit Basak
>


-- 
Avijit Basak

Re: [MATH][GENETICS] Review of PR #197

Posted by Alex Herbert <al...@gmail.com>.
Hi,

SpotBugs was recently updated from 3.1.1 to 4.4.2 and it identifies a
lot more exposure references of this type You may have to add
exclusions to the file:

src/main/resources/spotbugs/spotbugs-exclude-filter.xml

You can run the checks using:

mvn spotbugs:check

Regards,

Alex

On Mon, 18 Oct 2021 at 08:35, Avijit Basak <av...@gmail.com> wrote:
>
> Hi All
>
>         I have created PR#197 as mentioned earlier. Kindly let me know if
> there is any concern or comments.
>         I have created another *PR#199* consisting of the changes with
> adaptive probability generations. Kindly review the same. The build has
> failed due to a spot bug issue as mentioned below.
>
> [ERROR] Medium: Public static
> org.apache.commons.math4.ga.listener.ConvergenceListenerRegistry.getInstance()
> may expose internal representation by returning
> ConvergenceListenerRegistry.INSTANCE
> [org.apache.commons.math4.ga.listener.ConvergenceListenerRegistry] At
> ConvergenceListenerRegistry.java:[line 89] MS_EXPOSE_REP
>
>         However, the same code in my previous PR(#197) was
> built successfully. I am not sure how to resolve this issue. Any help will
> be appreciated.
>
> Thanks & Regards
> --Avijit Basak
>
> On Mon, 27 Sept 2021 at 18:21, Avijit Basak <av...@gmail.com> wrote:
>
> > Hi All
> >
> >          I have created the *PR #197* consisting of changes for JIRA
> > MATH-1563, Task MATH-1618. This is the primary work to standardize the
> > design of GA module. The build has passed. I would like to request a review
> > of the PR. Once the primary design is standardized I can check in further
> > changes like introduction of adaptive model and data structure change for
> > Binary chromosomes.
> >          Kindly let me know for any concerns or queries.
> >
> > Thanks & Regards
> > -- Avijit Basak
> >
>
>
> --
> Avijit Basak

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [MATH][GENETICS] Review of PR #197

Posted by Alex Herbert <al...@gmail.com>.
On Wed, 20 Oct 2021 at 07:47, Avijit Basak <av...@gmail.com> wrote:

> (B)
> I'm confused by your defining "legacy" packages in new modules...
> --This is kept for comparison purposes between the legacy and the new
> implementation of GA.

You should be able to compare to the old implementation by including
Commons Math 3 as a test dependency (assuming this is for testing).
The base package is math3 and not math4 so you can import both.

Alex

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [MATH][GENETICS] Review of PR #197

Posted by Gilles Sadowski <gi...@gmail.com>.
Hello.

Le mar. 9 nov. 2021 à 13:05, Avijit Basak <av...@gmail.com> a écrit :
>
> Hi All
>
> >Depending on released code (e.g. version 3.6.1 of Commons Math)
> >is fine too for the "main" codes of the "examples" module.
> >Setting up all the "policies" (mutation, crossover, ...) must be done
> >anyways, by the user; the above factories just add an argument to
> >be passed at instantiation.
>
> >> Separate RandomSource for each place may be
> >> redundant.
>
> >I'm not sure what you mean by "redundant".
> >Since the RNG instances are not thread-safe, either separate sources
> >*must* be used or the synchronization must be handled separately (as
> >done for the "ThreadLocalRandomSource" possibly with a performance
> >loss).
>
> >> I also believe the system should provide a default option for
> >> RandomSource instead of completely depending on the choice of users.
>
> >Why?
> >This is a library, and the RNG should be viewed as an input (user's choice
> >to be made at the application level).
> >There are multiple problems (already noted previously):
> >1. It hardcodes a specific "default" source (whereas the GA functionality
> >is "agnostic" about which source is actually used).
> >2. The RNG instance is shared among all the classes that need it (which
> >makes access unnecessarily slower).
> >3. The static field "randomSource" is mutable; this is looking for trouble
> >(race condition).
>
> --The way I tried to design this is that GA functionality should likely be
> agnostic to the rng feature. If we accept the randomsource as a parameter
> to GA operators it would likely mandate the users having knowledge about
> rng.

Usage of [RNG] is pretty obvious.

> Given the vast amount of RandomSources I am not sure how many options
> will be really considered by users and mostly might go for JDK source.

This is the only one they should *not* use!

> This
> will increase the learning curve of users as well as a bit of complexity of
> API.

IIUC, there is already a significant increase in the API complexity wrt
the previous implementation.
Moreover we could add another default method to interface:
---CUT---
public interface MutationPolicy<P> {
    Chromosome<P> mutate(Chromosome<P> original, double mutationRate);

    interface Factory<P> {
        /**
         * Creates an instance with a dedicated source of randomness.
         *
         * @param rng RNG algorithm.
         * @param seed Seed.
         * @return an instance that must <em>not</em> be shared among threads.
         */
        MutationPolicy<P> create(RandomSource rng, Object seed);

        default MutationPolicy<P> create(RandomSource rng) {
            return create(rng, null);
        }
        default MutationPolicy<P> create() {
            return create(RandomSource.SPLIT_MIX_64);
        }
    }
}
---CUT---

> Customization of any operators will mandate the implementation of the
> factory as well.

What kind of customization are you referring to?
I'd have thought that this library would provide all the common operators
for all the commonly used genotypes.

> [IMHO] The users do not need to choose the RandomSource for each operator
> separately.

With the above default method, they won't need to.
[The "create()" method won't specify the actual algorithm being used,
so that we can change it at any time.  We could even imagine it to be
randomly chosen at instantiation (similar to how the seeds can be
auto-generated in "RandomSource").]

> That is the reason I proposed the use of customization option
> in the RandomNumberGenerator class itself. But this is also true that it
> cannot mandate the configuration only once. In case that is configured
> inside any custom operator then that might result in race conditions once
> we go for parallel implementation.

IMO, this would be a plain bug.

> [IMHO] We really don't need users to customize RandomSource for each and
> every operator or for the application. Can we stick to the previous
> implementation and remove the configure(RandomSource rng) method of
> RandomNumberGenerator class? Kindly share your thoughts.

I did, above.
Hopefully we can get other opinions as to what is the better design choice.

Gilles

>>> [...]

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [MATH][GENETICS] Review of PR #197

Posted by Avijit Basak <av...@gmail.com>.
Hi All

>Depending on released code (e.g. version 3.6.1 of Commons Math)
>is fine too for the "main" codes of the "examples" module.
>Setting up all the "policies" (mutation, crossover, ...) must be done
>anyways, by the user; the above factories just add an argument to
>be passed at instantiation.

>> Separate RandomSource for each place may be
>> redundant.

>I'm not sure what you mean by "redundant".
>Since the RNG instances are not thread-safe, either separate sources
>*must* be used or the synchronization must be handled separately (as
>done for the "ThreadLocalRandomSource" possibly with a performance
>loss).

>> I also believe the system should provide a default option for
>> RandomSource instead of completely depending on the choice of users.

>Why?
>This is a library, and the RNG should be viewed as an input (user's choice
>to be made at the application level).
>There are multiple problems (already noted previously):
>1. It hardcodes a specific "default" source (whereas the GA functionality
>is "agnostic" about which source is actually used).
>2. The RNG instance is shared among all the classes that need it (which
>makes access unnecessarily slower).
>3. The static field "randomSource" is mutable; this is looking for trouble
>(race condition).

--The way I tried to design this is that GA functionality should likely be
agnostic to the rng feature. If we accept the randomsource as a parameter
to GA operators it would likely mandate the users having knowledge about
rng. Given the vast amount of RandomSources I am not sure how many options
will be really considered by users and mostly might go for JDK source. This
will increase the learning curve of users as well as a bit of complexity of
API. Customization of any operators will mandate the implementation of the
factory as well.
[IMHO] The users do not need to choose the RandomSource for each operator
separately. That is the reason I proposed the use of customization option
in the RandomNumberGenerator class itself. But this is also true that it
cannot mandate the configuration only once. In case that is configured
inside any custom operator then that might result in race conditions once
we go for parallel implementation.
[IMHO] We really don't need users to customize RandomSource for each and
every operator or for the application. Can we stick to the previous
implementation and remove the configure(RandomSource rng) method of
RandomNumberGenerator class? Kindly share your thoughts.

Thanks & Regards
--Avijit Basak

On Tue, 9 Nov 2021 at 00:11, Gilles Sadowski <gi...@gmail.com> wrote:

> Hello.
>
> Side-note: Please try to not remove quotes from previous emails,
> as it leads to a confusing sequence (e.g. things I wrote previously
> now appear below as quotes from your last message).
>
> Le dim. 7 nov. 2021 à 10:34, Avijit Basak <av...@gmail.com> a
> écrit :
> >
> > Hi All
> >
> >          Please find my comments below:
> >
> >
> > [...]
> >
> > > >
> > > > (B)
> > > > I'm confused by your defining "legacy" packages in new modules...
> > > > What kind of comparisons are you considering?
> > > > It is fine to depend (with scope "test") on CM v3.6.1 to perhaps (?)
> > > > regression testing; but please note that when your proposal is
> > > > merged, it will imply that the "legacy" codes *must* be removed.
> > > > [We don't want to keep duplicate functionality.]
> > > > -- The new implementation has improved the quality of optimization
> over
> > > the
> > > > legacy model.
> > >
> > > "Improved" in what sense?
> > > If you mean enhanced performance, such checks should be done
> > > using JMH (producing data to be published on the web site).
> > >
> > > --Along with performance and memory utilization, stochastic algorithms
> > have
> > > another comparison parameter "quality of result". In stochastic
> > algorithms,
> > > global optimum is not guaranteed, We have to compare the quality of the
> > > result along with performance and memory consumption to compare two
> > > algorithm implementations. I have kept the legacy example just for
> > > comparison between the new and the legacy implementations.
> >
> > Great that you take care of checking improvements on the quality
> > measures.  Just make sure that the new code do not depend on
> > anything in module "commons-math-legacy".  As noted earlier, a
> > dependency on a previous release of CM with
> >   <scope>test</scope>
> > is fine.
> >
> > --test scope can only be used for test packages. However, this is kept in
> > src of example module. Changing previous release's dependency scope to
> test
> > is showing compilation error in src package. I am not sure how to manage
> > this.
>
> Depending on released code (e.g. version 3.6.1 of Commons Math)
> is fine too for the "main" codes of the "examples" module.
>
> > >
> > > Also (if no objections are made):
> > > 1. the declaration should go in the main POM (in the
> > <dependencyManagement>
> > >     section).
> > > 2. the library (CM) must not depend on a specific logger
> implementation.
> > > --Do you mean I should also remove the simple implementation
> > "slf4j-simple"
> > > in commons-math-ga.
> >
> > Yes.
> > Specific implementations a user's choice that could be different in
> > each of the "examples" modules.
> >
> > --Done.
> >
> > > [...]
> > > >
> > > > * Class "RandomGenerator"
> > > > 1. Duplicates functionality (storage of thread-local instances)
> > > > readily available in "Commons RNG".
> > > > 2. (IMHO) Thread-local instances should not be used for "heavy" usage
> > > > (like in GA).
> > > > --Modified
> > >
> > > You still use "ThreadLocalRandomSource".
> > > I think that if some functionality requires a RNG instance it should be
> > > passed to its constructor (or created by it).
> > >
> > > I noticed that the interface "ChromosomeRepresentationUtils"
> > > is actually a utility class (as the name indicates): It only contains
> > > "static" functions.  This will probably require refactoring (utility
> > > classes are frowned upon nowadays).
> > > Some functionality in that class (e.g. sampling from a range of
> > > integers) exists in the "sampling" module of "Commons RNG".
> > >
> > > --Will look into this.
> >
> > I understand that "hiding" the access to the RNG functionality makes
> > it for a seemingly nicer API, but I'm not sure that we should do it given
> > that the "random" aspect is a fundamental, and explicit, part of the GA
> > algorithm.
> > How about introducing "factories" to create the objects that use a RNG,
> > giving them their "own" RNG instance.  For example:
> > ---CUT---
> > public interface MutationPolicy<P> {
> >     Chromosome<P> mutate(Chromosome<P> original, double mutationRate);
> >
> >     interface Factory<P> {
> >         /**
> >          * Creates an instance with a dedicated source of randomness.
> >          *
> >          * @param rng RNG algorithm.
> >          * @param seed Seed.
> >          * @return an instance that must <em>not</em> be shared among
> > threads.
> >          */
> >         MutationPolicy<P> create(RandomSource rng, Object seed);
> >
> >         default MutationPolicy<P> create(RandomSource rng) {
> >             return create(rng, null);
> >         }
> >     }
> > }
> > ---CUT---
> > ?
> >
> > --[IMHO] We can give users the option of choosing RandomSource. However,
> > choosing the RandomSource in all places separately seems to be little
> > overwork for the users.
>
> Setting up all the "policies" (mutation, crossover, ...) must be done
> anyways, by the user; the above factories just add an argument to
> be passed at instantiation.
>
> > Separate RandomSource for each place may be
> > redundant.
>
> I'm not sure what you mean by "redundant".
> Since the RNG instances are not thread-safe, either separate sources
> *must* be used or the synchronization must be handled separately (as
> done for the "ThreadLocalRandomSource" possibly with a performance
> loss).
>
> > I also believe the system should provide a default option for
> > RandomSource instead of completely depending on the choice of users.
>
> Why?
> This is a library, and the RNG should be viewed as an input (user's choice
> to be made at the application level).
>
> > What
> > if we provide the feature of configuring RandomSource in our
> > RandomNumberGenerator(earlier RandomGenerator) class like this. Please
> > share your thoughts regarding this.
> >
> > --CUT--
> >     /** The default RandomSource for random number generation. **/
> >     private static RandomSource randomSource =
> > RandomSource.XO_RO_SHI_RO_128_PP;
> >
> >     /**
> >      * Sets the random source for this random generator.
> >      * @param randomSource
> >      */
> >     public static void configure(RandomSource randomSource) {
> >         RandomNumberGenerator.randomSource = randomSource;
> >     }
> >
> >     /**
> >      * constructs the singleton instance.
> >      */
> >     private RandomNumberGenerator() {
> >     }
> >
> >     /**
> >      * Returns the (static) random generator.
> >      * @return the static random generator shared by GA implementation
> > classes
> >      */
> >     public static UniformRandomProvider getRandomGenerator() {
> >         return
> > ThreadLocalRandomSource.current(RandomNumberGenerator.randomSource);
> >     }
> > --CUT--
>
> There are multiple problems (already noted previously):
> 1. It hardcodes a specific "default" source (whereas the GA functionality
> is "agnostic" about which source is actually used).
> 2. The RNG instance is shared among all the classes that need it (which
> makes access unnecessarily slower).
> 3. The static field "randomSource" is mutable; this is looking for trouble
> (race condition).
>
> > [...]
> >
> >>>> [...]
>
> Regards,
> Gilles
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>

-- 
Avijit Basak

Re: [MATH][GENETICS] Review of PR #197

Posted by Gilles Sadowski <gi...@gmail.com>.
Hello.

Side-note: Please try to not remove quotes from previous emails,
as it leads to a confusing sequence (e.g. things I wrote previously
now appear below as quotes from your last message).

Le dim. 7 nov. 2021 à 10:34, Avijit Basak <av...@gmail.com> a écrit :
>
> Hi All
>
>          Please find my comments below:
>
>
> [...]
>
> > >
> > > (B)
> > > I'm confused by your defining "legacy" packages in new modules...
> > > What kind of comparisons are you considering?
> > > It is fine to depend (with scope "test") on CM v3.6.1 to perhaps (?)
> > > regression testing; but please note that when your proposal is
> > > merged, it will imply that the "legacy" codes *must* be removed.
> > > [We don't want to keep duplicate functionality.]
> > > -- The new implementation has improved the quality of optimization over
> > the
> > > legacy model.
> >
> > "Improved" in what sense?
> > If you mean enhanced performance, such checks should be done
> > using JMH (producing data to be published on the web site).
> >
> > --Along with performance and memory utilization, stochastic algorithms
> have
> > another comparison parameter "quality of result". In stochastic
> algorithms,
> > global optimum is not guaranteed, We have to compare the quality of the
> > result along with performance and memory consumption to compare two
> > algorithm implementations. I have kept the legacy example just for
> > comparison between the new and the legacy implementations.
>
> Great that you take care of checking improvements on the quality
> measures.  Just make sure that the new code do not depend on
> anything in module "commons-math-legacy".  As noted earlier, a
> dependency on a previous release of CM with
>   <scope>test</scope>
> is fine.
>
> --test scope can only be used for test packages. However, this is kept in
> src of example module. Changing previous release's dependency scope to test
> is showing compilation error in src package. I am not sure how to manage
> this.

Depending on released code (e.g. version 3.6.1 of Commons Math)
is fine too for the "main" codes of the "examples" module.

> >
> > Also (if no objections are made):
> > 1. the declaration should go in the main POM (in the
> <dependencyManagement>
> >     section).
> > 2. the library (CM) must not depend on a specific logger implementation.
> > --Do you mean I should also remove the simple implementation
> "slf4j-simple"
> > in commons-math-ga.
>
> Yes.
> Specific implementations a user's choice that could be different in
> each of the "examples" modules.
>
> --Done.
>
> > [...]
> > >
> > > * Class "RandomGenerator"
> > > 1. Duplicates functionality (storage of thread-local instances)
> > > readily available in "Commons RNG".
> > > 2. (IMHO) Thread-local instances should not be used for "heavy" usage
> > > (like in GA).
> > > --Modified
> >
> > You still use "ThreadLocalRandomSource".
> > I think that if some functionality requires a RNG instance it should be
> > passed to its constructor (or created by it).
> >
> > I noticed that the interface "ChromosomeRepresentationUtils"
> > is actually a utility class (as the name indicates): It only contains
> > "static" functions.  This will probably require refactoring (utility
> > classes are frowned upon nowadays).
> > Some functionality in that class (e.g. sampling from a range of
> > integers) exists in the "sampling" module of "Commons RNG".
> >
> > --Will look into this.
>
> I understand that "hiding" the access to the RNG functionality makes
> it for a seemingly nicer API, but I'm not sure that we should do it given
> that the "random" aspect is a fundamental, and explicit, part of the GA
> algorithm.
> How about introducing "factories" to create the objects that use a RNG,
> giving them their "own" RNG instance.  For example:
> ---CUT---
> public interface MutationPolicy<P> {
>     Chromosome<P> mutate(Chromosome<P> original, double mutationRate);
>
>     interface Factory<P> {
>         /**
>          * Creates an instance with a dedicated source of randomness.
>          *
>          * @param rng RNG algorithm.
>          * @param seed Seed.
>          * @return an instance that must <em>not</em> be shared among
> threads.
>          */
>         MutationPolicy<P> create(RandomSource rng, Object seed);
>
>         default MutationPolicy<P> create(RandomSource rng) {
>             return create(rng, null);
>         }
>     }
> }
> ---CUT---
> ?
>
> --[IMHO] We can give users the option of choosing RandomSource. However,
> choosing the RandomSource in all places separately seems to be little
> overwork for the users.

Setting up all the "policies" (mutation, crossover, ...) must be done
anyways, by the user; the above factories just add an argument to
be passed at instantiation.

> Separate RandomSource for each place may be
> redundant.

I'm not sure what you mean by "redundant".
Since the RNG instances are not thread-safe, either separate sources
*must* be used or the synchronization must be handled separately (as
done for the "ThreadLocalRandomSource" possibly with a performance
loss).

> I also believe the system should provide a default option for
> RandomSource instead of completely depending on the choice of users.

Why?
This is a library, and the RNG should be viewed as an input (user's choice
to be made at the application level).

> What
> if we provide the feature of configuring RandomSource in our
> RandomNumberGenerator(earlier RandomGenerator) class like this. Please
> share your thoughts regarding this.
>
> --CUT--
>     /** The default RandomSource for random number generation. **/
>     private static RandomSource randomSource =
> RandomSource.XO_RO_SHI_RO_128_PP;
>
>     /**
>      * Sets the random source for this random generator.
>      * @param randomSource
>      */
>     public static void configure(RandomSource randomSource) {
>         RandomNumberGenerator.randomSource = randomSource;
>     }
>
>     /**
>      * constructs the singleton instance.
>      */
>     private RandomNumberGenerator() {
>     }
>
>     /**
>      * Returns the (static) random generator.
>      * @return the static random generator shared by GA implementation
> classes
>      */
>     public static UniformRandomProvider getRandomGenerator() {
>         return
> ThreadLocalRandomSource.current(RandomNumberGenerator.randomSource);
>     }
> --CUT--

There are multiple problems (already noted previously):
1. It hardcodes a specific "default" source (whereas the GA functionality
is "agnostic" about which source is actually used).
2. The RNG instance is shared among all the classes that need it (which
makes access unnecessarily slower).
3. The static field "randomSource" is mutable; this is looking for trouble
(race condition).

> [...]
>
>>>> [...]

Regards,
Gilles

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [MATH][GENETICS] Review of PR #197

Posted by Avijit Basak <av...@gmail.com>.
Hi All

         Please find my comments below:


The convention for the log summary (first line of the log message) is to
put the issue number in front; thus, instead of
---CUT---
Developed the new genetic algorithm module following the JIRA MATH-1563.
---CUT---
it should be something like
---CUT---
MATH-1563: Introducing new genetic algorithm module.
---CUT---

--Will change this.

> >
> > (B)
> > I'm confused by your defining "legacy" packages in new modules...
> > What kind of comparisons are you considering?
> > It is fine to depend (with scope "test") on CM v3.6.1 to perhaps (?)
> > regression testing; but please note that when your proposal is
> > merged, it will imply that the "legacy" codes *must* be removed.
> > [We don't want to keep duplicate functionality.]
> > -- The new implementation has improved the quality of optimization over
> the
> > legacy model.
>
> "Improved" in what sense?
> If you mean enhanced performance, such checks should be done
> using JMH (producing data to be published on the web site).
>
> --Along with performance and memory utilization, stochastic algorithms
have
> another comparison parameter "quality of result". In stochastic
algorithms,
> global optimum is not guaranteed, We have to compare the quality of the
> result along with performance and memory consumption to compare two
> algorithm implementations. I have kept the legacy example just for
> comparison between the new and the legacy implementations.

Great that you take care of checking improvements on the quality
measures.  Just make sure that the new code do not depend on
anything in module "commons-math-legacy".  As noted earlier, a
dependency on a previous release of CM with
  <scope>test</scope>
is fine.

--test scope can only be used for test packages. However, this is kept in
src of example module. Changing previous release's dependency scope to test
is showing compilation error in src package. I am not sure how to manage
this.

>
> Also (if no objections are made):
> 1. the declaration should go in the main POM (in the
<dependencyManagement>
>     section).
> 2. the library (CM) must not depend on a specific logger implementation.
> --Do you mean I should also remove the simple implementation
"slf4j-simple"
> in commons-math-ga.

Yes.
Specific implementations a user's choice that could be different in
each of the "examples" modules.

--Done.

> [...]
> >
> > * Class "RandomGenerator"
> > 1. Duplicates functionality (storage of thread-local instances)
> > readily available in "Commons RNG".
> > 2. (IMHO) Thread-local instances should not be used for "heavy" usage
> > (like in GA).
> > --Modified
>
> You still use "ThreadLocalRandomSource".
> I think that if some functionality requires a RNG instance it should be
> passed to its constructor (or created by it).
>
> I noticed that the interface "ChromosomeRepresentationUtils"
> is actually a utility class (as the name indicates): It only contains
> "static" functions.  This will probably require refactoring (utility
> classes are frowned upon nowadays).
> Some functionality in that class (e.g. sampling from a range of
> integers) exists in the "sampling" module of "Commons RNG".
>
> --Will look into this.

I understand that "hiding" the access to the RNG functionality makes
it for a seemingly nicer API, but I'm not sure that we should do it given
that the "random" aspect is a fundamental, and explicit, part of the GA
algorithm.
How about introducing "factories" to create the objects that use a RNG,
giving them their "own" RNG instance.  For example:
---CUT---
public interface MutationPolicy<P> {
    Chromosome<P> mutate(Chromosome<P> original, double mutationRate);

    interface Factory<P> {
        /**
         * Creates an instance with a dedicated source of randomness.
         *
         * @param rng RNG algorithm.
         * @param seed Seed.
         * @return an instance that must <em>not</em> be shared among
threads.
         */
        MutationPolicy<P> create(RandomSource rng, Object seed);

        default MutationPolicy<P> create(RandomSource rng) {
            return create(rng, null);
        }
    }
}
---CUT---
?

--[IMHO] We can give users the option of choosing RandomSource. However,
choosing the RandomSource in all places separately seems to be little
overwork for the users. Separate RandomSource for each place may be
redundant. I also believe the system should provide a default option for
RandomSource instead of completely depending on the choice of users. What
if we provide the feature of configuring RandomSource in our
RandomNumberGenerator(earlier RandomGenerator) class like this. Please
share your thoughts regarding this.

--CUT--
    /** The default RandomSource for random number generation. **/
    private static RandomSource randomSource =
RandomSource.XO_RO_SHI_RO_128_PP;

    /**
     * Sets the random source for this random generator.
     * @param randomSource
     */
    public static void configure(RandomSource randomSource) {
        RandomNumberGenerator.randomSource = randomSource;
    }

    /**
     * constructs the singleton instance.
     */
    private RandomNumberGenerator() {
    }

    /**
     * Returns the (static) random generator.
     * @return the static random generator shared by GA implementation
classes
     */
    public static UniformRandomProvider getRandomGenerator() {
        return
ThreadLocalRandomSource.current(RandomNumberGenerator.randomSource);
    }
--CUT--

> >
> > * Class "ValidationUtils"
> > -> should not be public (or should be defined in an "internal" package).
> > --Changed
>
> The class actually provides no added value.
>
> --I was thinking of having a validation utility which can be reused
> everywhere. Otherwise I have to duplicate the code in all places. Do you
> think that is a good way of doing this?

Reuse is good, sure; but in this case, the very small gain (one line)
is not worth the loss of clarity (method call vs direct conditional test).

--Made changes

>> [...]
> >
> > (E) Unit tests
> > * src/test
> > 1. New tests should use Junit 5.
> > 2. "Example usages" probably belong in the "examples-ga" module.
> > --JUnit version is inherited from commons-math module.
>
> Not sure what you mean.
> It's possible to use Junit5 in new modules/test classes even if
> other classes use older Junit.
> --Do you think it is fine to have two separate versions of JUnit library
in
> CM. [IMHO] we should keep only one version only.

In the mid-term, yes.  But the point is that we must start somewhere.
And as you are creating a new tests suite, it's worth using the up-to-date
framework.  [This will also reduce the burden on the people who'll take
on updating all the other tests.]

--Done

Thanks & Regards
--Avijit Basak

On Mon, 1 Nov 2021 at 21:09, Gilles Sadowski <gi...@gmail.com> wrote:

> Hello.
>
> Le lun. 1 nov. 2021 à 08:56, Avijit Basak <av...@gmail.com> a
> écrit :
> >
> > Hi All
> >
> >         Please find my comments below:
> >
> > >
> > > Hi All
> > >
> > >         I have fixed most of the review comments. The changes have been
> > > committed to PR#199.
> > >
> > > (A)
> > > Please "rebase" on "master".
> > > Please "squash" intermediate commits: For a new feature, a single
> commit
> > > should exist (that corresponds to the JIRA report describing it).
> > > --Will be done once all changes are finalized and committed.
> >
> > What is the rationale for not doing it right now?
> > The PR should always be "rebased" on the latest "master".
> >
> > --Done both rebase and squash.
>
> Thanks.
>
> The convention for the log summary (first line of the log message) is to
> put the issue number in front; thus, instead of
> ---CUT---
> Developed the new genetic algorithm module following the JIRA MATH-1563.
> ---CUT---
> it should be something like
> ---CUT---
> MATH-1563: Introducing new genetic algorithm module.
> ---CUT---
>
> > >
> > > (B)
> > > I'm confused by your defining "legacy" packages in new modules...
> > > What kind of comparisons are you considering?
> > > It is fine to depend (with scope "test") on CM v3.6.1 to perhaps (?)
> > > regression testing; but please note that when your proposal is
> > > merged, it will imply that the "legacy" codes *must* be removed.
> > > [We don't want to keep duplicate functionality.]
> > > -- The new implementation has improved the quality of optimization over
> > the
> > > legacy model.
> >
> > "Improved" in what sense?
> > If you mean enhanced performance, such checks should be done
> > using JMH (producing data to be published on the web site).
> >
> > --Along with performance and memory utilization, stochastic algorithms
> have
> > another comparison parameter "quality of result". In stochastic
> algorithms,
> > global optimum is not guaranteed, We have to compare the quality of the
> > result along with performance and memory consumption to compare two
> > algorithm implementations. I have kept the legacy example just for
> > comparison between the new and the legacy implementations.
>
> Great that you take care of checking improvements on the quality
> measures.  Just make sure that the new code do not depend on
> anything in module "commons-math-legacy".  As noted earlier, a
> dependency on a previous release of CM with
>   <scope>test</scope>
> is fine.
>
> > [...]
>
> > >
> > > (D) General design
> > > Class "ConsoleLogger"
> > > -> We should not reinvent the wheel.  We should consider whether
> logging
> > > is necessary, and in the affirmative, depend on the de facto standard:
> > > "slf4j".
> > > -- Introduced the slf4j with simple implementation and removed the
> > > consolelogger class.
> >
> > Could you please post a separate message highlighting the
> > new dependency?
> > --I shall do it.
>
> Thanks.
>
> >
> > Also (if no objections are made):
> > 1. the declaration should go in the main POM (in the
> <dependencyManagement>
> >     section).
> > 2. the library (CM) must not depend on a specific logger implementation.
> > --Do you mean I should also remove the simple implementation
> "slf4j-simple"
> > in commons-math-ga.
>
> Yes.
> Specific implementations a user's choice that could be different in
> each of the "examples" modules.
>
> > [...]
> > >
> > > * Class "RandomGenerator"
> > > 1. Duplicates functionality (storage of thread-local instances)
> > > readily available in "Commons RNG".
> > > 2. (IMHO) Thread-local instances should not be used for "heavy" usage
> > > (like in GA).
> > > --Modified
> >
> > You still use "ThreadLocalRandomSource".
> > I think that if some functionality requires a RNG instance it should be
> > passed to its constructor (or created by it).
> >
> > I noticed that the interface "ChromosomeRepresentationUtils"
> > is actually a utility class (as the name indicates): It only contains
> > "static" functions.  This will probably require refactoring (utility
> > classes are frowned upon nowadays).
> > Some functionality in that class (e.g. sampling from a range of
> > integers) exists in the "sampling" module of "Commons RNG".
> >
> > --Will look into this.
>
> I understand that "hiding" the access to the RNG functionality makes
> it for a seemingly nicer API, but I'm not sure that we should do it given
> that the "random" aspect is a fundamental, and explicit, part of the GA
> algorithm.
> How about introducing "factories" to create the objects that use a RNG,
> giving them their "own" RNG instance.  For example:
> ---CUT---
> public interface MutationPolicy<P> {
>     Chromosome<P> mutate(Chromosome<P> original, double mutationRate);
>
>     interface Factory<P> {
>         /**
>          * Creates an instance with a dedicated source of randomness.
>          *
>          * @param rng RNG algorithm.
>          * @param seed Seed.
>          * @return an instance that must <em>not</em> be shared among
> threads.
>          */
>         MutationPolicy<P> create(RandomSource rng, Object seed);
>
>         default MutationPolicy<P> create(RandomSource rng) {
>             return create(rng, null);
>         }
>     }
> }
> ---CUT---
> ?
>
> > >
> > > * Class "ValidationUtils"
> > > -> should not be public (or should be defined in an "internal"
> package).
> > > --Changed
> >
> > The class actually provides no added value.
> >
> > --I was thinking of having a validation utility which can be reused
> > everywhere. Otherwise I have to duplicate the code in all places. Do you
> > think that is a good way of doing this?
>
> Reuse is good, sure; but in this case, the very small gain (one line)
> is not worth the loss of clarity (method call vs direct conditional test).
>
> >>> [...]
> >
> > The abstraction is quite useful, but the point is indeed that the
> > abstraction
> > is somewhat broken by exposing the List<T> as the representation.
> > I understand that the advantage is to be able to define several
> > functionalities
> > (crossover, ...) in a generic way; but is it worth the obvious drawback
> is
> > that
> > object instances must be used to represent genes?
> >
> > --Initially I also thought of removing the List<T>. But I could not find
> > any wayout for crossover and mutation operators. Those are the primary
> > facilities this library should provide to the users. If any
> implementation
> > needs to use primitive type, then the new chromosome class would be
> > extended from the AbstractChromosome class. The new Binary chromosome
> would
> > follow the same design. However, that much memory optimization would be
> > required only for problems with quite higher dimensions like neuro
> > evolution.[IMHO] For most practical purposes, List of objects might be
> > sufficient.
>
> As this is a fundamental design decision about which another
> conclusion had been some time ago, it is worth posting a
> dedicated message to the ML (to establish that this is the
> choice being made).
>
> > > > Any domain specific new chromosome implementation extending the
> > > > AbstractListChromosome class can reuse all crossover and mutation
> > > operators.
> > > > For our proposed improvement of BinaryChromosome we should be able to
> > > > extend the AbstractChromosome (*not* AbstractListChromosome) for the
> new
> > > > class and provide the dedicated crossover and mutation operators for
> the
> > > > corresponding Genotype. Without an *explicit* abstraction,
> management of
> > > > crossover and mutation operators would be difficult.
> >
> > For sure, the design would be different. But it is not obvious to me that
> > the current one is necessarily better.
> >
> > ---CUT---
> > public interface CrossoverPolicy<P> {
> >    ChromosomePair<P> crossover(Chromosome<P> first, Chromosome<P>
> > second, double crossoverRate);
> > }
> > ---CUT---
> > vs
> > ---CUT---
> > public interface Chromosome {
> >    Chromosome crossover(Chromosome other, double crossoverRate);
> > }
> > ---CUT---
> >
> > -- [IMHO] The introduction of phenotype(P) is a better design. As this
> > would enable reuse of the same chromosome genotype for both direct and
> > indirect encodings. By virtue of this we have removed the RandomKey
> > chromosome. Now RealValuedChromosome can serve the purpose with proper
> > decoder.
> > -- I have introduced crossoverRate as a parameter just to give more
> control
> > to the crossover operator. This will enable users to change any logic by
> > only changing the corresponding CrossoverPolicy. This is also in line
> with
> > changes in MutationPolicy. This will minimize any impact on the algorithm
> > class.
> >
> >
> > > > Please share further thoughts regarding this.
>
> I cannot comment yet as I didn't delve into the implementation details...
>
> > >
> > > Whether we should have a data-based (subclass of "List<T>") or
> > > behaviour-based  access to individual genes is a fundamental design
> > > decision.
> > > The "legacy" implementation chose the former but it might be worth
> > > considering an alternative (lest we'll need to support several APIs if
> > > we later come to the conclusion that we need a more compact
> > > representation for some use cases.
> > > --How are you thinking the design to be behaviour-based. It will be
> > helpful
> > > if you share some examples.
> > > The legacy was only based on data types. But the current model
> introduces
> > > the concept of phenotype along with genotype. The genotype  (List<T>)
> > > represents the actual chromosome implementation and the phenotype (<P>)
> > > represents the problem domain. A Decoder is also introduced to convert
> > > genotype to phenotype. In this way this design would be capable of
> using a
> > > single genotype for a wide variety of problem domains(phenotype)
> including
> > > both direct and indirect encoding. Do you see any challenge to this
> > current
> > > implementation?
>
> You are the expert, who proposes design changes to allow
> extended usage. ;-)
>
> > > Kindly share your thoughts.
> >
> > I admit that I did not fully think about it (but I thought that we had
> > agreed about the List<T> being a problem)...
> > --Initially I also thought of removing the List<T>. But I could not find
> > any wayout for crossover and mutation operators. Those are the primary
> > facilities this library should provide to the users.
>
> Sure.
> If the "waste" in memory is worth the simplified handling, that's
> fine (TBD following your post about, as said above).
>
> >> [...]
> > >
> > > (E) Unit tests
> > > * src/test
> > > 1. New tests should use Junit 5.
> > > 2. "Example usages" probably belong in the "examples-ga" module.
> > > --JUnit version is inherited from commons-math module.
> >
> > Not sure what you mean.
> > It's possible to use Junit5 in new modules/test classes even if
> > other classes use older Junit.
> > --Do you think it is fine to have two separate versions of JUnit library
> in
> > CM. [IMHO] we should keep only one version only.
>
> In the mid-term, yes.  But the point is that we must start somewhere.
> And as you are creating a new tests suite, it's worth using the up-to-date
> framework.  [This will also reduce the burden on the people who'll take
> on updating all the other tests.]
>
> > [...]
> > >
> > > (F) Code readability
> > > * Please write one argument per line.
> > > * Write one condition check per line.
> > > * Avoid comments with no added value (like "constructor" for a
> > constructor).
> > > * Avoid "ASCII art" (see e.g. "OnePointCrossover"); a link[2] is often
> > > preferable.
> > > * Do no duplicate documentation (see e.g. "OnePointCrossover").
> > > --I have formatted the method declaration to have one parameter in one
> > line.
> > > --Most of the if conditions are having a single condition except very
> few
> > > pre existing ones. I could not see any way to format the if statement
> in
> > > eclipse like the suggestion. I cannot introduce any formatting rule
> which
> > > cannot be handled in eclipse as that will be very hard to manage.
> >
> > ?
> > [I can't imagine that Eclipse won't let you add a newline.]
> > --I searched a little bit but could not find anything relevant. However,
> I
> > found the following reference
> >
> https://stackoverflow.com/questions/31808237/formatting-if-else-in-eclipse
> > Putting parenthesis is not an option. Let me know if you find anything
> > relevant to this.
>
> We can leave this nit for later.
>
> >
> > > --ASCII art and other crossover classes are untouched for this release.
> >
> > What do you mean by "this release"?
> > In this instance, it is easy to make the docs clearer by using a link
> > rather than ASCII figures.  If you want to argue that the latter should
> > be kept, please start a new thread in order to collect other opinions.
> > --I can start making the changes.
>
> Great; thanks.
>
> Best regards,
> Gilles
>
> >>>> [...]
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>

-- 
Avijit Basak

Re: [MATH][GENETICS] Review of PR #197

Posted by Gilles Sadowski <gi...@gmail.com>.
Hello.

Le lun. 1 nov. 2021 à 08:56, Avijit Basak <av...@gmail.com> a écrit :
>
> Hi All
>
>         Please find my comments below:
>
> >
> > Hi All
> >
> >         I have fixed most of the review comments. The changes have been
> > committed to PR#199.
> >
> > (A)
> > Please "rebase" on "master".
> > Please "squash" intermediate commits: For a new feature, a single commit
> > should exist (that corresponds to the JIRA report describing it).
> > --Will be done once all changes are finalized and committed.
>
> What is the rationale for not doing it right now?
> The PR should always be "rebased" on the latest "master".
>
> --Done both rebase and squash.

Thanks.

The convention for the log summary (first line of the log message) is to
put the issue number in front; thus, instead of
---CUT---
Developed the new genetic algorithm module following the JIRA MATH-1563.
---CUT---
it should be something like
---CUT---
MATH-1563: Introducing new genetic algorithm module.
---CUT---

> >
> > (B)
> > I'm confused by your defining "legacy" packages in new modules...
> > What kind of comparisons are you considering?
> > It is fine to depend (with scope "test") on CM v3.6.1 to perhaps (?)
> > regression testing; but please note that when your proposal is
> > merged, it will imply that the "legacy" codes *must* be removed.
> > [We don't want to keep duplicate functionality.]
> > -- The new implementation has improved the quality of optimization over
> the
> > legacy model.
>
> "Improved" in what sense?
> If you mean enhanced performance, such checks should be done
> using JMH (producing data to be published on the web site).
>
> --Along with performance and memory utilization, stochastic algorithms have
> another comparison parameter "quality of result". In stochastic algorithms,
> global optimum is not guaranteed, We have to compare the quality of the
> result along with performance and memory consumption to compare two
> algorithm implementations. I have kept the legacy example just for
> comparison between the new and the legacy implementations.

Great that you take care of checking improvements on the quality
measures.  Just make sure that the new code do not depend on
anything in module "commons-math-legacy".  As noted earlier, a
dependency on a previous release of CM with
  <scope>test</scope>
is fine.

> [...]

> >
> > (D) General design
> > Class "ConsoleLogger"
> > -> We should not reinvent the wheel.  We should consider whether logging
> > is necessary, and in the affirmative, depend on the de facto standard:
> > "slf4j".
> > -- Introduced the slf4j with simple implementation and removed the
> > consolelogger class.
>
> Could you please post a separate message highlighting the
> new dependency?
> --I shall do it.

Thanks.

>
> Also (if no objections are made):
> 1. the declaration should go in the main POM (in the <dependencyManagement>
>     section).
> 2. the library (CM) must not depend on a specific logger implementation.
> --Do you mean I should also remove the simple implementation "slf4j-simple"
> in commons-math-ga.

Yes.
Specific implementations a user's choice that could be different in
each of the "examples" modules.

> [...]
> >
> > * Class "RandomGenerator"
> > 1. Duplicates functionality (storage of thread-local instances)
> > readily available in "Commons RNG".
> > 2. (IMHO) Thread-local instances should not be used for "heavy" usage
> > (like in GA).
> > --Modified
>
> You still use "ThreadLocalRandomSource".
> I think that if some functionality requires a RNG instance it should be
> passed to its constructor (or created by it).
>
> I noticed that the interface "ChromosomeRepresentationUtils"
> is actually a utility class (as the name indicates): It only contains
> "static" functions.  This will probably require refactoring (utility
> classes are frowned upon nowadays).
> Some functionality in that class (e.g. sampling from a range of
> integers) exists in the "sampling" module of "Commons RNG".
>
> --Will look into this.

I understand that "hiding" the access to the RNG functionality makes
it for a seemingly nicer API, but I'm not sure that we should do it given
that the "random" aspect is a fundamental, and explicit, part of the GA
algorithm.
How about introducing "factories" to create the objects that use a RNG,
giving them their "own" RNG instance.  For example:
---CUT---
public interface MutationPolicy<P> {
    Chromosome<P> mutate(Chromosome<P> original, double mutationRate);

    interface Factory<P> {
        /**
         * Creates an instance with a dedicated source of randomness.
         *
         * @param rng RNG algorithm.
         * @param seed Seed.
         * @return an instance that must <em>not</em> be shared among threads.
         */
        MutationPolicy<P> create(RandomSource rng, Object seed);

        default MutationPolicy<P> create(RandomSource rng) {
            return create(rng, null);
        }
    }
}
---CUT---
?

> >
> > * Class "ValidationUtils"
> > -> should not be public (or should be defined in an "internal" package).
> > --Changed
>
> The class actually provides no added value.
>
> --I was thinking of having a validation utility which can be reused
> everywhere. Otherwise I have to duplicate the code in all places. Do you
> think that is a good way of doing this?

Reuse is good, sure; but in this case, the very small gain (one line)
is not worth the loss of clarity (method call vs direct conditional test).

>>> [...]
>
> The abstraction is quite useful, but the point is indeed that the
> abstraction
> is somewhat broken by exposing the List<T> as the representation.
> I understand that the advantage is to be able to define several
> functionalities
> (crossover, ...) in a generic way; but is it worth the obvious drawback is
> that
> object instances must be used to represent genes?
>
> --Initially I also thought of removing the List<T>. But I could not find
> any wayout for crossover and mutation operators. Those are the primary
> facilities this library should provide to the users. If any implementation
> needs to use primitive type, then the new chromosome class would be
> extended from the AbstractChromosome class. The new Binary chromosome would
> follow the same design. However, that much memory optimization would be
> required only for problems with quite higher dimensions like neuro
> evolution.[IMHO] For most practical purposes, List of objects might be
> sufficient.

As this is a fundamental design decision about which another
conclusion had been some time ago, it is worth posting a
dedicated message to the ML (to establish that this is the
choice being made).

> > > Any domain specific new chromosome implementation extending the
> > > AbstractListChromosome class can reuse all crossover and mutation
> > operators.
> > > For our proposed improvement of BinaryChromosome we should be able to
> > > extend the AbstractChromosome (*not* AbstractListChromosome) for the new
> > > class and provide the dedicated crossover and mutation operators for the
> > > corresponding Genotype. Without an *explicit* abstraction, management of
> > > crossover and mutation operators would be difficult.
>
> For sure, the design would be different. But it is not obvious to me that
> the current one is necessarily better.
>
> ---CUT---
> public interface CrossoverPolicy<P> {
>    ChromosomePair<P> crossover(Chromosome<P> first, Chromosome<P>
> second, double crossoverRate);
> }
> ---CUT---
> vs
> ---CUT---
> public interface Chromosome {
>    Chromosome crossover(Chromosome other, double crossoverRate);
> }
> ---CUT---
>
> -- [IMHO] The introduction of phenotype(P) is a better design. As this
> would enable reuse of the same chromosome genotype for both direct and
> indirect encodings. By virtue of this we have removed the RandomKey
> chromosome. Now RealValuedChromosome can serve the purpose with proper
> decoder.
> -- I have introduced crossoverRate as a parameter just to give more control
> to the crossover operator. This will enable users to change any logic by
> only changing the corresponding CrossoverPolicy. This is also in line with
> changes in MutationPolicy. This will minimize any impact on the algorithm
> class.
>
>
> > > Please share further thoughts regarding this.

I cannot comment yet as I didn't delve into the implementation details...

> >
> > Whether we should have a data-based (subclass of "List<T>") or
> > behaviour-based  access to individual genes is a fundamental design
> > decision.
> > The "legacy" implementation chose the former but it might be worth
> > considering an alternative (lest we'll need to support several APIs if
> > we later come to the conclusion that we need a more compact
> > representation for some use cases.
> > --How are you thinking the design to be behaviour-based. It will be
> helpful
> > if you share some examples.
> > The legacy was only based on data types. But the current model introduces
> > the concept of phenotype along with genotype. The genotype  (List<T>)
> > represents the actual chromosome implementation and the phenotype (<P>)
> > represents the problem domain. A Decoder is also introduced to convert
> > genotype to phenotype. In this way this design would be capable of using a
> > single genotype for a wide variety of problem domains(phenotype) including
> > both direct and indirect encoding. Do you see any challenge to this
> current
> > implementation?

You are the expert, who proposes design changes to allow
extended usage. ;-)

> > Kindly share your thoughts.
>
> I admit that I did not fully think about it (but I thought that we had
> agreed about the List<T> being a problem)...
> --Initially I also thought of removing the List<T>. But I could not find
> any wayout for crossover and mutation operators. Those are the primary
> facilities this library should provide to the users.

Sure.
If the "waste" in memory is worth the simplified handling, that's
fine (TBD following your post about, as said above).

>> [...]
> >
> > (E) Unit tests
> > * src/test
> > 1. New tests should use Junit 5.
> > 2. "Example usages" probably belong in the "examples-ga" module.
> > --JUnit version is inherited from commons-math module.
>
> Not sure what you mean.
> It's possible to use Junit5 in new modules/test classes even if
> other classes use older Junit.
> --Do you think it is fine to have two separate versions of JUnit library in
> CM. [IMHO] we should keep only one version only.

In the mid-term, yes.  But the point is that we must start somewhere.
And as you are creating a new tests suite, it's worth using the up-to-date
framework.  [This will also reduce the burden on the people who'll take
on updating all the other tests.]

> [...]
> >
> > (F) Code readability
> > * Please write one argument per line.
> > * Write one condition check per line.
> > * Avoid comments with no added value (like "constructor" for a
> constructor).
> > * Avoid "ASCII art" (see e.g. "OnePointCrossover"); a link[2] is often
> > preferable.
> > * Do no duplicate documentation (see e.g. "OnePointCrossover").
> > --I have formatted the method declaration to have one parameter in one
> line.
> > --Most of the if conditions are having a single condition except very few
> > pre existing ones. I could not see any way to format the if statement in
> > eclipse like the suggestion. I cannot introduce any formatting rule which
> > cannot be handled in eclipse as that will be very hard to manage.
>
> ?
> [I can't imagine that Eclipse won't let you add a newline.]
> --I searched a little bit but could not find anything relevant. However, I
> found the following reference
> https://stackoverflow.com/questions/31808237/formatting-if-else-in-eclipse
> Putting parenthesis is not an option. Let me know if you find anything
> relevant to this.

We can leave this nit for later.

>
> > --ASCII art and other crossover classes are untouched for this release.
>
> What do you mean by "this release"?
> In this instance, it is easy to make the docs clearer by using a link
> rather than ASCII figures.  If you want to argue that the latter should
> be kept, please start a new thread in order to collect other opinions.
> --I can start making the changes.

Great; thanks.

Best regards,
Gilles

>>>> [...]

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [MATH][GENETICS] Review of PR #197

Posted by Avijit Basak <av...@gmail.com>.
Hi All

        Please find my comments below:

>
> Hi All
>
>         I have fixed most of the review comments. The changes have been
> committed to PR#199.
>
> (A)
> Please "rebase" on "master".
> Please "squash" intermediate commits: For a new feature, a single commit
> should exist (that corresponds to the JIRA report describing it).
> --Will be done once all changes are finalized and committed.

What is the rationale for not doing it right now?
The PR should always be "rebased" on the latest "master".

--Done both rebase and squash.

>
> (B)
> I'm confused by your defining "legacy" packages in new modules...
> What kind of comparisons are you considering?
> It is fine to depend (with scope "test") on CM v3.6.1 to perhaps (?)
> regression testing; but please note that when your proposal is
> merged, it will imply that the "legacy" codes *must* be removed.
> [We don't want to keep duplicate functionality.]
> -- The new implementation has improved the quality of optimization over
the
> legacy model.

"Improved" in what sense?
If you mean enhanced performance, such checks should be done
using JMH (producing data to be published on the web site).

--Along with performance and memory utilization, stochastic algorithms have
another comparison parameter "quality of result". In stochastic algorithms,
global optimum is not guaranteed, We have to compare the quality of the
result along with performance and memory consumption to compare two
algorithm implementations. I have kept the legacy example just for
comparison between the new and the legacy implementations.

> I have added the legacy packages to demonstrate the same.
> Once we remove the genetics packages in the legacy module, the same will
be
> deleted from examples.

I'm probably missing what exactly those "legacy" examples aim to
demonstrate...
--Described above.
In passing, what's the purpose of
  Thread.sleep(5000)
(at line 55 in file "TSPOptimizerLegacy")?
--Removed.

>
> (C)
> File
> "commons-math-examples/examples-ga/src/main/resources/
spotbugs/spotbugs-exclude-filter.xml"
> does not belong there.
> --Removed
>
> (D) General design
> Class "ConsoleLogger"
> -> We should not reinvent the wheel.  We should consider whether logging
> is necessary, and in the affirmative, depend on the de facto standard:
> "slf4j".
> -- Introduced the slf4j with simple implementation and removed the
> consolelogger class.

Could you please post a separate message highlighting the
new dependency?
--I shall do it.

Also (if no objections are made):
1. the declaration should go in the main POM (in the <dependencyManagement>
    section).
2. the library (CM) must not depend on a specific logger implementation.
--Do you mean I should also remove the simple implementation "slf4j-simple"
in commons-math-ga.
>
> Class "Constants"
> -> Any data should be declared where its purpose is obvious.
> --Put the constants in relevant classes.
>
> * Class "RandomGenerator"
> 1. Duplicates functionality (storage of thread-local instances)
> readily available in "Commons RNG".
> 2. (IMHO) Thread-local instances should not be used for "heavy" usage
> (like in GA).
> --Modified

You still use "ThreadLocalRandomSource".
I think that if some functionality requires a RNG instance it should be
passed to its constructor (or created by it).

I noticed that the interface "ChromosomeRepresentationUtils"
is actually a utility class (as the name indicates): It only contains
"static" functions.  This will probably require refactoring (utility
classes are frowned upon nowadays).
Some functionality in that class (e.g. sampling from a range of
integers) exists in the "sampling" module of "Commons RNG".

--Will look into this.

>
> * Class "ValidationUtils"
> -> should not be public (or should be defined in an "internal" package).
> --Changed

The class actually provides no added value.

--I was thinking of having a validation utility which can be reused
everywhere. Otherwise I have to duplicate the code in all places. Do you
think that is a good way of doing this?

> >
> > * Class "AbstractListChromosome" (and subclasses)
> > Didn't we conclude that this was a very wasteful implementation of the
> > "chromosome" concept?
> > -- I have some concerns regarding this. I am not much aware of any
> > discussion regarding this conclusion.
>
> >Please search the ML archive; I seem to recall a detailed discussion
> >where Alex gave hints on how a binary chromosome should be
> >implemented.
> --There is a separate Jira for the same. I shall do the implementation.
>
> > Chromosomes are always conceptualized as collections of allele/genes. So
> We
> > need a collection of the *genotypes* anyway. Here List has been used as
a
> > collection.
> > We need an abstraction for representing the collection of Genotype. All
> > crossover and mutation operators are based on this abstraction. This
> > enabled reuse of crossover and mutation operators for all chromosome
types
> > which extend the abstraction. I am not sure how to achieve this
> reusability
> > without an abstraction.

The abstraction is quite useful, but the point is indeed that the
abstraction
is somewhat broken by exposing the List<T> as the representation.
I understand that the advantage is to be able to define several
functionalities
(crossover, ...) in a generic way; but is it worth the obvious drawback is
that
object instances must be used to represent genes?

--Initially I also thought of removing the List<T>. But I could not find
any wayout for crossover and mutation operators. Those are the primary
facilities this library should provide to the users. If any implementation
needs to use primitive type, then the new chromosome class would be
extended from the AbstractChromosome class. The new Binary chromosome would
follow the same design. However, that much memory optimization would be
required only for problems with quite higher dimensions like neuro
evolution.[IMHO] For most practical purposes, List of objects might be
sufficient.

> > Any domain specific new chromosome implementation extending the
> > AbstractListChromosome class can reuse all crossover and mutation
> operators.
> > For our proposed improvement of BinaryChromosome we should be able to
> > extend the AbstractChromosome (*not* AbstractListChromosome) for the new
> > class and provide the dedicated crossover and mutation operators for the
> > corresponding Genotype. Without an *explicit* abstraction, management of
> > crossover and mutation operators would be difficult.

For sure, the design would be different. But it is not obvious to me that
the current one is necessarily better.

---CUT---
public interface CrossoverPolicy<P> {
   ChromosomePair<P> crossover(Chromosome<P> first, Chromosome<P>
second, double crossoverRate);
}
---CUT---
vs
---CUT---
public interface Chromosome {
   Chromosome crossover(Chromosome other, double crossoverRate);
}
---CUT---

-- [IMHO] The introduction of phenotype(P) is a better design. As this
would enable reuse of the same chromosome genotype for both direct and
indirect encodings. By virtue of this we have removed the RandomKey
chromosome. Now RealValuedChromosome can serve the purpose with proper
decoder.
-- I have introduced crossoverRate as a parameter just to give more control
to the crossover operator. This will enable users to change any logic by
only changing the corresponding CrossoverPolicy. This is also in line with
changes in MutationPolicy. This will minimize any impact on the algorithm
class.


> > Please share further thoughts regarding this.
>
> Whether we should have a data-based (subclass of "List<T>") or
> behaviour-based  access to individual genes is a fundamental design
> decision.
> The "legacy" implementation chose the former but it might be worth
> considering an alternative (lest we'll need to support several APIs if
> we later come to the conclusion that we need a more compact
> representation for some use cases.
> --How are you thinking the design to be behaviour-based. It will be
helpful
> if you share some examples.
> The legacy was only based on data types. But the current model introduces
> the concept of phenotype along with genotype. The genotype  (List<T>)
> represents the actual chromosome implementation and the phenotype (<P>)
> represents the problem domain. A Decoder is also introduced to convert
> genotype to phenotype. In this way this design would be capable of using a
> single genotype for a wide variety of problem domains(phenotype) including
> both direct and indirect encoding. Do you see any challenge to this
current
> implementation?
> Kindly share your thoughts.

I admit that I did not fully think about it (but I thought that we had
agreed about the List<T> being a problem)...
--Initially I also thought of removing the List<T>. But I could not find
any wayout for crossover and mutation operators. Those are the primary
facilities this library should provide to the users.
>
> * Package "convergencecond"
> -> Should probably be renamed "convergence".
> --Done
>
> * Class "GeneticException"
> 1. Should not be public (or should be defined in an "internal"
package"[1]).
> 2. If various types (that map to different JDK subclasses of
> RuntimeException,
> e.g. "IllegalArgumentException" vs "NullPointerException") are necessary,
> there could be a factory for creating the appropriate instance.
> However, for "null" checks, please use the JDK utilities[2].
> --Moved to an internal package. Null checks have been modified too.
>
> >
> > * Class "ConvergenceListenerRegistry"
> > Shouldn't it be thread-safe?
> > -- Yes. We need this to be thread-safe for parallel multi-population
> > parallel genetic algorithms.
> --No change for the time being.
>
> (E) Unit tests
> * src/test
> 1. New tests should use Junit 5.
> 2. "Example usages" probably belong in the "examples-ga" module.
> --JUnit version is inherited from commons-math module.

Not sure what you mean.
It's possible to use Junit5 in new modules/test classes even if
other classes use older Junit.
--Do you think it is fine to have two separate versions of JUnit library in
CM. [IMHO] we should keep only one version only.

> --I could not understand what is meant by "Example usages" here. Which
> component is being referred to here.

I'm referring to a comment such as
---CUT---
        // to test a stochastic algorithm is hard, so this will rather
be an usage
        // example
---CUT---
(at line 72 in "GeneticAlgorithmTestBinary.java").
--Removed. This was an existing comment from previous release.
>
> (F) Code readability
> * Please write one argument per line.
> * Write one condition check per line.
> * Avoid comments with no added value (like "constructor" for a
constructor).
> * Avoid "ASCII art" (see e.g. "OnePointCrossover"); a link[2] is often
> preferable.
> * Do no duplicate documentation (see e.g. "OnePointCrossover").
> --I have formatted the method declaration to have one parameter in one
line.
> --Most of the if conditions are having a single condition except very few
> pre existing ones. I could not see any way to format the if statement in
> eclipse like the suggestion. I cannot introduce any formatting rule which
> cannot be handled in eclipse as that will be very hard to manage.

?
[I can't imagine that Eclipse won't let you add a newline.]
--I searched a little bit but could not find anything relevant. However, I
found the following reference
https://stackoverflow.com/questions/31808237/formatting-if-else-in-eclipse
Putting parenthesis is not an option. Let me know if you find anything
relevant to this.

> --ASCII art and other crossover classes are untouched for this release.

What do you mean by "this release"?
In this instance, it is easy to make the docs clearer by using a link
rather than ASCII figures.  If you want to argue that the latter should
be kept, please start a new thread in order to collect other opinions.
--I can start making the changes.


Thanks & Regards
--Avijit Basak

On Sat, 30 Oct 2021 at 07:11, Gilles Sadowski <gi...@gmail.com> wrote:

> Le ven. 29 oct. 2021 à 17:00, Avijit Basak <av...@gmail.com> a
> écrit :
> >
> > Hi All
> >
> >         I have fixed most of the review comments. The changes have been
> > committed to PR#199.
> >
> > (A)
> > Please "rebase" on "master".
> > Please "squash" intermediate commits: For a new feature, a single commit
> > should exist (that corresponds to the JIRA report describing it).
> > --Will be done once all changes are finalized and committed.
>
> What is the rationale for not doing it right now?
> The PR should always be "rebased" on the latest "master".
>
> >
> > (B)
> > I'm confused by your defining "legacy" packages in new modules...
> > What kind of comparisons are you considering?
> > It is fine to depend (with scope "test") on CM v3.6.1 to perhaps (?)
> > regression testing; but please note that when your proposal is
> > merged, it will imply that the "legacy" codes *must* be removed.
> > [We don't want to keep duplicate functionality.]
> > -- The new implementation has improved the quality of optimization over
> the
> > legacy model.
>
> "Improved" in what sense?
> If you mean enhanced performance, such checks should be done
> using JMH (producing data to be published on the web site).
>
> > I have added the legacy packages to demonstrate the same.
> > Once we remove the genetics packages in the legacy module, the same will
> be
> > deleted from examples.
>
> I'm probably missing what exactly those "legacy" examples aim to
> demonstrate...
> In passing, what's the purpose of
>   Thread.sleep(5000)
> (at line 55 in file "TSPOptimizerLegacy")?
>
> >
> > (C)
> > File
> >
> "commons-math-examples/examples-ga/src/main/resources/spotbugs/spotbugs-exclude-filter.xml"
> > does not belong there.
> > --Removed
> >
> > (D) General design
> > Class "ConsoleLogger"
> > -> We should not reinvent the wheel.  We should consider whether logging
> > is necessary, and in the affirmative, depend on the de facto standard:
> > "slf4j".
> > -- Introduced the slf4j with simple implementation and removed the
> > consolelogger class.
>
> Could you please post a separate message highlighting the
> new dependency?
> Also (if no objections are made):
> 1. the declaration should go in the main POM (in the <dependencyManagement>
>     section).
> 2. the library (CM) must not depend on a specific logger implementation.
>
> >
> > Class "Constants"
> > -> Any data should be declared where its purpose is obvious.
> > --Put the constants in relevant classes.
> >
> > * Class "RandomGenerator"
> > 1. Duplicates functionality (storage of thread-local instances)
> > readily available in "Commons RNG".
> > 2. (IMHO) Thread-local instances should not be used for "heavy" usage
> > (like in GA).
> > --Modified
>
> You still use "ThreadLocalRandomSource".
> I think that if some functionality requires a RNG instance it should be
> passed to its constructor (or created by it).
>
> I noticed that the interface "ChromosomeRepresentationUtils"
> is actually a utility class (as the name indicates): It only contains
> "static" functions.  This will probably require refactoring (utility
> classes are frowned upon nowadays).
> Some functionality in that class (e.g. sampling from a range of
> integers) exists in the "sampling" module of "Commons RNG".
>
> >
> > * Class "ValidationUtils"
> > -> should not be public (or should be defined in an "internal" package).
> > --Changed
>
> The class actually provides no added value.
>
> > >
> > > * Class "AbstractListChromosome" (and subclasses)
> > > Didn't we conclude that this was a very wasteful implementation of the
> > > "chromosome" concept?
> > > -- I have some concerns regarding this. I am not much aware of any
> > > discussion regarding this conclusion.
> >
> > >Please search the ML archive; I seem to recall a detailed discussion
> > >where Alex gave hints on how a binary chromosome should be
> > >implemented.
> > --There is a separate Jira for the same. I shall do the implementation.
> >
> > > Chromosomes are always conceptualized as collections of allele/genes.
> So
> > We
> > > need a collection of the *genotypes* anyway. Here List has been used
> as a
> > > collection.
> > > We need an abstraction for representing the collection of Genotype. All
> > > crossover and mutation operators are based on this abstraction. This
> > > enabled reuse of crossover and mutation operators for all chromosome
> types
> > > which extend the abstraction. I am not sure how to achieve this
> > reusability
> > > without an abstraction.
>
> The abstraction is quite useful, but the point is indeed that the
> abstraction
> is somewhat broken by exposing the List<T> as the representation.
> I understand that the advantage is to be able to define several
> functionalities
> (crossover, ...) in a generic way; but is it worth the obvious drawback is
> that
> object instances must be used to represent genes?
>
> > > Any domain specific new chromosome implementation extending the
> > > AbstractListChromosome class can reuse all crossover and mutation
> > operators.
> > > For our proposed improvement of BinaryChromosome we should be able to
> > > extend the AbstractChromosome (*not* AbstractListChromosome) for the
> new
> > > class and provide the dedicated crossover and mutation operators for
> the
> > > corresponding Genotype. Without an *explicit* abstraction, management
> of
> > > crossover and mutation operators would be difficult.
>
> For sure, the design would be different. But it is not obvious to me that
> the current one is necessarily better.
>
> ---CUT---
> public interface CrossoverPolicy<P> {
>    ChromosomePair<P> crossover(Chromosome<P> first, Chromosome<P>
> second, double crossoverRate);
> }
> ---CUT---
> vs
> ---CUT---
> public interface Chromosome {
>    Chromosome crossover(Chromosome other, double crossoverRate);
> }
> ---CUT---
>
> > > Please share further thoughts regarding this.
> >
> > Whether we should have a data-based (subclass of "List<T>") or
> > behaviour-based  access to individual genes is a fundamental design
> > decision.
> > The "legacy" implementation chose the former but it might be worth
> > considering an alternative (lest we'll need to support several APIs if
> > we later come to the conclusion that we need a more compact
> > representation for some use cases.
> > --How are you thinking the design to be behaviour-based. It will be
> helpful
> > if you share some examples.
> > The legacy was only based on data types. But the current model introduces
> > the concept of phenotype along with genotype. The genotype  (List<T>)
> > represents the actual chromosome implementation and the phenotype (<P>)
> > represents the problem domain. A Decoder is also introduced to convert
> > genotype to phenotype. In this way this design would be capable of using
> a
> > single genotype for a wide variety of problem domains(phenotype)
> including
> > both direct and indirect encoding. Do you see any challenge to this
> current
> > implementation?
> > Kindly share your thoughts.
>
> I admit that I did not fully think about it (but I thought that we had
> agreed about the List<T> being a problem)...
>
> >
> > * Package "convergencecond"
> > -> Should probably be renamed "convergence".
> > --Done
> >
> > * Class "GeneticException"
> > 1. Should not be public (or should be defined in an "internal"
> package"[1]).
> > 2. If various types (that map to different JDK subclasses of
> > RuntimeException,
> > e.g. "IllegalArgumentException" vs "NullPointerException") are necessary,
> > there could be a factory for creating the appropriate instance.
> > However, for "null" checks, please use the JDK utilities[2].
> > --Moved to an internal package. Null checks have been modified too.
> >
> > >
> > > * Class "ConvergenceListenerRegistry"
> > > Shouldn't it be thread-safe?
> > > -- Yes. We need this to be thread-safe for parallel multi-population
> > > parallel genetic algorithms.
> > --No change for the time being.
> >
> > (E) Unit tests
> > * src/test
> > 1. New tests should use Junit 5.
> > 2. "Example usages" probably belong in the "examples-ga" module.
> > --JUnit version is inherited from commons-math module.
>
> Not sure what you mean.
> It's possible to use Junit5 in new modules/test classes even if
> other classes use older Junit.
>
> > --I could not understand what is meant by "Example usages" here. Which
> > component is being referred to here.
>
> I'm referring to a comment such as
> ---CUT---
>         // to test a stochastic algorithm is hard, so this will rather
> be an usage
>         // example
> ---CUT---
> (at line 72 in "GeneticAlgorithmTestBinary.java").
>
> >
> > (F) Code readability
> > * Please write one argument per line.
> > * Write one condition check per line.
> > * Avoid comments with no added value (like "constructor" for a
> constructor).
> > * Avoid "ASCII art" (see e.g. "OnePointCrossover"); a link[2] is often
> > preferable.
> > * Do no duplicate documentation (see e.g. "OnePointCrossover").
> > --I have formatted the method declaration to have one parameter in one
> line.
> > --Most of the if conditions are having a single condition except very few
> > pre existing ones. I could not see any way to format the if statement in
> > eclipse like the suggestion. I cannot introduce any formatting rule which
> > cannot be handled in eclipse as that will be very hard to manage.
>
> ?
> [I can't imagine that Eclipse won't let you add a newline.]
>
> > --ASCII art and other crossover classes are untouched for this release.
>
> What do you mean by "this release"?
> In this instance, it is easy to make the docs clearer by using a link
> rather than ASCII figures.  If you want to argue that the latter should
> be kept, please start a new thread in order to collect other opinions.
>
> Regards,
> Gilles
>
> >
> > (G)
> > Some files contain "tab" characters (e.g. "pom.xml").
> > --Removed tab characters.
> >
> > Thanks & Regards
> > --Avijit Basak
> >
> >>> [...]
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>

-- 
Avijit Basak

Re: [MATH][GENETICS] Review of PR #197

Posted by Gilles Sadowski <gi...@gmail.com>.
Le ven. 29 oct. 2021 à 17:00, Avijit Basak <av...@gmail.com> a écrit :
>
> Hi All
>
>         I have fixed most of the review comments. The changes have been
> committed to PR#199.
>
> (A)
> Please "rebase" on "master".
> Please "squash" intermediate commits: For a new feature, a single commit
> should exist (that corresponds to the JIRA report describing it).
> --Will be done once all changes are finalized and committed.

What is the rationale for not doing it right now?
The PR should always be "rebased" on the latest "master".

>
> (B)
> I'm confused by your defining "legacy" packages in new modules...
> What kind of comparisons are you considering?
> It is fine to depend (with scope "test") on CM v3.6.1 to perhaps (?)
> regression testing; but please note that when your proposal is
> merged, it will imply that the "legacy" codes *must* be removed.
> [We don't want to keep duplicate functionality.]
> -- The new implementation has improved the quality of optimization over the
> legacy model.

"Improved" in what sense?
If you mean enhanced performance, such checks should be done
using JMH (producing data to be published on the web site).

> I have added the legacy packages to demonstrate the same.
> Once we remove the genetics packages in the legacy module, the same will be
> deleted from examples.

I'm probably missing what exactly those "legacy" examples aim to
demonstrate...
In passing, what's the purpose of
  Thread.sleep(5000)
(at line 55 in file "TSPOptimizerLegacy")?

>
> (C)
> File
> "commons-math-examples/examples-ga/src/main/resources/spotbugs/spotbugs-exclude-filter.xml"
> does not belong there.
> --Removed
>
> (D) General design
> Class "ConsoleLogger"
> -> We should not reinvent the wheel.  We should consider whether logging
> is necessary, and in the affirmative, depend on the de facto standard:
> "slf4j".
> -- Introduced the slf4j with simple implementation and removed the
> consolelogger class.

Could you please post a separate message highlighting the
new dependency?
Also (if no objections are made):
1. the declaration should go in the main POM (in the <dependencyManagement>
    section).
2. the library (CM) must not depend on a specific logger implementation.

>
> Class "Constants"
> -> Any data should be declared where its purpose is obvious.
> --Put the constants in relevant classes.
>
> * Class "RandomGenerator"
> 1. Duplicates functionality (storage of thread-local instances)
> readily available in "Commons RNG".
> 2. (IMHO) Thread-local instances should not be used for "heavy" usage
> (like in GA).
> --Modified

You still use "ThreadLocalRandomSource".
I think that if some functionality requires a RNG instance it should be
passed to its constructor (or created by it).

I noticed that the interface "ChromosomeRepresentationUtils"
is actually a utility class (as the name indicates): It only contains
"static" functions.  This will probably require refactoring (utility
classes are frowned upon nowadays).
Some functionality in that class (e.g. sampling from a range of
integers) exists in the "sampling" module of "Commons RNG".

>
> * Class "ValidationUtils"
> -> should not be public (or should be defined in an "internal" package).
> --Changed

The class actually provides no added value.

> >
> > * Class "AbstractListChromosome" (and subclasses)
> > Didn't we conclude that this was a very wasteful implementation of the
> > "chromosome" concept?
> > -- I have some concerns regarding this. I am not much aware of any
> > discussion regarding this conclusion.
>
> >Please search the ML archive; I seem to recall a detailed discussion
> >where Alex gave hints on how a binary chromosome should be
> >implemented.
> --There is a separate Jira for the same. I shall do the implementation.
>
> > Chromosomes are always conceptualized as collections of allele/genes. So
> We
> > need a collection of the *genotypes* anyway. Here List has been used as a
> > collection.
> > We need an abstraction for representing the collection of Genotype. All
> > crossover and mutation operators are based on this abstraction. This
> > enabled reuse of crossover and mutation operators for all chromosome types
> > which extend the abstraction. I am not sure how to achieve this
> reusability
> > without an abstraction.

The abstraction is quite useful, but the point is indeed that the abstraction
is somewhat broken by exposing the List<T> as the representation.
I understand that the advantage is to be able to define several functionalities
(crossover, ...) in a generic way; but is it worth the obvious drawback is that
object instances must be used to represent genes?

> > Any domain specific new chromosome implementation extending the
> > AbstractListChromosome class can reuse all crossover and mutation
> operators.
> > For our proposed improvement of BinaryChromosome we should be able to
> > extend the AbstractChromosome (*not* AbstractListChromosome) for the new
> > class and provide the dedicated crossover and mutation operators for the
> > corresponding Genotype. Without an *explicit* abstraction, management of
> > crossover and mutation operators would be difficult.

For sure, the design would be different. But it is not obvious to me that
the current one is necessarily better.

---CUT---
public interface CrossoverPolicy<P> {
   ChromosomePair<P> crossover(Chromosome<P> first, Chromosome<P>
second, double crossoverRate);
}
---CUT---
vs
---CUT---
public interface Chromosome {
   Chromosome crossover(Chromosome other, double crossoverRate);
}
---CUT---

> > Please share further thoughts regarding this.
>
> Whether we should have a data-based (subclass of "List<T>") or
> behaviour-based  access to individual genes is a fundamental design
> decision.
> The "legacy" implementation chose the former but it might be worth
> considering an alternative (lest we'll need to support several APIs if
> we later come to the conclusion that we need a more compact
> representation for some use cases.
> --How are you thinking the design to be behaviour-based. It will be helpful
> if you share some examples.
> The legacy was only based on data types. But the current model introduces
> the concept of phenotype along with genotype. The genotype  (List<T>)
> represents the actual chromosome implementation and the phenotype (<P>)
> represents the problem domain. A Decoder is also introduced to convert
> genotype to phenotype. In this way this design would be capable of using a
> single genotype for a wide variety of problem domains(phenotype) including
> both direct and indirect encoding. Do you see any challenge to this current
> implementation?
> Kindly share your thoughts.

I admit that I did not fully think about it (but I thought that we had
agreed about the List<T> being a problem)...

>
> * Package "convergencecond"
> -> Should probably be renamed "convergence".
> --Done
>
> * Class "GeneticException"
> 1. Should not be public (or should be defined in an "internal" package"[1]).
> 2. If various types (that map to different JDK subclasses of
> RuntimeException,
> e.g. "IllegalArgumentException" vs "NullPointerException") are necessary,
> there could be a factory for creating the appropriate instance.
> However, for "null" checks, please use the JDK utilities[2].
> --Moved to an internal package. Null checks have been modified too.
>
> >
> > * Class "ConvergenceListenerRegistry"
> > Shouldn't it be thread-safe?
> > -- Yes. We need this to be thread-safe for parallel multi-population
> > parallel genetic algorithms.
> --No change for the time being.
>
> (E) Unit tests
> * src/test
> 1. New tests should use Junit 5.
> 2. "Example usages" probably belong in the "examples-ga" module.
> --JUnit version is inherited from commons-math module.

Not sure what you mean.
It's possible to use Junit5 in new modules/test classes even if
other classes use older Junit.

> --I could not understand what is meant by "Example usages" here. Which
> component is being referred to here.

I'm referring to a comment such as
---CUT---
        // to test a stochastic algorithm is hard, so this will rather
be an usage
        // example
---CUT---
(at line 72 in "GeneticAlgorithmTestBinary.java").

>
> (F) Code readability
> * Please write one argument per line.
> * Write one condition check per line.
> * Avoid comments with no added value (like "constructor" for a constructor).
> * Avoid "ASCII art" (see e.g. "OnePointCrossover"); a link[2] is often
> preferable.
> * Do no duplicate documentation (see e.g. "OnePointCrossover").
> --I have formatted the method declaration to have one parameter in one line.
> --Most of the if conditions are having a single condition except very few
> pre existing ones. I could not see any way to format the if statement in
> eclipse like the suggestion. I cannot introduce any formatting rule which
> cannot be handled in eclipse as that will be very hard to manage.

?
[I can't imagine that Eclipse won't let you add a newline.]

> --ASCII art and other crossover classes are untouched for this release.

What do you mean by "this release"?
In this instance, it is easy to make the docs clearer by using a link
rather than ASCII figures.  If you want to argue that the latter should
be kept, please start a new thread in order to collect other opinions.

Regards,
Gilles

>
> (G)
> Some files contain "tab" characters (e.g. "pom.xml").
> --Removed tab characters.
>
> Thanks & Regards
> --Avijit Basak
>
>>> [...]

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [MATH][GENETICS] Review of PR #197

Posted by Avijit Basak <av...@gmail.com>.
Hi All

        I have fixed most of the review comments. The changes have been
committed to PR#199.

(A)
Please "rebase" on "master".
Please "squash" intermediate commits: For a new feature, a single commit
should exist (that corresponds to the JIRA report describing it).
--Will be done once all changes are finalized and committed.

(B)
I'm confused by your defining "legacy" packages in new modules...
What kind of comparisons are you considering?
It is fine to depend (with scope "test") on CM v3.6.1 to perhaps (?)
regression testing; but please note that when your proposal is
merged, it will imply that the "legacy" codes *must* be removed.
[We don't want to keep duplicate functionality.]
-- The new implementation has improved the quality of optimization over the
legacy model. I have added the legacy packages to demonstrate the same.
Once we remove the genetics packages in the legacy module, the same will be
deleted from examples.

(C)
File
"commons-math-examples/examples-ga/src/main/resources/spotbugs/spotbugs-exclude-filter.xml"
does not belong there.
--Removed

(D) General design
Class "ConsoleLogger"
-> We should not reinvent the wheel.  We should consider whether logging
is necessary, and in the affirmative, depend on the de facto standard:
"slf4j".
-- Introduced the slf4j with simple implementation and removed the
consolelogger class.

Class "Constants"
-> Any data should be declared where its purpose is obvious.
--Put the constants in relevant classes.

* Class "RandomGenerator"
1. Duplicates functionality (storage of thread-local instances)
readily available in "Commons RNG".
2. (IMHO) Thread-local instances should not be used for "heavy" usage
(like in GA).
--Modified

* Class "ValidationUtils"
-> should not be public (or should be defined in an "internal" package).
--Changed

>
> * Class "AbstractListChromosome" (and subclasses)
> Didn't we conclude that this was a very wasteful implementation of the
> "chromosome" concept?
> -- I have some concerns regarding this. I am not much aware of any
> discussion regarding this conclusion.

>Please search the ML archive; I seem to recall a detailed discussion
>where Alex gave hints on how a binary chromosome should be
>implemented.
--There is a separate Jira for the same. I shall do the implementation.

> Chromosomes are always conceptualized as collections of allele/genes. So
We
> need a collection of the *genotypes* anyway. Here List has been used as a
> collection.
> We need an abstraction for representing the collection of Genotype. All
> crossover and mutation operators are based on this abstraction. This
> enabled reuse of crossover and mutation operators for all chromosome types
> which extend the abstraction. I am not sure how to achieve this
reusability
> without an abstraction.
> Any domain specific new chromosome implementation extending the
> AbstractListChromosome class can reuse all crossover and mutation
operators.
> For our proposed improvement of BinaryChromosome we should be able to
> extend the AbstractChromosome (*not* AbstractListChromosome) for the new
> class and provide the dedicated crossover and mutation operators for the
> corresponding Genotype. Without an *explicit* abstraction, management of
> crossover and mutation operators would be difficult.
> Please share further thoughts regarding this.

Whether we should have a data-based (subclass of "List<T>") or
behaviour-based  access to individual genes is a fundamental design
decision.
The "legacy" implementation chose the former but it might be worth
considering an alternative (lest we'll need to support several APIs if
we later come to the conclusion that we need a more compact
representation for some use cases.
--How are you thinking the design to be behaviour-based. It will be helpful
if you share some examples.
The legacy was only based on data types. But the current model introduces
the concept of phenotype along with genotype. The genotype  (List<T>)
represents the actual chromosome implementation and the phenotype (<P>)
represents the problem domain. A Decoder is also introduced to convert
genotype to phenotype. In this way this design would be capable of using a
single genotype for a wide variety of problem domains(phenotype) including
both direct and indirect encoding. Do you see any challenge to this current
implementation?
Kindly share your thoughts.

* Package "convergencecond"
-> Should probably be renamed "convergence".
--Done

* Class "GeneticException"
1. Should not be public (or should be defined in an "internal" package"[1]).
2. If various types (that map to different JDK subclasses of
RuntimeException,
e.g. "IllegalArgumentException" vs "NullPointerException") are necessary,
there could be a factory for creating the appropriate instance.
However, for "null" checks, please use the JDK utilities[2].
--Moved to an internal package. Null checks have been modified too.

>
> * Class "ConvergenceListenerRegistry"
> Shouldn't it be thread-safe?
> -- Yes. We need this to be thread-safe for parallel multi-population
> parallel genetic algorithms.
--No change for the time being.

(E) Unit tests
* src/test
1. New tests should use Junit 5.
2. "Example usages" probably belong in the "examples-ga" module.
--JUnit version is inherited from commons-math module.
--I could not understand what is meant by "Example usages" here. Which
component is being referred to here.


(F) Code readability
* Please write one argument per line.
* Write one condition check per line.
* Avoid comments with no added value (like "constructor" for a constructor).
* Avoid "ASCII art" (see e.g. "OnePointCrossover"); a link[2] is often
preferable.
* Do no duplicate documentation (see e.g. "OnePointCrossover").
--I have formatted the method declaration to have one parameter in one line.
--Most of the if conditions are having a single condition except very few
pre existing ones. I could not see any way to format the if statement in
eclipse like the suggestion. I cannot introduce any formatting rule which
cannot be handled in eclipse as that will be very hard to manage.
--ASCII art and other crossover classes are untouched for this release.

(G)
Some files contain "tab" characters (e.g. "pom.xml").
--Removed tab characters.

Thanks & Regards
--Avijit Basak

On Thu, 21 Oct 2021 at 21:32, Gilles Sadowski <gi...@gmail.com> wrote:

> Le mer. 20 oct. 2021 à 08:47, Avijit Basak <av...@gmail.com> a
> écrit :
> >
> > Hi
> >
> >          Thanks for the review comments. I have started making the
> changes.
> > However, I have some queries regarding some of comments as noted below:
>
> Some (partial) answers below.
>
> >
> > (B)
> > I'm confused by your defining "legacy" packages in new modules...
> > --This is kept for comparison purposes between the legacy and the new
> > implementation of GA.
>
> What kind of comparisons are you considering?
> It is fine to depend (with scope "test") on CM v3.6.1 to perhaps (?)
> regression testing; but please note that when your proposal is
> merged, it will imply that the "legacy" codes *must* be removed.
> [We don't want to keep duplicate functionality.]
>
> > (D) General design
> > Class "ConsoleLogger"
> > -> We should not reinvent the wheel.  We should consider whether logging
> > is necessary, and in the affirmative, depend on the de facto standard:
> > "slf4j".
> > -- I don't see any use of a logging framework in the math library.
>
> There is a long history of not wanting any kind of dependency.
> But this ship has sailed.
>
> > That is
> > the reason I introduced ConsoleLogger. If we introduce a logging
> framework
> > we won't need this class at all. I think we should include the logger in
> > the root(commons-math\) pom.xml file so that all modules should be able
> to
> > use this.
>
> Starting with the upcoming release, we can decide on a per-module
> basis.  Please make the case (in a new ML thread) for introducing
> such a dependency in the GA module.
>
> >
> > Class "Constants"
> > -> Any data should be declared where its purpose is obvious.
> > -- We can declare the constants where it belongs but this might introduce
> > duplicate constants across different classes and hence reduce
> reusability.
>
> The class does not mention where the data is used, nor why it is
> necessary that it be "public".
> By default, the leaner API (i.e. no unnecessary "public" components),
> the better (even if sometimes that would entail duplicating "private"
> data).
> [TBD on a case-by-case basis.]
>
> >
> > * Class "AbstractListChromosome" (and subclasses)
> > Didn't we conclude that this was a very wasteful implementation of the
> > "chromosome" concept?
> > -- I have some concerns regarding this. I am not much aware of any
> > discussion regarding this conclusion.
>
> Please search the ML archive; I seem to recall a detailed discussion
> where Alex gave hints on how a binary chromosome should be
> implemented.
>
> > Chromosomes are always conceptualized as collections of allele/genes. So
> We
> > need a collection of the *genotypes* anyway. Here List has been used as a
> > collection.
> > We need an abstraction for representing the collection of Genotype. All
> > crossover and mutation operators are based on this abstraction. This
> > enabled reuse of crossover and mutation operators for all chromosome
> types
> > which extend the abstraction. I am not sure how to achieve this
> reusability
> > without an abstraction.
> > Any domain specific new chromosome implementation extending the
> > AbstractListChromosome class can reuse all crossover and mutation
> operators.
> > For our proposed improvement of BinaryChromosome we should be able to
> > extend the AbstractChromosome (*not* AbstractListChromosome) for the new
> > class and provide the dedicated crossover and mutation operators for the
> > corresponding Genotype. Without an *explicit* abstraction, management of
> > crossover and mutation operators would be difficult.
> > Please share further thoughts regarding this.
>
> Whether we should have a data-based (subclass of "List<T>") or
> behaviour-based  access to individual genes is a fundamental design
> decision.
> The "legacy" implementation chose the former but it might be worth
> considering an alternative (lest we'll need to support several APIs if
> we later come to the conclusion that we need a more compact
> representation for some use cases.
>
> > * Class "GeneticException"
> > 1. Should not be public (or should be defined in an "internal"
> package"[1]).
> > 2. If various types (that map to different JDK subclasses of
> > RuntimeException,
> > e.g. "IllegalArgumentException" vs "NullPointerException") are necessary,
> > there could be a factory for creating the appropriate instance.
> > However, for "null" checks, please use the JDK utilities[2].
> > -- As of now we are managing all exception types by single
> GeneticException
> > class. So there is no factory.
>
> The issue is that a NPE should not be an IAE if we want to be consistent
> with the JDK's exceptions (as was decided some years ago).
> [The "legacy" CM exceptions to not follow this convention (for practical
> reasons in order to implement features that were never used), which is
> why they are "legacy"...]
>
> > -- Using JDK utilities for NullPointer would repeat this code in all
> > places. Is it fine?
> > Objects.requireNonNull(object,
> > Message.format(GeneticException.NULL_ARGUMENT, args));
>
> Calling "requireNonNull(T obj)" is self-documenting; there is no
> added value in a "null argument" message.
>
> >
> > * Class "ConvergenceListenerRegistry"
> > Shouldn't it be thread-safe?
> > -- Yes. We need this to be thread-safe for parallel multi-population
> > parallel genetic algorithms.
>
> In the current code, it is not (IIUC).
>
> Regards,
> Gilles
>
> >> [...]
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>

-- 
Avijit Basak

Re: [MATH][GENETICS] Review of PR #197

Posted by Gilles Sadowski <gi...@gmail.com>.
Le mer. 20 oct. 2021 à 08:47, Avijit Basak <av...@gmail.com> a écrit :
>
> Hi
>
>          Thanks for the review comments. I have started making the changes.
> However, I have some queries regarding some of comments as noted below:

Some (partial) answers below.

>
> (B)
> I'm confused by your defining "legacy" packages in new modules...
> --This is kept for comparison purposes between the legacy and the new
> implementation of GA.

What kind of comparisons are you considering?
It is fine to depend (with scope "test") on CM v3.6.1 to perhaps (?)
regression testing; but please note that when your proposal is
merged, it will imply that the "legacy" codes *must* be removed.
[We don't want to keep duplicate functionality.]

> (D) General design
> Class "ConsoleLogger"
> -> We should not reinvent the wheel.  We should consider whether logging
> is necessary, and in the affirmative, depend on the de facto standard:
> "slf4j".
> -- I don't see any use of a logging framework in the math library.

There is a long history of not wanting any kind of dependency.
But this ship has sailed.

> That is
> the reason I introduced ConsoleLogger. If we introduce a logging framework
> we won't need this class at all. I think we should include the logger in
> the root(commons-math\) pom.xml file so that all modules should be able to
> use this.

Starting with the upcoming release, we can decide on a per-module
basis.  Please make the case (in a new ML thread) for introducing
such a dependency in the GA module.

>
> Class "Constants"
> -> Any data should be declared where its purpose is obvious.
> -- We can declare the constants where it belongs but this might introduce
> duplicate constants across different classes and hence reduce reusability.

The class does not mention where the data is used, nor why it is
necessary that it be "public".
By default, the leaner API (i.e. no unnecessary "public" components),
the better (even if sometimes that would entail duplicating "private" data).
[TBD on a case-by-case basis.]

>
> * Class "AbstractListChromosome" (and subclasses)
> Didn't we conclude that this was a very wasteful implementation of the
> "chromosome" concept?
> -- I have some concerns regarding this. I am not much aware of any
> discussion regarding this conclusion.

Please search the ML archive; I seem to recall a detailed discussion
where Alex gave hints on how a binary chromosome should be
implemented.

> Chromosomes are always conceptualized as collections of allele/genes. So We
> need a collection of the *genotypes* anyway. Here List has been used as a
> collection.
> We need an abstraction for representing the collection of Genotype. All
> crossover and mutation operators are based on this abstraction. This
> enabled reuse of crossover and mutation operators for all chromosome types
> which extend the abstraction. I am not sure how to achieve this reusability
> without an abstraction.
> Any domain specific new chromosome implementation extending the
> AbstractListChromosome class can reuse all crossover and mutation operators.
> For our proposed improvement of BinaryChromosome we should be able to
> extend the AbstractChromosome (*not* AbstractListChromosome) for the new
> class and provide the dedicated crossover and mutation operators for the
> corresponding Genotype. Without an *explicit* abstraction, management of
> crossover and mutation operators would be difficult.
> Please share further thoughts regarding this.

Whether we should have a data-based (subclass of "List<T>") or
behaviour-based  access to individual genes is a fundamental design
decision.
The "legacy" implementation chose the former but it might be worth
considering an alternative (lest we'll need to support several APIs if
we later come to the conclusion that we need a more compact
representation for some use cases.

> * Class "GeneticException"
> 1. Should not be public (or should be defined in an "internal" package"[1]).
> 2. If various types (that map to different JDK subclasses of
> RuntimeException,
> e.g. "IllegalArgumentException" vs "NullPointerException") are necessary,
> there could be a factory for creating the appropriate instance.
> However, for "null" checks, please use the JDK utilities[2].
> -- As of now we are managing all exception types by single GeneticException
> class. So there is no factory.

The issue is that a NPE should not be an IAE if we want to be consistent
with the JDK's exceptions (as was decided some years ago).
[The "legacy" CM exceptions to not follow this convention (for practical
reasons in order to implement features that were never used), which is
why they are "legacy"...]

> -- Using JDK utilities for NullPointer would repeat this code in all
> places. Is it fine?
> Objects.requireNonNull(object,
> Message.format(GeneticException.NULL_ARGUMENT, args));

Calling "requireNonNull(T obj)" is self-documenting; there is no
added value in a "null argument" message.

>
> * Class "ConvergenceListenerRegistry"
> Shouldn't it be thread-safe?
> -- Yes. We need this to be thread-safe for parallel multi-population
> parallel genetic algorithms.

In the current code, it is not (IIUC).

Regards,
Gilles

>> [...]

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [MATH][GENETICS] Review of PR #197

Posted by Avijit Basak <av...@gmail.com>.
Hi

         Thanks for the review comments. I have started making the changes.
However, I have some queries regarding some of comments as noted below:

(B)
I'm confused by your defining "legacy" packages in new modules...
--This is kept for comparison purposes between the legacy and the new
implementation of GA.

(D) General design
Class "ConsoleLogger"
-> We should not reinvent the wheel.  We should consider whether logging
is necessary, and in the affirmative, depend on the de facto standard:
"slf4j".
-- I don't see any use of a logging framework in the math library. That is
the reason I introduced ConsoleLogger. If we introduce a logging framework
we won't need this class at all. I think we should include the logger in
the root(commons-math\) pom.xml file so that all modules should be able to
use this.

Class "Constants"
-> Any data should be declared where its purpose is obvious.
-- We can declare the constants where it belongs but this might introduce
duplicate constants across different classes and hence reduce reusability.

* Class "AbstractListChromosome" (and subclasses)
Didn't we conclude that this was a very wasteful implementation of the
"chromosome" concept?
-- I have some concerns regarding this. I am not much aware of any
discussion regarding this conclusion.
Chromosomes are always conceptualized as collections of allele/genes. So We
need a collection of the *genotypes* anyway. Here List has been used as a
collection.
We need an abstraction for representing the collection of Genotype. All
crossover and mutation operators are based on this abstraction. This
enabled reuse of crossover and mutation operators for all chromosome types
which extend the abstraction. I am not sure how to achieve this reusability
without an abstraction.
Any domain specific new chromosome implementation extending the
AbstractListChromosome class can reuse all crossover and mutation operators.
For our proposed improvement of BinaryChromosome we should be able to
extend the AbstractChromosome (*not* AbstractListChromosome) for the new
class and provide the dedicated crossover and mutation operators for the
corresponding Genotype. Without an *explicit* abstraction, management of
crossover and mutation operators would be difficult.
Please share further thoughts regarding this.

* Class "GeneticException"
1. Should not be public (or should be defined in an "internal" package"[1]).
2. If various types (that map to different JDK subclasses of
RuntimeException,
e.g. "IllegalArgumentException" vs "NullPointerException") are necessary,
there could be a factory for creating the appropriate instance.
However, for "null" checks, please use the JDK utilities[2].
-- As of now we are managing all exception types by single GeneticException
class. So there is no factory.
-- Using JDK utilities for NullPointer would repeat this code in all
places. Is it fine?
Objects.requireNonNull(object,
Message.format(GeneticException.NULL_ARGUMENT, args));

* Class "ConvergenceListenerRegistry"
Shouldn't it be thread-safe?
-- Yes. We need this to be thread-safe for parallel multi-population
parallel genetic algorithms.


Thanks & Regards
--Avijit Basak

On Mon, 18 Oct 2021 at 23:13, Gilles Sadowski <gi...@gmail.com> wrote:

> Hello.
>
> Sorry for the delay in reviewing.
>
> Le lun. 18 oct. 2021 à 09:35, Avijit Basak <av...@gmail.com> a
> écrit :
> >
> > Hi All
> >
> >         I have created PR#197 as mentioned earlier. Kindly let me know if
> > there is any concern or comments.
> >         I have created another *PR#199* consisting of the changes with
> > adaptive probability generations.
>
> Please find below my first remarks.
>
> (A)
> Please "rebase" on "master".
> Please "squash" intermediate commits: For a new feature, a single commit
> should exist (that corresponds to the JIRA report describing it).
>
> (B)
> I'm confused by your defining "legacy" packages in new modules...
>
> (C)
> File
> "commons-math-examples/examples-ga/src/main/resources/spotbugs/spotbugs-exclude-filter.xml"
> does not belong there.
>
> (D) General design
> Class "ConsoleLogger"
> -> We should not reinvent the wheel.  We should consider whether logging
> is necessary, and in the affirmative, depend on the de facto standard:
> "slf4j".
>
> Class "Constants"
> -> Any data should be declared where its purpose is obvious.
>
> * Class "RandomGenerator"
> 1. Duplicates functionality (storage of thread-local instances)
> readily available in "Commons RNG".
> 2. (IMHO) Thread-local instances should not be used for "heavy" usage
> (like in GA).
>
> * Class "ValidationUtils"
> -> should not be public (or should be defined in an "internal" package).
>
> * Class "AbstractListChromosome" (and subclasses)
> Didn't we conclude that this was a very wasteful implementation of the
> "chromosome" concept?
>
> * Package "convergencecond"
> -> Should probably be renamed "convergence".
>
> * Class "GeneticException"
> 1. Should not be public (or should be defined in an "internal"
> package"[1]).
> 2. If various types (that map to different JDK subclasses of
> RuntimeException,
> e.g. "IllegalArgumentException" vs "NullPointerException") are necessary,
> there could be a factory for creating the appropriate instance.
> However, for "null" checks, please use the JDK utilities[2].
>
> * Class "ConvergenceListenerRegistry"
> Shouldn't it be thread-safe?
>
> (E) Unit tests
> * src/test
> 1. New tests should use Junit 5.
> 2. "Example usages" probably belong in the "examples-ga" module.
>
> (F) Code readability
> * Please write one argument per line.
> * Write one condition check per line.
> * Avoid comments with no added value (like "constructor" for a
> constructor).
> * Avoid "ASCII art" (see e.g. "OnePointCrossover"); a link[2] is often
> preferable.
> * Do no duplicate documentation (see e.g. "OnePointCrossover").
>
> (G)
> Some files contain "tab" characters (e.g. "pom.xml").
>
> Best regards,
> Gilles
>
> [1]
> https://gitbox.apache.org/repos/asf?p=commons-math.git;a=tree;f=commons-math-neuralnet/src/main/java/org/apache/commons/math4/neuralnet/internal;h=c0bc1409cdee0932c973fae05f03e8288034c005;hb=HEAD
> [2]
> https://docs.oracle.com/javase/8/docs/api/java/util/Objects.html#requireNonNull-T-
> [3] https://en.wikipedia.org/wiki/Crossover_(genetic_algorithm)
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>

-- 
Avijit Basak

Re: [MATH][GENETICS] Review of PR #197

Posted by Gilles Sadowski <gi...@gmail.com>.
Hello.

Sorry for the delay in reviewing.

Le lun. 18 oct. 2021 à 09:35, Avijit Basak <av...@gmail.com> a écrit :
>
> Hi All
>
>         I have created PR#197 as mentioned earlier. Kindly let me know if
> there is any concern or comments.
>         I have created another *PR#199* consisting of the changes with
> adaptive probability generations.

Please find below my first remarks.

(A)
Please "rebase" on "master".
Please "squash" intermediate commits: For a new feature, a single commit
should exist (that corresponds to the JIRA report describing it).

(B)
I'm confused by your defining "legacy" packages in new modules...

(C)
File "commons-math-examples/examples-ga/src/main/resources/spotbugs/spotbugs-exclude-filter.xml"
does not belong there.

(D) General design
Class "ConsoleLogger"
-> We should not reinvent the wheel.  We should consider whether logging
is necessary, and in the affirmative, depend on the de facto standard: "slf4j".

Class "Constants"
-> Any data should be declared where its purpose is obvious.

* Class "RandomGenerator"
1. Duplicates functionality (storage of thread-local instances)
readily available in "Commons RNG".
2. (IMHO) Thread-local instances should not be used for "heavy" usage
(like in GA).

* Class "ValidationUtils"
-> should not be public (or should be defined in an "internal" package).

* Class "AbstractListChromosome" (and subclasses)
Didn't we conclude that this was a very wasteful implementation of the
"chromosome" concept?

* Package "convergencecond"
-> Should probably be renamed "convergence".

* Class "GeneticException"
1. Should not be public (or should be defined in an "internal" package"[1]).
2. If various types (that map to different JDK subclasses of RuntimeException,
e.g. "IllegalArgumentException" vs "NullPointerException") are necessary,
there could be a factory for creating the appropriate instance.
However, for "null" checks, please use the JDK utilities[2].

* Class "ConvergenceListenerRegistry"
Shouldn't it be thread-safe?

(E) Unit tests
* src/test
1. New tests should use Junit 5.
2. "Example usages" probably belong in the "examples-ga" module.

(F) Code readability
* Please write one argument per line.
* Write one condition check per line.
* Avoid comments with no added value (like "constructor" for a constructor).
* Avoid "ASCII art" (see e.g. "OnePointCrossover"); a link[2] is often
preferable.
* Do no duplicate documentation (see e.g. "OnePointCrossover").

(G)
Some files contain "tab" characters (e.g. "pom.xml").

Best regards,
Gilles

[1] https://gitbox.apache.org/repos/asf?p=commons-math.git;a=tree;f=commons-math-neuralnet/src/main/java/org/apache/commons/math4/neuralnet/internal;h=c0bc1409cdee0932c973fae05f03e8288034c005;hb=HEAD
[2] https://docs.oracle.com/javase/8/docs/api/java/util/Objects.html#requireNonNull-T-
[3] https://en.wikipedia.org/wiki/Crossover_(genetic_algorithm)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org