Posted to dev@commons.apache.org by Gilles Sadowski <gi...@gmail.com> on 2021/02/09 22:43:21 UTC

[Vote] Create a "machine learning" component

Hi.

Because of an offered contribution, a discussion happened on
JIRA[1] and in another thread[2] about improving the genetic
algorithm (GA) implementation currently in the
   org.apache.commons.math4.genetics
package of the "Commons Math" component.
It would make sense to group "machine learning" algorithms[3]
(to which GA belongs) within a single component, where codes from
  org.apache.commons.math4.ml.neuralnet
  org.apache.commons.math4.ml.clustering
would be moved too.
This would be the fifth (and last) component resulting from my proposal
(see e.g. [4] among other threads) for the reorganization of the "Commons
Math"[5] code base into more maintainable components[6][7][8][9], each
focused on actually related functionalities (thus *not* the wide expertise
necessary for the maintenance of a full-fledged math library).

I suggest "ML" for the name of the component.

Regards,
Gilles

[1] https://issues.apache.org/jira/projects/MATH/issues/MATH-1563
[2] https://markmail.org/message/dnujdcxuaq5bwuwe
[3] https://en.wikipedia.org/wiki/Machine_learning
[4] https://markmail.org/message/75vuyhzblfadc5op
[5] http://commons.apache.org/proper/commons-math/
[6] http://commons.apache.org/proper/commons-rng/
[7] http://commons.apache.org/proper/commons-numbers/
[8] http://commons.apache.org/proper/commons-geometry/
[9] http://commons.apache.org/proper/commons-statistics/

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

Re: [Vote] Create a "machine learning" component

Posted by Gilles Sadowski <gi...@gmail.com>.

On Tue, 20 Apr 2021 at 16:09, Avijit Basak <av...@gmail.com> wrote:
>
> Hi
>
>           > Did you ask "Spark" people about their opinion about it?
>             -- Not yet. I am not sure what would be the right option for
> this communication. It will be good if you can approach them.

You are the one who proposes a functionality that might be of interest
to the "Spark" project, perhaps on some condition on their part which
*you* are going to have to accept (or not).

In other words: It would be useless for *me* to go and tell them that there
exists some code in Commons Math which they could take and adapt for their
project (they can always do that).
What might be of value to them (as to the Commons project, too), is a
contributor willing to do the necessary work to create or improve a
community-supported feature.

>           > where it can be used in real-life (performance-wise)
> applications, then you should demonstrate it
>             -- Do we have any kind of performance benchmark or use case
> regarding this?

Please assume that *you* are the person with the most GA expertise
in this forum.
There certainly are unit tests for the GA functionality, but I don't think
there are benchmarks; certainly, one task would be to set up a module
for (JMH-based) experimentation.

> Once that is decided,

One mantra of ASF communities is that "those who do the work get
to decide".
[The PMC can decide (by vote) whether to accept a new component;
but it's up to you to show that it's worth it (with the risk that the PMC
won't accurately judge the contribution, unfortunately)...]

> then I can proceed with this.

There is already a long list of things that can be done.

You don't *have* to contact "Spark" if you don't feel that it's the
right project for your work.  You could just hope for the best, and
start somewhere else (modularization of Commons Math, a fork
on GitHub of CM ML-related codes, and so on).

The one thing which I won't be helping with is merging ad-hoc
GA-related changes into the current CM codebase.
This doesn't preclude that other committers might want to do that
for you; however judging by the last 5 years, I wouldn't count too
much on it. ;-)

Regards,
Gilles

>
>
> Thanks & Regards
> --Avijit Basak
>
> On Mon, 19 Apr 2021 at 18:51, Gilles Sadowski <gi...@gmail.com> wrote:
>
> > Hello.
> >
> > On Mon, 19 Apr 2021 at 08:35, Avijit Basak <av...@gmail.com> wrote:
> > >
> > > Hi
> > >
> > > >Isn't a GA inherently parallel?
> > > >If so, why not take advantage of the concurrency tools provided by the
> > JDK?
> > >   -- Are we planning to implement multi-threading for GA operations even
> > as
> > > part of a single population
> >
> > This seems an obvious improvement to our current implementation
> > (in case a chromosome's evaluation is not population-dependent).
> >
> > > or only for multi-population parallel GA.
> > >   -- We can implement different types of co-evolution as part of parallel
> > > GA. Need to decide on the corresponding strategies we are going to
> > > incorporate.
> >
> > The discussion is still about the "administrative" question of whether
> > any of this should be implemented in the "Commons" project...
> >
> > Did you ask "Spark" people about their opinion about it?
> >
> > As I said, if you are confident that you can bring our implementation to
> > a state where it can be used in real-life (performance-wise) applications,
> > then you should demonstrate it (in order to convince other people from
> > the Commons PMC that it is worth engaging in long-term maintenance).
> > AFAICT, a way to do it would be to create a GitHub project (aimed at
> > becoming a new "machine learning" component, or a maven/JPMS
> > module within Commons Math).
> >
> > Best regards,
> > Gilles

Re: [Vote] Create a "machine learning" component

Posted by Avijit Basak <av...@gmail.com>.

Hi

          > Did you ask "Spark" people about their opinion about it?
            -- Not yet. I am not sure what would be the right option for
this communication. It will be good if you can approach them.
          > where it can be used in real-life (performance-wise)
applications, then you should demonstrate it
            -- Do we have any kind of performance benchmark or use case
regarding this? Once that is decided, then I can proceed with this.


Thanks & Regards
--Avijit Basak

On Mon, 19 Apr 2021 at 18:51, Gilles Sadowski <gi...@gmail.com> wrote:

> Hello.
>
> On Mon, 19 Apr 2021 at 08:35, Avijit Basak <av...@gmail.com> wrote:
> >
> > Hi
> >
> > >Isn't a GA inherently parallel?
> > >If so, why not take advantage of the concurrency tools provided by the
> JDK?
> >   -- Are we planning to implement multi-threading for GA operations even
> as
> > part of a single population
>
> This seems an obvious improvement to our current implementation
> (in case a chromosome's evaluation is not population-dependent).
>
> > or only for multi-population parallel GA.
> >   -- We can implement different types of co-evolution as part of parallel
> > GA. Need to decide on the corresponding strategies we are going to
> > incorporate.
>
> The discussion is still about the "administrative" question of whether
> any of this should be implemented in the "Commons" project...
>
> Did you ask "Spark" people about their opinion about it?
>
> As I said, if you are confident that you can bring our implementation to
> a state where it can be used in real-life (performance-wise) applications,
> then you should demonstrate it (in order to convince other people from
> the Commons PMC that it is worth engaging in long-term maintenance).
> AFAICT, a way to do it would be to create a GitHub project (aimed at
> becoming a new "machine learning" component, or a maven/JPMS
> module within Commons Math).
>
> Best regards,
> Gilles
>

-- 
Avijit Basak

Re: [Vote] Create a "machine learning" component

Posted by Gilles Sadowski <gi...@gmail.com>.

Hello.

On Mon, 19 Apr 2021 at 08:35, Avijit Basak <av...@gmail.com> wrote:
>
> Hi
>
> >Isn't a GA inherently parallel?
> >If so, why not take advantage of the concurrency tools provided by the JDK?
>   -- Are we planning to implement multi-threading for GA operations even as
> part of a single population

This seems an obvious improvement to our current implementation
(in case a chromosome's evaluation is not population-dependent).
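
As a minimal sketch of that idea, assuming the fitness of a chromosome
depends only on the chromosome itself, the evaluation step could be spread
over the JDK's common fork/join pool with a parallel stream.  The names
below (ParallelFitnessSketch, evaluate, oneMax) are hypothetical and are
not part of the Commons Math API:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToDoubleFunction;

// Hypothetical sketch; not the Commons Math GA API.
final class ParallelFitnessSketch {

    /**
     * Evaluates every chromosome concurrently.
     * The list is an ordered source, so result[i] belongs to population.get(i).
     * The fitness function must be thread-safe and population-independent.
     */
    static <T> double[] evaluate(List<T> population,
                                 ToDoubleFunction<? super T> fitness) {
        return population.parallelStream()
                         .mapToDouble(fitness)
                         .toArray();
    }

    /** Toy "OneMax" fitness: the number of bits set in the chromosome. */
    static double oneMax(boolean[] chromosome) {
        int count = 0;
        for (boolean bit : chromosome) {
            if (bit) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<boolean[]> population = Arrays.asList(
            new boolean[] {true, false, true},
            new boolean[] {false, false, false},
            new boolean[] {true, true, true});
        double[] scores = evaluate(population, ParallelFitnessSketch::oneMax);
        System.out.println(Arrays.toString(scores));
    }
}
```

Selection, crossover and mutation would still run as before; only the
(usually dominant) fitness evaluation is parallelized here.  An explicit
ExecutorService would be the alternative if control over the thread pool
is required.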

> or only for multi-population parallel GA.
>   -- We can implement different types of co-evolution as part of parallel
> GA. Need to decide on the corresponding strategies we are going to
> incorporate.

The discussion is still about the "administrative" question of whether
any of this should be implemented in the "Commons" project...

Did you ask "Spark" people about their opinion about it?

As I said, if you are confident that you can bring our implementation to
a state where it can be used in real-life (performance-wise) applications,
then you should demonstrate it (in order to convince other people from
the Commons PMC that it is worth engaging in long-term maintenance).
AFAICT, a way to do it would be to create a GitHub project (aimed at
becoming a new "machine learning" component, or a maven/JPMS
module within Commons Math).

Best regards,
Gilles

Re: [Vote] Create a "machine learning" component

Posted by Avijit Basak <av...@gmail.com>.

Hi

>Isn't a GA inherently parallel?
>If so, why not take advantage of the concurrency tools provided by the JDK?
  -- Are we planning to implement multi-threading for GA operations even as
part of a single population, or only for multi-population parallel GA?
  -- We can implement different types of co-evolution as part of parallel
GA. We need to decide on the corresponding strategies we are going to
incorporate.

Thanks & Regards
--Avijit Basak

On Wed, 14 Apr 2021 at 05:53, Gilles Sadowski <gi...@gmail.com> wrote:

> On Tue, 13 Apr 2021 at 18:21, Avijit Basak <av...@gmail.com> wrote:
> >
> > Hi
> >
> >           Please find my comments below.
> >
> > >> I don't follow the distinction "prod" vs "non-prod".
> >      -- Actually in Prod we really need a very high performing system. So
> > use of implicit parallelism in spark would help us to achieve it. But for
> > other types of work like POC or R&D we may not need such performance.
>
> Isn't a GA inherently parallel?
> If so, why not take advantage of the concurrency tools provided by the JDK?
>
> > >> the question was actually whether you are willing to modularize CM
> >      -- I am not much aware of other ml components in commons. I would
> look
> > into it.
>
> I've mentioned them in earlier messages:
>  * Self-organizing feature map (artificial neural net)
>  * Clustering
>
> The former is multi-threaded; the latter should be refactored to
> take advantage of multi-threading.
>
> > >>You did not expand about the usability/performance (e.g. the issue of
> > multi-threading)
> >      -- Are we planning to incorporate parallel GA.
>
> Aren't you?
>
> > Then multi-threading
> > would be a more appropriate option.
>
> IMHO, a necessary one.
>
> > >> So, as a way forward, I would suggest that you create a project on
> > GitHub (copying all the settings from a *Commons modular* component,
> such as
> > "Commons Numbers")
> >      -- Could you kindly share the GitHub repository URL for any Commons
> > modular component.
>
> https://github.com/apache/commons-rng
> https://github.com/apache/commons-numbers
> https://github.com/apache/commons-geometry
> https://github.com/apache/commons-statistics
>
> >
> > Thanks & Regards
> > --Avijit Basak
> >
> >
> > On Tue, 13 Apr 2021 at 18:29, Gilles Sadowski <gi...@gmail.com>
> wrote:
> >
> > > Hello.
> > >
> > > On Mon, 12 Apr 2021 at 17:21, Avijit Basak <av...@gmail.com> wrote:
> > > >
> > > > Hi
> > > >
> > > >          Sorry for the delayed response. Thanks for your patience.
> Please
> > > > find my comments below:
> > > >
> > > >  (1) Why not Spark?  [At least post over there (?).]
> > > >       --We can move to Spark. But it will be very much useful if the
> > > things
> > > > can also run without Spark. The use of Spark would make more sense
> in a
> > > > production environment. But the portability of the library will be
> more
> > > > useful for the non-prod environment.
> > >
> > > I don't follow the distinction "prod" vs "non-prod".
> > >
> > > > Definitely, we can reach the Spark
> > > > team and query.
> > >
> > > That would be a good idea...
> > >
> > > >  (2) Further develop a monolithic CM?  [Who will do it?]
> > > >        --I can help with the upgrade of the existing library related
> to
> > > GA
> > > > functionality.
> > >
> > > Sure, but nobody is currently working on (2).
> > >
> > > >  (3) Modularize CM? [Who will do it?]
> > > >        --I can help with the upgrade of the existing library related
> to
> > > GA
> > > > functionality.
> > >
> > > I don't doubt it; but the question was actually whether you are willing
> > > to modularize CM (that is: in addition to, and before, contributing to
> > > the GA functionality).
> > >
> > > >  (4) New component (with another name) with the proposed contents?
> > > >        --This is the best option if permitted.
> > >
> > > Currently, only the two of us are in favour of this alternative.
> > >
> > > Nobody, by their action, is really in favour of any of the other
> > > alternatives.
> > > So, as a way forward, I would suggest that you create a project on
> GitHub
> > > (copying all the settings from a Commons modular component, such as
> > > "Commons Numbers"), to be eventually integrated here, once its
> potential
> > > has been demonstrated.
> > >
> > > >       The code which I have written can be reused with minor
> > > modifications.
> > > > So it won't take too much effort for this activity.
> > >
> > > You did not expand about the usability/performance (e.g. the issue of
> > > multi-threading)...
> > >
> > > Regards,
> > > Gilles
> > >
> > > >> [...]
> > >
>

-- 
Avijit Basak

Re: [Vote] Create a "machine learning" component

Posted by Gilles Sadowski <gi...@gmail.com>.

On Tue, 13 Apr 2021 at 18:21, Avijit Basak <av...@gmail.com> wrote:
>
> Hi
>
>           Please find my comments below.
>
> >> I don't follow the distinction "prod" vs "non-prod".
>      -- Actually in Prod we really need a very high performing system. So
> use of implicit parallelism in spark would help us to achieve it. But for
> other types of work like POC or R&D we may not need such performance.

Isn't a GA inherently parallel?
If so, why not take advantage of the concurrency tools provided by the JDK?

> >> the question was actually whether you are willing to modularize CM
>      -- I am not much aware of other ml components in commons. I would look
> into it.

I've mentioned them in earlier messages:
 * Self-organizing feature map (artificial neural net)
 * Clustering

The former is multi-threaded; the latter should be refactored to
take advantage of multi-threading.
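
To illustrate what such a refactoring could look like (a sketch only, not
the Commons Math clustering API; ParallelAssignmentSketch, assign and
nearest are made-up names), the assignment step of a k-means-like
algorithm is independent for each point and can therefore run in parallel:

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical sketch; not the Commons Math clustering API.
final class ParallelAssignmentSketch {

    /** Returns, for each point, the index of its nearest centroid. */
    static int[] assign(double[][] points, double[][] centroids) {
        return IntStream.range(0, points.length)
                        .parallel()
                        .map(i -> nearest(points[i], centroids))
                        .toArray();
    }

    /** Index of the centroid with the smallest squared Euclidean distance. */
    static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int k = 0; k < centroids.length; k++) {
            double dist = 0;
            for (int j = 0; j < point.length; j++) {
                double diff = point[j] - centroids[k][j];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = k;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {9, 9}, {1, 0}};
        double[][] centroids = {{0, 0}, {10, 10}};
        System.out.println(Arrays.toString(assign(points, centroids)));
    }
}
```

The centroid update that follows each assignment pass is a reduction and
would need separate treatment; it is left out of this sketch.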

> >>You did not expand about the usability/performance (e.g. the issue of
> multi-threading)
>      -- Are we planning to incorporate parallel GA.

Aren't you?

> Then multi-threading
> would be a more appropriate option.

IMHO, a necessary one.

> >> So, as a way forward, I would suggest that you create a project on
> GitHub (copying all the settings from a *Commons modular* component, such as
> "Commons Numbers")
>      -- Could you kindly share the GitHub repository URL for any Commons
> modular component.

https://github.com/apache/commons-rng
https://github.com/apache/commons-numbers
https://github.com/apache/commons-geometry
https://github.com/apache/commons-statistics

>
> Thanks & Regards
> --Avijit Basak
>
>
> On Tue, 13 Apr 2021 at 18:29, Gilles Sadowski <gi...@gmail.com> wrote:
>
> > Hello.
> >
> > On Mon, 12 Apr 2021 at 17:21, Avijit Basak <av...@gmail.com> wrote:
> > >
> > > Hi
> > >
> > >          Sorry for the delayed response. Thanks for your patience. Please
> > > find my comments below:
> > >
> > >  (1) Why not Spark?  [At least post over there (?).]
> > >       --We can move to Spark. But it will be very much useful if the
> > things
> > > can also run without Spark. The use of Spark would make more sense in a
> > > production environment. But the portability of the library will be more
> > > useful for the non-prod environment.
> >
> > I don't follow the distinction "prod" vs "non-prod".
> >
> > > Definitely, we can reach the Spark
> > > team and query.
> >
> > That would be a good idea...
> >
> > >  (2) Further develop a monolithic CM?  [Who will do it?]
> > >        --I can help with the upgrade of the existing library related to
> > GA
> > > functionality.
> >
> > Sure, but nobody is currently working on (2).
> >
> > >  (3) Modularize CM? [Who will do it?]
> > >        --I can help with the upgrade of the existing library related to
> > GA
> > > functionality.
> >
> > I don't doubt it; but the question was actually whether you are willing
> > to modularize CM (that is: in addition to, and before, contributing to
> > the GA functionality).
> >
> > >  (4) New component (with another name) with the proposed contents?
> > >        --This is the best option if permitted.
> >
> > Currently, only the two of us are in favour of this alternative.
> >
> > Nobody, by their action, is really in favour of any of the other
> > alternatives.
> > So, as a way forward, I would suggest that you create a project on GitHub
> > (copying all the settings from a Commons modular component, such as
> > "Commons Numbers"), to be eventually integrated here, once its potential
> > has been demonstrated.
> >
> > >       The code which I have written can be reused with minor
> > modifications.
> > > So it won't take too much effort for this activity.
> >
> > You did not expand about the usability/performance (e.g. the issue of
> > multi-threading)...
> >
> > Regards,
> > Gilles
> >
> > >> [...]
> >

Re: [Vote] Create a "machine learning" component

Posted by Avijit Basak <av...@gmail.com>.

Hi

          Please find my comments below.

>> I don't follow the distinction "prod" vs "non-prod".
     -- Actually in Prod we really need a very high performing system. So
use of implicit parallelism in spark would help us to achieve it. But for
other types of work like POC or R&D we may not need such performance.
>> the question was actually whether you are willing to modularize CM
     -- I am not very familiar with the other ML components in Commons. I
will look into it.
>>You did not expand about the usability/performance (e.g. the issue of
multi-threading)
     -- Are we planning to incorporate parallel GA? If so, multi-threading
would be a more appropriate option.
>> So, as a way forward, I would suggest that you create a project on
GitHub (copying all the settings from a *Commons modular* component, such as
"Commons Numbers")
     -- Could you kindly share the GitHub repository URL for any Commons
modular component?

Thanks & Regards
--Avijit Basak


On Tue, 13 Apr 2021 at 18:29, Gilles Sadowski <gi...@gmail.com> wrote:

> Hello.
>
> On Mon, 12 Apr 2021 at 17:21, Avijit Basak <av...@gmail.com> wrote:
> >
> > Hi
> >
> >          Sorry for the delayed response. Thanks for your patience. Please
> > find my comments below:
> >
> >  (1) Why not Spark?  [At least post over there (?).]
> >       --We can move to Spark. But it will be very much useful if the
> things
> > can also run without Spark. The use of Spark would make more sense in a
> > production environment. But the portability of the library will be more
> > useful for the non-prod environment.
>
> I don't follow the distinction "prod" vs "non-prod".
>
> > Definitely, we can reach the Spark
> > team and query.
>
> That would be a good idea...
>
> >  (2) Further develop a monolithic CM?  [Who will do it?]
> >        --I can help with the upgrade of the existing library related to
> GA
> > functionality.
>
> Sure, but nobody is currently working on (2).
>
> >  (3) Modularize CM? [Who will do it?]
> >        --I can help with the upgrade of the existing library related to
> GA
> > functionality.
>
> I don't doubt it; but the question was actually whether you are willing
> to modularize CM (that is: in addition to, and before, contributing to
> the GA functionality).
>
> >  (4) New component (with another name) with the proposed contents?
> >        --This is the best option if permitted.
>
> Currently, only the two of us are in favour of this alternative.
>
> Nobody, by their action, is really in favour of any of the other
> alternatives.
> So, as a way forward, I would suggest that you create a project on GitHub
> (copying all the settings from a Commons modular component, such as
> "Commons Numbers"), to be eventually integrated here, once its potential
> has been demonstrated.
>
> >       The code which I have written can be reused with minor
> modifications.
> > So it won't take too much effort for this activity.
>
> You did not expand about the usability/performance (e.g. the issue of
> multi-threading)...
>
> Regards,
> Gilles
>
> >> [...]
>

-- 
Avijit Basak

Re: [Vote] Create a "machine learning" component

Posted by Gilles Sadowski <gi...@gmail.com>.

Hello.

On Mon, 12 Apr 2021 at 17:21, Avijit Basak <av...@gmail.com> wrote:
>
> Hi
>
>          Sorry for the delayed response. Thanks for your patience. Please
> find my comments below:
>
>  (1) Why not Spark?  [At least post over there (?).]
>       --We can move to Spark. But it will be very much useful if the things
> can also run without Spark. The use of Spark would make more sense in a
> production environment. But the portability of the library will be more
> useful for the non-prod environment.

I don't follow the distinction "prod" vs "non-prod".

> Definitely, we can reach the Spark
> team and query.

That would be a good idea...

>  (2) Further develop a monolithic CM?  [Who will do it?]
>        --I can help with the upgrade of the existing library related to GA
> functionality.

Sure, but nobody is currently working on (2).

>  (3) Modularize CM? [Who will do it?]
>        --I can help with the upgrade of the existing library related to GA
> functionality.

I don't doubt it; but the question was actually whether you are willing
to modularize CM (that is: in addition to, and before, contributing to
the GA functionality).

>  (4) New component (with another name) with the proposed contents?
>        --This is the best option if permitted.

Currently, only the two of us are in favour of this alternative.

Nobody, by their action, is really in favour of any of the other alternatives.
So, as a way forward, I would suggest that you create a project on GitHub
(copying all the settings from a Commons modular component, such as
"Commons Numbers"), to be eventually integrated here, once its potential
has been demonstrated.

>       The code which I have written can be reused with minor modifications.
> So it won't take too much effort for this activity.

You did not expand about the usability/performance (e.g. the issue of
multi-threading)...

Regards,
Gilles

>> [...]

Re: [Vote] Create a "machine learning" component

Posted by Avijit Basak <av...@gmail.com>.

Hi

         Sorry for the delayed response. Thanks for your patience. Please
find my comments below:

 (1) Why not Spark?  [At least post over there (?).]
      --We can move to Spark. But it would be very useful if things could
also run without Spark. The use of Spark would make more sense in a
production environment. But the portability of the library will be more
useful for the non-prod environment. Definitely, we can reach out to the
Spark team and ask.
 (2) Further develop a monolithic CM?  [Who will do it?]
       --I can help with the upgrade of the existing library related to GA
functionality.
 (3) Modularize CM? [Who will do it?]
       --I can help with the upgrade of the existing library related to GA
functionality.
 (4) New component (with another name) with the proposed contents?
       --This is the best option if permitted.

      The code which I have written can be reused with minor modifications.
So it won't take too much effort for this activity.
      Kindly share further thoughts.

Thanks & Regards
--Avijit Basak


On Sun, 14 Feb 2021 at 19:56, Gilles Sadowski <gi...@gmail.com> wrote:

> On Sun, 14 Feb 2021 at 09:06, Avijit Basak <av...@gmail.com> wrote:
> >
> > Hi
> >
> >        I would like to mention a few points here. Genetic Algorithm has a
> > vast range of applications in optimization and search problems. Machine
> > learning is only one of those.
> >        If we couple the new GA library with any specific domain like ml
> it
> > would be meaningless for people working in other domains.
>
> Isn't "meaningless" a slight overstatement?
> We might have an issue of terminology: There is no necessary "coupling"
> but maybe "acquaintance" (for lack of a better word), as a set of tools
> that might come in handy for solving certain types of problems.  [For
> example, the Traveling Salesman Problem can be tackled by GA and SOFM, both
> of which are candidates for inclusion in the new component, although they
> don't share any code.]
>
> If the name "machine learning" is not the most appropriate one to convey
> the intended scope, do you have another idea?
> ["AI" would perhaps be more correct if we consider a strict hierarchy, but
> would obviously be far too presumptuous.]
>
> > They have to
> > incorporate the entire ml library
>
> No, they won't.  Given the stated goal of "modularity": the "ga" module
> will be available as a dedicated JAR (possibly with a dependency to
> codes that can be reused in other modules provided by the component).
>
> > which may be completely unrelated to
> > their project. Coupling it with any technology like Spark might also
> > limit its usability.
>
> You may be right; I have no idea about the "restrictions" imposed by
> Spark.  [It seems that in this case, one would have to indeed depend
> on Spark's "mllib" (?).  This would be one reason, as I already stated,
> for having something in "Commons".]
>
> Could you elaborate on a concrete use-case where one would be
> starting to develop an application with the specific requirement that
> Spark could not be used?
> In particular, IIRC Spark has multi-threading built in.  Don't you see
> it as a huge problem that CM would not provide such a feature?
>
> >        If a separate component is not approved for this change then we
> can
> > incorporate the changes as part of *commons.math* library.
>
> Of course, if somebody wants to do that, he's welcome.
> [That will not be me, for all the reasons which I've explained.  In the
> last 5 years I've been pretty much alone in handling bug reports about CM;
> I'm unwilling to assume implicit support for even more codes.]
>
> Also, with this solution, you'd now be willing to accept what you weren't
> above: Anyone wanting to use the GA functionality would indeed have to
> "incorporate" the whole of "Commons Math" (CM).
> Of course, the latter could be modularized, but this will only mitigate the
> issue, as any release of the GA functionality will potentially be then held
> off by potential issues in other parts of CM (which nobody has been able
> to consistently support for more than 5 years now).
>
> >        The same library can be reused in ml or neural network libraries
> as
> > a dependency.
>
> It is the other way around:  The development version of CM currently
> depends on "lower-level" components.
> Furthermore, right now its (embryonic) "machine learning" functionality
> hasn't any substantial dependency on codes outside the "o.a.c.math4.ml"
> package.
>
> >        Kindly share further views on this.
>
> In summary, to be clarified:
>  (1) Why not Spark?  [At least post over there (?).]
>  (2) Further develop a monolithic CM?  [Who will do it?]
>  (3) Modularize CM? [Who will do it?]
>  (4) New component (with another name) with the proposed contents?
>
> To make things clear from my side:  As a *user*, I currently have a
> stake in having a clean, independent "ml" component or an independent
> "sofm" module.  So I could do (4).  Or help with (3), on the condition that
> *other* people get things moving.
>
> Regards,
> Gilles
>
> >
> > Thanks & Regards
> > --Avijit Basak
> >
> > On Wed, 10 Feb 2021 at 19:49, Gilles Sadowski <gi...@gmail.com>
> wrote:
> >
> > > On Wed, 10 Feb 2021 at 13:19, sebb <se...@gmail.com> wrote:
> > > >
> > > > Likewise, commons-ml is too cryptic.
> > > >
> > > > Also, the Spark project has a machine-learning library:
> > > >
> > > > https://spark.apache.org/mllib/
> > >
> > > Thanks for the pointer.
> > >
> > > >
> > > > Maybe that would be better home?
> > >
> > > On the face of it, probably.
> > > [For sure, Avijit should comment on the suggestion.]
> > >
> > > On the other hand, "Commons" is the place where one can pick
> > > "bare-bones" implementations, and add the functionality to one's
> > > application without necessarily complying with an overarching framework.
> > > [I don't mean that framework compliance is bad; quite the contrary, it
> > > is hopefully the result of a thorough reflection by experts.  But ...
> > > cf. the numerous "no-dependency" discussions ...]
> > >
> > > Actually, concerning Avijit's proposed contribution, didn't I say:[1]
> > > ---CUT---
> > > Thus, I think that we must assess whether the "genetic algorithms"
> > > functionality has a reasonable future within "Apache Commons" (i.e.
> > > potential users and contributors) while there exist other libraries
> that
> > > seem much more advanced for any serious usage.
> > > ---CUT---
> > >
> > > > I'm also a bit concerned as to whether there are sufficient
> developers
> > > > here with knowledge of the ML domain to be able to support the code
> in
> > > > the future.
> > >
> > > An interesting point; by all means not a new one (see e.g. [2]).
> > >
> > > Isn't it the same point I've been making about "Commons Math" (CM)?
> > > There have been no releases because nobody here is able (or willing)
> > > to support it.
> > >
> > > Concerning the support of the purported "machinelearning" component:
> > > 1. Package
> > >         org.apache.commons.math4.ml.neuralnet
> > >     * I've written it entirely and I have applications that depend on
> > >       it (and I cannot assume that I could easily switch to, or port it
> > >       to, Spark), so I can reasonably ensure that it would be supported.
> > > 2. Package
> > >         org.apache.commons.math4.ml.clustering
> > >     * Functionality is mentioned in Spark's "mllib" user guide.
> > >     * When a new feature was last contributed[3], it was noticed[4][5][6]
> > >       that improvements were needed (but there was no follow-up).
> > >     * I have an application that depends on it (from CM v3.6.1) but I
> > >       wouldn't support it if shipped in CM v4.0.
> > > 3. Package
> > >         org.apache.commons.math4.genetics
> > >     * Part of my "end-of-study" project consisted in a GA
> implementation.
> > >       I've never used the CM implementation, and I don't deny that
> there
> > >       could be perfectly fine uses of it but, just looking at the
> code, it
> > > seems
> > >       obvious that it cannot compete feature-wise with other libraries
> > > out there.
> > >     * I've suggested long ago that, without anyone supporting it
> actively
> > > (and
> > >       no known user community), it should be dropped from CM.
> > >     * Avijit expressed a willingness to improve the functionality:  Is
> > > this enough
> > >       for the PMC to create a new component?  From the experience with
> the
> > >       "clustering" package mentioned above, I'd tend to think
> > > (unfortunately)
> > >       that it isn't.  He should first explore whether the Spark
> community
> > > is
> > >       interested, that the GA functionality be moved over there.
> > >
> > > Gilles
> > >
> > > [1] https://issues.apache.org/jira/browse/MATH-1563
> > > [2] https://markmail.org/message/26yxj5vhysdsoety
> > > [3] https://issues.apache.org/jira/projects/MATH/issues/MATH-1509
> > > [4] https://issues.apache.org/jira/projects/MATH/issues/MATH-1524
> > > [5] https://issues.apache.org/jira/projects/MATH/issues/MATH-1528
> > > [6] https://issues.apache.org/jira/projects/MATH/issues/MATH-1526
> > >
> > > >
> > > > > On Wed, 10 Feb 2021 at 08:27, Emmanuel Bourg <eb...@apache.org> wrote:
> > > > > >
> > > > > > -1 for commons-ml for the same reasons.
> > > > > >
> > > > > > What about commons-machine-learning or commons-math-learning? The latter
> > > > > > is as long as commons-configuration.
> > > > > >
> > > > > > Emmanuel Bourg
> > > > > >
> > > > > >
> > > > > > On 2021-02-10 03:27, Ralph Goers wrote:
> > > > > > > -1 on commons-ml as the name. My first thought is such a repo would
> > > > > > > hold stuff related to mailing lists. Then again maybe it contains
> > > > > > > stuff relating to markup languages. Maybe it is Apache’s version of
> > > > > > > the ML Programming Language [1].
> > > > > > >
> > > > > > > However, I wouldn’t be -1 on commons-math-ml, although at best I would
> > > > > > > be +0 since it is still not obvious what it would contain.
> > > > > >
> > > > > > Ralph
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> > > For additional commands, e-mail: dev-help@commons.apache.org
> > >
> > >
> >
> > --
> > Avijit Basak
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>

-- 
Avijit Basak

Re: [Vote] Create a "machine learning" component

Posted by Gilles Sadowski <gi...@gmail.com>.

On Sun, 14 Feb 2021 at 09:06, Avijit Basak <av...@gmail.com> wrote:
>
> Hi
>
>        I would like to mention a few points here. Genetic Algorithm has a
> vast range of applications in optimization and search problems. Machine
> learning is only one of those.
>        If we couple the new GA library with any specific domain like ml it
> would be meaningless for people working in other domains.

Isn't "meaningless" a slight overstatement?
We might have an issue of terminology: There is no necessary "coupling"
but maybe "acquaintance" (for lack of a better word), as a set of tools that
might come in handy for solving certain types of problems.  [For example,
the Traveling Salesman Problem can be tackled by GA and SOFM, both
of which are candidates for inclusion in the new component, although they
don't share any code.]

If the name "machine learning" is not the most appropriate one to convey
the intended scope, do you have another idea?
["AI" would perhaps be more correct if we consider a strict hierarchy, but
would obviously be far too presumptuous.]

> They have to
> incorporate the entire ml library

No, they won't.  Given the stated goal of "modularity", the "ga" module
will be available as a dedicated JAR (possibly with a dependency on
code that can be reused in other modules provided by the component).

> which may be completely unrelated to
> their project. Coupling it with any technology like spark might also limit
> its usability.

You may be right; I have no idea about the "restrictions" imposed by
Spark.  [It seems that in this case, one would have to indeed depend
on Spark's "mllib" (?).  This would be one reason, as I already stated,
for having something in "Commons".]

Could you elaborate on a concrete use-case where one would be
starting to develop an application with the specific requirement that
Spark could not be used?
In particular, IIRC Spark has multi-threading built in.  Don't you see
it as a huge problem that CM would not provide such a feature?

>        If a separate component is not approved for this change then we can
> incorporate the changes as part of *commons.math* library.

Of course, if somebody wants to do that, he's welcome.
[That will not be me, for all the reasons which I've explained.  In the last
5 years I've been pretty much alone in handling bug reports about CM;
I'm unwilling to assume implicit support for even more code.]

Also, with this solution, you'd now be willing to accept what you weren't
above: Anyone wanting to use the GA functionality would indeed have to
"incorporate" the whole of "Commons Math" (CM).
Of course, the latter could be modularized, but this will only mitigate the
issue, as any release of the GA functionality could then be held up by
issues in other parts of CM (which nobody has been able to consistently
support for more than 5 years now).

>        The same library can be reused in ml or neural network libraries as
> a dependency.

It is the other way around:  The development version of CM currently
depends on "lower-level" components.
Furthermore, right now its (embryonic) "machine learning" functionality
has no substantial dependency on code outside the "o.a.c.math4.ml"
package.

>        Kindly share further views on this.

In summary, to be clarified:
 (1) Why not Spark?  [At least post over there (?).]
 (2) Further develop a monolithic CM?  [Who will do it?]
 (3) Modularize CM? [Who will do it?]
 (4) New component (with another name) with the proposed contents?

To make things clear from my side:  As a *user*, I currently have some
stake in having a clean, independent "ml" component or an independent
"sofm" module.  So I could do (4).  Or help with (3), on the condition that
*other* people get things moving.

Regards,
Gilles

>
> Thanks & Regards
> --Avijit Basak
>
> On Wed, 10 Feb 2021 at 19:49, Gilles Sadowski <gi...@gmail.com> wrote:
>
> > Le mer. 10 févr. 2021 à 13:19, sebb <se...@gmail.com> a écrit :
> > >
> > > Likewise, commons-ml is too cryptic.
> > >
> > > Also, the Spark project has a machine-learning library:
> > >
> > > https://spark.apache.org/mllib/
> >
> > Thanks for the pointer.
> >
> > >
> > > Maybe that would be better home?
> >
> > On the face of it, probably.
> > [For sure, Avijit should comment on the suggestion.]
> >
> > On the other hand, "Commons" is the place where one can pick "bare
> > bone" implementations, and add the functionality to one's application
> > without necessarily complying with an overarching framework.
> > [I don't mean that framework compliance is bad; quite the contrary, it is
> > hopefully the result of a thorough reflection by experts.  But ... cf. the
> > numerous "no-dependency" discussions ...]
> >
> > Actually, concerning Avijit's proposed contribution, didn't I say:[1]
> > ---CUT---
> > Thus, I think that we must assess whether the "genetic algorithms"
> > functionality has a reasonable future within "Apache Commons" (i.e.
> > potential users and contributors) while there exist other libraries that
> > seem much more advanced for any serious usage.
> > ---CUT---
> >
> > > I'm also a bit concerned as to whether there are sufficient developers
> > > here with knowledge of the ML domain to be able to support the code in
> > > the future.
> >
> > An interesting point; by all means not a new one (see e.g. [2]).
> >
> > Isn't it the same point I've been making about "Commons Math" (CM)?
> > There have been no releases because nobody here is able (or willing)
> > to support it.
> >
> > Concerning the support of the purported "machinelearning" component:
> > 1. Package
> >         org.apache.commons.math4.ml.neuralnet
> >     * I've written it entirely and I have applications that depend on it
> >       (and I cannot assume that I could easily switch to, or port it to,
> >       Spark), so I can reasonably ensure that it would be supported.
> > 2. Package
> >         org.apache.commons.math4.ml.clustering
> >     * Functionality is mentioned in Spark's "mllib" user guide.
> >     * When a new feature was last contributed[3], it was noticed[4][5][6]
> >       that improvements were needed (but there was no follow-up).
> >     * I have an application that depends on it (from CM v3.6.1) but I
> >       wouldn't support it if shipped in CM v4.0.
> > 3. Package
> >         org.apache.commons.math4.genetics
> >     * Part of my "end-of-study" project consisted in a GA implementation.
> >       I've never used the CM implementation, and I don't deny that there
> >       could be perfectly fine uses of it but, just looking at the code, it
> >       seems obvious that it cannot compete feature-wise with other
> >       libraries out there.
> >     * I've suggested long ago that, without anyone supporting it actively
> >       (and no known user community), it should be dropped from CM.
> >     * Avijit expressed a willingness to improve the functionality:  Is
> >       this enough for the PMC to create a new component?  From the
> >       experience with the "clustering" package mentioned above, I'd tend
> >       to think (unfortunately) that it isn't.  He should first explore
> >       whether the Spark community is interested in having the GA
> >       functionality moved over there.
> >
> > Gilles
> >
> > [1] https://issues.apache.org/jira/browse/MATH-1563
> > [2] https://markmail.org/message/26yxj5vhysdsoety
> > [3] https://issues.apache.org/jira/projects/MATH/issues/MATH-1509
> > [4] https://issues.apache.org/jira/projects/MATH/issues/MATH-1524
> > [5] https://issues.apache.org/jira/projects/MATH/issues/MATH-1528
> > [6] https://issues.apache.org/jira/projects/MATH/issues/MATH-1526
> >
> > >
> > > On Wed, 10 Feb 2021 at 08:27, Emmanuel Bourg <eb...@apache.org> wrote:
> > > >
> > > > -1 for commons-ml for the same reasons.
> > > >
> > > > What about commons-machine-learning or commons-math-learning? The latter
> > > > is as long as commons-configuration.
> > > >
> > > > Emmanuel Bourg
> > > >
> > > >
> > > > On 2021-02-10 03:27, Ralph Goers wrote:
> > > > > -1 on commons-ml as the name. My first thought is such a repo would
> > > > > hold stuff related to mailing lists. Then again maybe it contains
> > > > > stuff relating to markup languages. Maybe it is Apache’s version of
> > > > > the ML Programming Language [1].
> > > > >
> > > > > However, I wouldn’t be -1 on commons-math-ml, although at best I would
> > > > > be +0 since it is still not obvious what it would contain.
> > > > >
> > > > > Ralph
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> > For additional commands, e-mail: dev-help@commons.apache.org
> >
> >
>
> --
> Avijit Basak

Re: The case for a Commons component

Posted by Gilles Sadowski <gi...@gmail.com>.

On Thu, 6 May 2021 at 20:29, Oliver Heger
<ol...@oliver-heger.de> wrote:
>
>
>
> Am 05.05.21 um 21:54 schrieb Gilles Sadowski:
> > Le mer. 5 mai 2021 à 20:33, Oliver Heger
> > <ol...@oliver-heger.de> a écrit :
> >>
> >>
> >>
> >> Am 05.05.21 um 20:26 schrieb Gilles Sadowski:
> >>> Le mer. 5 mai 2021 à 18:57, Gary Gregory <ga...@gmail.com> a écrit :
> >>>>
> >>>> IMO the lack of +1s shows the lack of appetite to manage another component
> >>>
> >>> That's certainly true.
> >>> And nobody is forced to do anything.
> >>>
> >>> When the other CM spin-offs started, there was only _one_ person
> >>> willing to do the work.
> >>
> >> What about the sandbox? IIUC, every committer can start a new component
> >> there. If then a community forms around this component, it can move to
> >> proper (which would then require a vote).
> >>
> >> Would this be an option to get started?
> >
> > [Graph] is listed in the sandbox[1], yet when someone expressed a willingness
> > to contribute, we had a "git" repository created[2] (even though the web site
> > has remained outdated[3], probably because the attempt was short-lived).
> >
> > So indeed, I could have already created the repository a few weeks ago...
> >
> > However in this instance, what would it mean to have codes that have lived
> > within a "proper" component for 6 years and more be moved to "sandbox"?
>
> A way to move forward?

Thanks for trying to be constructive (and for the decent tone).

I've been told that I should learn to count; that the vote (to
create a repository) has failed.
Hence that option has also been ruled out.  [What was OK for
[Graph] in sandbox, somehow is not anymore.  Go figure...]

Gilles

>
> Oliver
>
> >
> > Regards,
> > Gilles
> >
> > [1] http://commons.apache.org/sandbox/commons-graph/
> > [2] https://gitbox.apache.org/repos/asf?p=commons-graph.git
> > [3] http://commons.apache.org/sandbox/commons-graph/source-repository.html

Re: The case for a Commons component

Posted by Oliver Heger <ol...@oliver-heger.de>.


On 05.05.21 at 21:54, Gilles Sadowski wrote:
> Le mer. 5 mai 2021 à 20:33, Oliver Heger
> <ol...@oliver-heger.de> a écrit :
>>
>>
>>
>> Am 05.05.21 um 20:26 schrieb Gilles Sadowski:
>>> Le mer. 5 mai 2021 à 18:57, Gary Gregory <ga...@gmail.com> a écrit :
>>>>
>>>> IMO the lack of +1s shows the lack of appetite to manage another component
>>>
>>> That's certainly true.
>>> And nobody is forced to do anything.
>>>
>>> When the other CM spin-offs started, there was only _one_ person
>>> willing to do the work.
>>
>> What about the sandbox? IIUC, every committer can start a new component
>> there. If then a community forms around this component, it can move to
>> proper (which would then require a vote).
>>
>> Would this be an option to get started?
> 
> [Graph] is listed in the sandbox[1], yet when someone expressed a willingness
> to contribute, we had a "git" repository created[2] (even though the web site
> has remained outdated[3], probably because the attempt was short-lived).
> 
> So indeed, I could have already created the repository a few weeks ago...
> 
> However in this instance, what would it mean to have codes that have lived
> within a "proper" component for 6 years and more be moved to "sandbox"?

A way to move forward?

Oliver

> 
> Regards,
> Gilles
> 
> [1] http://commons.apache.org/sandbox/commons-graph/
> [2] https://gitbox.apache.org/repos/asf?p=commons-graph.git
> [3] http://commons.apache.org/sandbox/commons-graph/source-repository.html
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
> 

Re: The case for a Commons component

Posted by Ralph Goers <ra...@dslextreme.com>.

I’ll be nice and summarize. Gilles started two vote threads. The first was polluted with discussion and eventually closed. The second has not passed and is effectively dead but Gilles hasn’t closed the vote.

So nothing has been approved.

Ralph

> On May 14, 2021, at 5:48 AM, Gary Gregory <ga...@gmail.com> wrote:
> 
> Are you seriously asking someone else to read through 40 emails and summarize
> for you? Perhaps part of your contribution might be to do this yourself?
> 
> Gary
> 
> On Fri, May 14, 2021, 08:15 Avijit Basak <av...@gmail.com> wrote:
> 
>> Hi All
>> 
>>        This has been a long mail thread. It will be really helpful if
>> anyone can summarize the decisions.
>>        Is the proposal of developing the new machine learning component
>> approved?
>>        If the team repository is not provided is there any way to go
>> ahead?
>>        Waiting for a response.
>> 
>> Thanks & Regards
>> --Avijit Basak
>> 
>> On Fri, 7 May 2021 at 02:26, sebb <se...@gmail.com> wrote:
>> 
>>> On Thu, 6 May 2021 at 21:13, Gary Gregory <ga...@gmail.com> wrote:
>>>>
>>>> It is true that there is much less friction these days to get a repository
>>>> going with GitHub, GitLab, and BitBucket, but, for now, the Commons Sandbox
>>>> is still available. If we want to do away with the sandbox, then let's
>>>> talk about that separately.
>>>> 
>>> 
>>> There is no need for a Sandbox component to use SVN, and it's easy to
>>> create a new Commons git repo.
>>> 
>>> A non-ASF code repo would require code to be checked for license
>>> compliance etc before it could become a Commons component.
>>> A Sandbox component does not require that.
>>> 
>>>> Gary
>>>> 
>>>> On Thu, May 6, 2021, 11:26 Ralph Goers <ra...@dslextreme.com> wrote:
>>>>
>>>>>
>>>>>
>>>>>> On May 6, 2021, at 8:06 AM, Gary Gregory <ga...@gmail.com> wrote:
>>>>>>
>>>>>> What about the Commons Sandbox? Would that be a good place to start?
>>>>>>
>>>>>
>>>>> Emmanuel just sort of proposed doing away with it. As he put it, anyone
>>>>> can create a GitHub repo so why does it need to be under the apache user.
>>>>> He hasn’t formally made a proposal for that and I’m not sure how I would
>>>>> vote on it if he did. He does have a point. At the same time I’m not sure
>>>>> I’d close off doing experimental or early development within the ASF space.
>>>>> 
>>>>> Ralph
>>>>> 
>>>>> 
>>>>> 
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>>> 
>>>>> 
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>> For additional commands, e-mail: dev-help@commons.apache.org
>>> 
>>> 
>> 
>> --
>> Avijit Basak
>> 



Re: The case for a Commons component

Posted by Gary Gregory <ga...@gmail.com>.

Are you seriously asking someone else to read through 40 emails and summarize
for you? Perhaps part of your contribution might be to do this yourself?

Gary

On Fri, May 14, 2021, 08:15 Avijit Basak <av...@gmail.com> wrote:

> Hi All
>
>         This has been a long mail thread. It will be really helpful if
> anyone can summarize the decisions.
>         Is the proposal of developing the new machine learning component
> approved?
>         If the team repository is not provided is there any way to go
> ahead?
>         Waiting for a response.
>
> Thanks & Regards
> --Avijit Basak
>
> On Fri, 7 May 2021 at 02:26, sebb <se...@gmail.com> wrote:
>
> > On Thu, 6 May 2021 at 21:13, Gary Gregory <ga...@gmail.com>
> wrote:
> > >
> > > It is true that there is much less friction these days to get a repository
> > > going with GitHub, GitLab, and BitBucket, but, for now, the Commons
> > Sandbox
> > > is still available. If we want to do away with the sandbox, then let's
> > > talk about that separately.
> > >
> >
> > There is no need for a Sandbox component to use SVN, and it's easy to
> > create a new Commons git repo.
> >
> > A non-ASF code repo would require code to be checked for license
> > compliance etc before it could become a Commons component.
> > A Sandbox component does not require that.
> >
> > > Gary
> > >
> > > On Thu, May 6, 2021, 11:26 Ralph Goers <ra...@dslextreme.com>
> > wrote:
> > >
> > > >
> > > >
> > > > > On May 6, 2021, at 8:06 AM, Gary Gregory <ga...@gmail.com>
> > wrote:
> > > > >
> > > > > What about the Commons Sandbox? Would that be a good place to start?
> > > > >
> > > >
> > > > Emmanuel just sort of proposed doing away with it. As he put it,
> anyone
> > > > can create a
> > > > GitHub repo so why does it need to be under the apache user.  He
> hasn’t
> > > > formally
> > > > made a proposal for that and I’m not sure how I would vote on it if
> he
> > > > did. He does
> > > > have a point. At the same time I’m not sure I’d close off doing
> > > > experimental or
> > > > early development within the ASF space.
> > > >
> > > > Ralph
> > > >
> > > >
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> > > > For additional commands, e-mail: dev-help@commons.apache.org
> > > >
> > > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> > For additional commands, e-mail: dev-help@commons.apache.org
> >
> >
>
> --
> Avijit Basak
>

Re: The case for a Commons component

Posted by Avijit Basak <av...@gmail.com>.

Hi All

        This has been a long mail thread. It will be really helpful if
anyone can summarize the decisions.
        Is the proposal of developing the new machine learning component
approved?
        If the team repository is not provided is there any way to go ahead?
        Waiting for a response.

Thanks & Regards
--Avijit Basak

On Fri, 7 May 2021 at 02:26, sebb <se...@gmail.com> wrote:

> On Thu, 6 May 2021 at 21:13, Gary Gregory <ga...@gmail.com> wrote:
> >
> > It is true that there is much less friction these days to get a repository
> > going with GitHub, GitLab, and BitBucket, but, for now, the Commons
> Sandbox
> > is still available. If we want to do away with the sandbox, then let's
> > talk about that separately.
> >
>
> There is no need for a Sandbox component to use SVN, and it's easy to
> create a new Commons git repo.
>
> A non-ASF code repo would require code to be checked for license
> compliance etc before it could become a Commons component.
> A Sandbox component does not require that.
>
> > Gary
> >
> > On Thu, May 6, 2021, 11:26 Ralph Goers <ra...@dslextreme.com>
> wrote:
> >
> > >
> > >
> > > > On May 6, 2021, at 8:06 AM, Gary Gregory <ga...@gmail.com>
> wrote:
> > > >
> > > > What about the Commons Sandbox? Would that be a good place to start?
> > > >
> > >
> > > Emmanuel just sort of proposed doing away with it. As he put it, anyone
> > > can create a
> > > GitHub repo so why does it need to be under the apache user.  He hasn’t
> > > formally
> > > made a proposal for that and I’m not sure how I would vote on it if he
> > > did. He does
> > > have a point. At the same time I’m not sure I’d close off doing
> > > experimental or
> > > early development within the ASF space.
> > >
> > > Ralph
> > >
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> > > For additional commands, e-mail: dev-help@commons.apache.org
> > >
> > >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>

-- 
Avijit Basak

Re: The case for a Commons component

Posted by sebb <se...@gmail.com>.

On Thu, 6 May 2021 at 21:13, Gary Gregory <ga...@gmail.com> wrote:
>
> It is true that there is much less friction these days to get a repository
> going with GitHub, GitLab, and BitBucket, but, for now, the Commons Sandbox
> is still available. If we want to do away with the sandbox, then let's
> talk about that separately.
>

There is no need for a Sandbox component to use SVN, and it's easy to
create a new Commons git repo.

A non-ASF code repo would require code to be checked for license
compliance etc before it could become a Commons component.
A Sandbox component does not require that.

> Gary
>
> On Thu, May 6, 2021, 11:26 Ralph Goers <ra...@dslextreme.com> wrote:
>
> >
> >
> > > On May 6, 2021, at 8:06 AM, Gary Gregory <ga...@gmail.com> wrote:
> > >
> > > What about the Commons Sandbox? Would that be a good place to start?
> > >
> >
> > Emmanuel just sort of proposed doing away with it. As he put it, anyone
> > can create a
> > GitHub repo so why does it need to be under the apache user.  He hasn’t
> > formally
> > made a proposal for that and I’m not sure how I would vote on it if he
> > did. He does
> > have a point. At the same time I’m not sure I’d close off doing
> > experimental or
> > early development within the ASF space.
> >
> > Ralph
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> > For additional commands, e-mail: dev-help@commons.apache.org
> >
> >

Re: The case for a Commons component

Posted by Gary Gregory <ga...@gmail.com>.

It is true that there is much less friction these days to get a repository
going with GitHub, GitLab, and BitBucket, but, for now, the Commons Sandbox
is still available. If we want to do away with the sandbox, then let's
talk about that separately.

Gary

On Thu, May 6, 2021, 11:26 Ralph Goers <ra...@dslextreme.com> wrote:

>
>
> > On May 6, 2021, at 8:06 AM, Gary Gregory <ga...@gmail.com> wrote:
> >
> > What about the Commons Sandbox? Would that be a good place to start?
> >
>
> Emmanuel just sort of proposed doing away with it. As he put it, anyone
> can create a
> GitHub repo so why does it need to be under the apache user.  He hasn’t
> formally
> made a proposal for that and I’m not sure how I would vote on it if he
> did. He does
> have a point. At the same time I’m not sure I’d close off doing
> experimental or
> early development within the ASF space.
>
> Ralph
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>

Re: The case for a Commons component

Posted by Ralph Goers <ra...@dslextreme.com>.

> On May 6, 2021, at 8:06 AM, Gary Gregory <ga...@gmail.com> wrote:
> 
> What about the Commons Sandbox? Would that be a good place to start?
> 

Emmanuel just sort of proposed doing away with it. As he put it, anyone can create a 
GitHub repo so why does it need to be under the apache user.  He hasn’t formally 
made a proposal for that and I’m not sure how I would vote on it if he did. He does 
have a point. At the same time I’m not sure I’d close off doing experimental or 
early development within the ASF space.

Ralph

Re: The case for a Commons component

Posted by Gary Gregory <ga...@gmail.com>.

What about the Commons Sandbox? Would that be a good place to start?

Gary

On Thu, May 6, 2021, 09:37 Gilles Sadowski <gi...@gmail.com> wrote:

> Le jeu. 6 mai 2021 à 14:48, Emmanuel Bourg <eb...@apache.org> a écrit :
> >
> > Le 2021-05-06 13:06, Gilles Sadowski a écrit :
> >
> > > It is not nice to decide for others what they may need.
> >
> > It is not nice to suggest I shouldn't voice my opinions.
>
> Your argued opinion is welcome.
> In the text which you cut, you *explicitly* said that I should
> go somewhere else (GitHub or whatever).
>
> >
> > > It would have been courteous to acknowledge the answers to
> > > your argument against having a dedicated component
> >
> > I've little appetite for lengthy debate with you again.
>
> There is/was no debate (as in: "an exchange of arguments" or
> "trying to get consensus" or "not forcing me to do what I think is
> bad"), you state your opinion (as mentioned above) and that's it.
>
> > > My rationale, for whether a specific component is needed, has
> > > always been the same: Define a scope (and stick to it).
> > > You seem to find this acceptable for any Commons project except
> > > those which you tagged as "math-related".
> >
> > The machine learning scope is too wide, it doesn't belong here.
>
> I agree that it is wide, but much less so than "math", yet you never
> voiced such an opinion against CM (while I did).
>
> > > So I'm asking: Will it make any difference if the "machine learning"
> > > codes are further developed within [Math]?  Concretely:
> > >  * Would you vote to release CM v4.0?
> > >  * Would you help (more than if the ML codes were in a
> > >    specific component) to review/merge the PRs?
> >
> > I'd vote favorably for a modularized CM 4.0 release,
>
> I really (really, really) can't figure out how you can reconcile that a
> library (CM) that *contains* a ML subset which you deem too big
> to be a Commons component, is not too big to be a Commons
> component!
>
> The spin-offs from CM do solve the issue of "too wide scope" that
> doomed CM.
> And again: I agree that "machine learning" may be too wide a
> scope itself; grouping all such algorithms in a single component
> was already a compromise with respect to having each ML field in its
> own, especially if we aimed at some common goal (multi-threading) that
> could lead to shared code (not the math algorithms but, among other
> things, the threads management).
>
> > but I still
> > think that the math related components would be best served in their own
> > TLP with a dedicated community
>
> When this was brought up somewhat seriously, most of the
> PMC voted against.
> Then last time (IIRC) the idea was floated, there wasn't the
> minimum of people required to support a TLP.  [FTR, that was
> the practical reason these codes are here (as is the case for all the
> other Commons components): a place where more people can
> contribute to otherwise orphaned libraries.]
>
> OK, then let's move on; thus I'm asking who in this PMC, is
> now willing to provide the necessary clearance for an internal
> fork of the math-related codes for which it is deemed that they
> are not a good fit for Commons?
>
> > free of the Apache Commons rules and
> > constraints.
>
> I'm still to be shown what rules I'd be asking to be free of.
>
> Gilles
>
> >
> > Emmanuel Bourg
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>

Re: The case for a Commons component

Posted by Gilles Sadowski <gi...@gmail.com>.

On Thu, 6 May 2021 at 14:48, Emmanuel Bourg <eb...@apache.org> wrote:
>
> Le 2021-05-06 13:06, Gilles Sadowski a écrit :
>
> > It is not nice to decide for others what they may need.
>
> It is not nice to suggest I shouldn't voice my opinions.

Your argued opinion is welcome.
In the text which you cut, you *explicitly* said that I should
go somewhere else (GitHub or whatever).

>
> > It would have been courteous to acknowledge the answers to
> > your argument against having a dedicated component
>
> I've little appetite for lengthy debate with you again.

There is/was no debate (as in: "an exchange of arguments" or
"trying to get consensus" or "not forcing me to do what I think is
bad"), you state your opinion (as mentioned above) and that's it.

> > My rationale, for whether a specific component is needed, has
> > always been the same: Define a scope (and stick to it).
> > You seem to find this acceptable for any Commons project except
> > those which you tagged as "math-related".
>
> The machine learning scope is too wide, it doesn't belong here.

I agree that it is wide, but much less so than "math", yet you never
voiced such an opinion against CM (while I did).

> > So I'm asking: Will it make any difference if the "machine learning"
> > codes are further developed within [Math]?  Concretely:
> >  * Would you vote to release CM v4.0?
> >  * Would you help (more than if the ML codes were in a
> >    specific component) to review/merge the PRs?
>
> I'd vote favorably for a modularized CM 4.0 release,

I really (really, really) can't figure out how you can reconcile that a
library (CM) that *contains* a ML subset which you deem too big
to be a Commons component, is not too big to be a Commons
component!

The spin-offs from CM do solve the issue of "too wide scope" that
doomed CM.
And again: I agree that "machine learning" may be too wide a
scope itself; grouping all such algorithms in a single component
was already a compromise wrt having each ML field in its own,
especially if we aimed at some common goal (multi-threading) that
could lead to shared code (not the math algorithms but, among other
things, the threads management).

> but I still
> think that the math related components would be best served in their own
> TLP with a dedicated community

When this was brought up somewhat seriously, most of the
PMC voted against.
Then last time (IIRC) the idea was floated, there wasn't the
minimum of people required to support a TLP.  [FTR, that was
the practical reason these codes are here (as is the case for all the
other Commons components): a place where more people can
contribute to otherwise orphaned libraries.]

OK, then let's move on; thus I'm asking who in this PMC, is
now willing to provide the necessary clearance for an internal
fork of the math-related codes for which it is deemed that they
are not a good fit for Commons?

> free of the Apache Commons rules and
> constraints.

I'm still to be shown what rules I'd be asking to be free of.

Gilles

>
> Emmanuel Bourg
>


Re: The case for a Commons component

Posted by Emmanuel Bourg <eb...@apache.org>.

On 2021-05-06 13:06, Gilles Sadowski wrote:

> It is not nice to decide for others what they may need.

It is not nice to suggest I shouldn't voice my opinions.


> It would have been courteous to acknowledge the answers to
> your argument against having a dedicated component

I've little appetite for lengthy debate with you again.


> My rationale, for whether a specific component is needed, has
> always been the same: Define a scope (and stick to it).
> You seem to find this acceptable for any Commons project except
> those which you tagged as "math-related".

The machine learning scope is too wide, it doesn't belong here.


> So I'm asking: Will it make any difference if the "machine learning"
> codes are further developed within [Math]?  Concretely:
>  * Would you vote to release CM v4.0?
>  * Would you help (more than if the ML codes were in a
>    specific component) to review/merge the PRs?

I'd vote favorably for a modularized CM 4.0 release, but I still
think that the math related components would be best served in their own 
TLP with a dedicated community free of the Apache Commons rules and 
constraints.

Emmanuel Bourg


Re: The case for a Commons component

Posted by Gilles Sadowski <gi...@gmail.com>.

On Thu, 6 May 2021 at 02:24, Emmanuel Bourg <eb...@apache.org> wrote:
>
> On 2021-05-05 20:31, Oliver Heger wrote:
>
> > What about the sandbox? IIUC, every committer can start a new
> > component there. If then a community forms around this component, it
> > can move to proper (which would then require a vote).
>
> With the various source hosting solutions available today we no longer
> need the sandbox, and I think we should discontinue this practice. The
> machine learning library could as well start its life on GitHub, it
> doesn't need Apache Commons.

It is not nice to decide for others what they may need.

It would have been courteous to acknowledge the answers to
your argument against having a dedicated component (to more
efficiently manage codes that have already been accepted within
the "Commons" project, as part of CM), and explain
 * why those answers would not make you withdraw your -1,
 * why the ASF would be better off without the offered contribution,
 * why some initiatives in Commons deserve a worse treatment
   than others.

My rationale, for whether a specific component is needed, has
always been the same: Define a scope (and stick to it).
You seem to find this acceptable for any Commons project except
those which you tagged as "math-related".

So I'm asking: Will it make any difference if the "machine learning"
codes are further developed within [Math]?  Concretely:
 * Would you vote to release CM v4.0?
 * Would you help (more than if the ML codes were in a
   specific component) to review/merge the PRs?

Gilles

>
> Emmanuel Bourg
>


Re: The case for a Commons component

Posted by Emmanuel Bourg <eb...@apache.org>.

On 2021-05-05 20:31, Oliver Heger wrote:

> What about the sandbox? IIUC, every committer can start a new
> component there. If then a community forms around this component, it
> can move to proper (which would then require a vote).

With the various source hosting solutions available today we no longer 
need the sandbox, and I think we should discontinue this practice. The 
machine learning library could as well start its life on GitHub, it 
doesn't need Apache Commons.

Emmanuel Bourg


Re: The case for a Commons component

Posted by Gilles Sadowski <gi...@gmail.com>.

On Wed, 5 May 2021 at 20:33, Oliver Heger
<ol...@oliver-heger.de> wrote:
>
>
>
> On 05.05.21 at 20:26, Gilles Sadowski wrote:
> > On Wed, 5 May 2021 at 18:57, Gary Gregory <ga...@gmail.com> wrote:
> >>
> >> IMO the lack of +1s shows the lack of appetite to manage another component
> >
> > That's certainly true.
> > And nobody is forced to do anything.
> >
> > When the other CM spin-offs started, there was only _one_ person
> > willing to do the work.
>
> What about the sandbox? IIUC, every committer can start a new component
> there. If then a community forms around this component, it can move to
> proper (which would then require a vote).
>
> Would this be an option to get started?

[Graph] is listed in the sandbox[1], yet when someone expressed a willingness
to contribute, we had a "git" repository created[2] (even though the web site
has remained outdated[3], probably because the attempt was short-lived).

So indeed, I could have already created the repository a few weeks ago...

However in this instance, what would it mean to have codes that have lived
within a "proper" component for 6 years and more be moved to "sandbox"?

Regards,
Gilles

[1] http://commons.apache.org/sandbox/commons-graph/
[2] https://gitbox.apache.org/repos/asf?p=commons-graph.git
[3] http://commons.apache.org/sandbox/commons-graph/source-repository.html


Re: The case for a Commons component

Posted by Oliver Heger <ol...@oliver-heger.de>.


On 05.05.21 at 20:26, Gilles Sadowski wrote:
> On Wed, 5 May 2021 at 18:57, Gary Gregory <ga...@gmail.com> wrote:
>>
>> IMO the lack of +1s shows the lack of appetite to manage another component
> 
> That's certainly true.
> And nobody is forced to do anything.
> 
> When the other CM spin-offs started, there was only _one_ person
> willing to do the work.

What about the sandbox? IIUC, every committer can start a new component 
there. If then a community forms around this component, it can move to 
proper (which would then require a vote).

Would this be an option to get started?

Oliver

> 
> Gilles
> 
>> [...]
> 
> 


Re: The case for a Commons component

Posted by Alex Herbert <al...@gmail.com>.

On Fri, 30 Apr 2021 at 16:40, Avijit Basak <av...@gmail.com> wrote:

>
>           >>  Then some examination of the data-structures is required (a
>           >>  binary chromosome is currently stored as a "List<Integer>").
>               -- I have recently done some work on this. Could you please
> check this article and share your thoughts:
>                   https://arxiv.org/abs/2103.04751
>
>

The paper relates to the efficiency of storing binary values for many
indexes. The conclusion is that you should use each bit of a byte to store
one binary value, i.e. a bitset. In the example repository the binary
chromosome is stored using a List<Long> with each long representing
64 alleles. This is basically an unoptimised BitSet. So I would look at how
this is done in java.util.BitSet and write a custom version for
optimised genetic algorithm operations. It would also be faster than the
List<Long> and avoid the boxing of each long in a Long object wrapper,
thus saving a lot of memory.

Note: You cannot easily just use java.util.BitSet because you need
access to the underlying long[] that stores the chromosome to enable
efficient crossover. This can be done with bit manipulation of the longs
containing the crossover point and then a System.arraycopy via a temp array:

For a single point crossover of two long[] chromosomes:

long[] c1 = ...
long[] c2 = ...
// The chosen allele for the crossover
int cross = ...

// Find the index of the long that contains the crossover point
int index = cross >> 6; // i.e. cross / 64
// This is not actually required...
// int bit = cross & 63; // i.e. cross % 64

// The following will create the mask for all bits from the target bit upwards.
// Shifting a long only uses the lowest 6 bits of the shift distance, so
// shifting by cross is the same as shifting by bit. The left operand must be
// a long (-1L); an int -1 would only use the lowest 5 bits.
// long mask = -1L << bit;
long mask = -1L << cross;

// Swap the bits before/after the crossover at the target index
long tmp = c1[index];
c1[index] = (tmp & mask) | (c2[index] & ~mask);
c2[index] = (tmp & ~mask) | (c2[index] & mask);

// Swap the whole longs below the target index via a temp array
long[] data = new long[index];
System.arraycopy(c2, 0, data, 0, index);
System.arraycopy(c1, 0, c2, 0, index);
System.arraycopy(data, 0, c1, 0, index);

This is untested code but contains the main idea.
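
For reference, a minimal self-contained sketch of the same single-point crossover. The class name PackedCrossover and the LSB-first layout (allele i in bit i % 64 of word i / 64) are assumptions made for illustration, and a plain swap loop is used here in place of the System.arraycopy via a temp array:

```java
/** Sketch only: single-point crossover for chromosomes packed 64 alleles per long. */
final class PackedCrossover {

    /**
     * Swaps alleles [0, cross) between c1 and c2 in place.
     * Assumed layout: allele i is bit (i % 64) of word (i / 64).
     */
    static void crossover(long[] c1, long[] c2, int cross) {
        if (c1.length != c2.length) {
            throw new IllegalArgumentException("Chromosome lengths differ");
        }
        if (cross < 0 || cross > c1.length * 64) {
            throw new IllegalArgumentException("Crossover point out of range: " + cross);
        }
        final int index = cross >>> 6; // word containing the crossover point

        // Whole words below the crossover word are swapped.
        for (int i = 0; i < index; i++) {
            final long t = c1[i];
            c1[i] = c2[i];
            c2[i] = t;
        }

        // The crossover word is split with a mask (skipped when cross is at the very end).
        if (index < c1.length) {
            // Bits at or above (cross % 64) are set; a long shift uses only
            // the lowest 6 bits of the shift distance.
            final long mask = -1L << cross;
            final long w1 = c1[index];
            final long w2 = c2[index];
            c1[index] = (w1 & mask) | (w2 & ~mask);
            c2[index] = (w2 & mask) | (w1 & ~mask);
        }
    }

    public static void main(String[] args) {
        long[] c1 = {0L, 0L};
        long[] c2 = {-1L, -1L};
        crossover(c1, c2, 70);
        System.out.println(Long.toHexString(c1[0]) + " " + Long.toHexString(c1[1]));
        System.out.println(Long.toHexString(c2[0]) + " " + Long.toHexString(c2[1]));
    }
}
```

When cross is a multiple of 64 the mask is all ones, so the crossover word is left unchanged and only the whole words below it are exchanged.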

Setting and unsetting bits in the binary chromosome for mutation is much
easier as you just pick the mutation point, find the index and the bit and
then set it or unset it as appropriate, or flip it using an XOR operation
on the bit (see the source code for BitSet.flip).
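
A minimal sketch of such a mutation on a long[] chromosome, assuming the same layout as java.util.BitSet (allele i in bit i % 64 of word i / 64). The class and method names are hypothetical:

```java
/** Sketch only: bit-flip mutation on a chromosome packed 64 alleles per long. */
final class PackedMutation {

    /** Flips the given allele in place using XOR. */
    static void flip(long[] chromosome, int allele) {
        // 1L << allele only uses the lowest 6 bits of allele, i.e. allele % 64.
        chromosome[allele >>> 6] ^= 1L << allele;
    }

    /** Returns true if the given allele is set. */
    static boolean get(long[] chromosome, int allele) {
        return (chromosome[allele >>> 6] & (1L << allele)) != 0;
    }

    public static void main(String[] args) {
        long[] chromosome = new long[2];
        flip(chromosome, 65);
        System.out.println(get(chromosome, 65));
        flip(chromosome, 65);
        System.out.println(get(chromosome, 65));
    }
}
```

Flipping the same allele twice restores the original chromosome, which is convenient when testing mutation operators.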

Alex

Re: The case for a Commons component

Posted by Alex Herbert <al...@gmail.com>.

On Sun, 2 May 2021 at 16:51, Avijit Basak <av...@gmail.com> wrote:

> Hi
>
> >>        Note: You cannot easily just use java.util.BitSet as you wish to
> >> have access to the underlying long[] to store the chromosome to enable
> >> efficient crossover.
> --Thanks for pointing this out. However, I considered a few constraints
> while doing the implementation.
>      1) I extended the existing class AbstractListChromosome, which
> requires a generic type. This is the reason for using a list of Long.
> However, I can extend Chromosome and use an array of primitive long.
> BitSet also uses a similar data structure.
>      2) One problem with BitSet is the use of the MSB to retain bits. As a
> result, we won't be able to use the static utility methods of the wrapper
> classes (Long) for conversion between primitive type and string. We will
> have to write custom code for conversion between string and integral types.
> This is the only reason I have used BLOCKSIZE as 63 instead of 64.
>

I did state you cannot use BitSet as there are requirements to access the
underlying long[] for certain operations such as crossover. Thus you have
to build a custom implementation that uses a long[] representation with the
operations you need. You can then store the bits using big or little endian
as you require. The BitSet is using LSB for bit 0 to MSB for bit 63 of each
word.

Writing custom code for toString() would be simple. You can use a 256 entry
look-up table and output 8 blocks per long:

String[] OUTPUT = { "00000000", "00000001", "00000010", "00000011", etc. };
long[] alleles = ...;
StringBuilder sb = new StringBuilder(alleles.length * 64);
for (long bits : alleles) {
    // The order of this depends on the endianness of the representation
    sb.append(OUTPUT[(int)(bits & 0xff)])
       .append(OUTPUT[(int)((bits >> 8) & 0xff)])
       .append(OUTPUT[(int)((bits >> 16) & 0xff)])
       // etc ...
}

There would be extra work for the final block of 64 if it is not complete
(i.e. fewer than 64 bits are used) to avoid extra zeros in the output.
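
A complete sketch of this look-up table approach, under stated assumptions: the table is generated rather than written out, each entry prints bit 0 of the byte first so that the output starts with allele 0, and the number of alleles in use is passed as a hypothetical `size` argument so that the final block can be truncated:

```java
/** Sketch only: converts a packed binary chromosome to a string of '0'/'1'. */
final class PackedToString {

    /** 256-entry table; entry b holds the 8 bits of b, lowest bit first. */
    private static final String[] OUTPUT = new String[256];

    static {
        for (int b = 0; b < 256; b++) {
            final char[] c = new char[8];
            for (int i = 0; i < 8; i++) {
                c[i] = ((b >>> i) & 1) == 0 ? '0' : '1';
            }
            OUTPUT[b] = new String(c);
        }
    }

    /**
     * @param alleles packed chromosome, allele i in bit (i % 64) of word (i / 64)
     * @param size number of alleles actually in use
     */
    static String format(long[] alleles, int size) {
        if (size < 0 || size > alleles.length * 64) {
            throw new IllegalArgumentException("Invalid size: " + size);
        }
        final StringBuilder sb = new StringBuilder(alleles.length * 64);
        for (long bits : alleles) {
            // 8 table look-ups per long, lowest byte first.
            for (int shift = 0; shift < 64; shift += 8) {
                sb.append(OUTPUT[(int) ((bits >>> shift) & 0xff)]);
            }
        }
        // Drop the unused bits of the final block.
        sb.setLength(size);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(format(new long[] {0b1011L}, 6));
    }
}
```

A big-endian representation would reverse the order of the byte look-ups and of the characters within each table entry.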

Writing fromString input code could use Long.parseUnsignedLong(String, int)
with a radix of 2 if you have the correct endianness per block of 64. This
allows you to intake 64 characters at a time to create the long[].
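
A sketch of the input side, assuming the string lists allele 0 first and that allele i is stored in bit i % 64 of word i / 64. Under that assumption each block of up to 64 characters is reversed before parsing, because Long.parseUnsignedLong reads the most significant digit first. The class and method names are hypothetical:

```java
/** Sketch only: parses a string of '0'/'1' into a packed binary chromosome. */
final class PackedFromString {

    /**
     * @param s alleles as '0'/'1' characters, allele 0 first
     * @return packed chromosome, allele i in bit (i % 64) of word (i / 64)
     */
    static long[] parse(String s) {
        final int n = s.length();
        final long[] words = new long[(n + 63) >>> 6];
        for (int i = 0; i < words.length; i++) {
            final int from = i << 6;
            final int to = Math.min(from + 64, n);
            // Reverse the block so that allele 'from' becomes the least
            // significant binary digit of the parsed value.
            final String block =
                new StringBuilder(s.substring(from, to)).reverse().toString();
            words[i] = Long.parseUnsignedLong(block, 2);
        }
        return words;
    }

    public static void main(String[] args) {
        final long[] words = parse("110100");
        System.out.println(Long.toBinaryString(words[0]));
    }
}
```

With a representation whose string order already matches the digit order of each long, the reversal can be dropped and the substring parsed directly.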

I do not see it as a problem to write custom code based around long[] if
the result is a large gain in speed and memory efficiency for the
implementation.

Restricting functionality to the current CM AbstractListChromosome
or Chromosome is not necessary for a new package. This is the opportunity
to build new data structures that are appropriate for the intended use.


> >>// This is not actually required...
> // int bit = cross & 63; // i.e. cross % 64
> --Do you mean the bit index does not need to be calculated? How can we
> handle crossover indexes which are not multiples of 64?
>

Sorry for not being clear. You need to create the mask to determine where
in the 64-bit long to perform the crossover. What I meant was you do not
need to identify the bit with a modulus operator. This:

int cross = ...

int index = cross / 64;
int bit = cross % 64;
long mask = 0xffff_ffff_ffff_ffffL << bit;

Is the same as:

int index = cross >>> 6;
long mask = -1L << cross;

This is because the left shift operator only uses the lowest 6 bits of
the shift distance when the value being shifted is a long (hence -1L; an
int -1 would use only the lowest 5 bits). These are all the same:

-1L << 1
-1L << (1 + 64)
-1L << (1 + 128)
-1L << (1 + 256)
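
A small sketch to check this equivalence, and to show the pitfall when the shifted value is an int rather than a long. The class and method names are hypothetical:

```java
/** Sketch only: compares ways of building the crossover mask. */
final class ShiftMaskDemo {

    /** Mask built with an explicit modulus (non-negative cross assumed). */
    static long maskByModulus(int cross) {
        return 0xffff_ffff_ffff_ffffL << (cross % 64);
    }

    /** Mask built by shifting a long; only the lowest 6 bits of cross are used. */
    static long maskByShift(int cross) {
        return -1L << cross;
    }

    /**
     * Pitfall: the shifted value is an int, so only the lowest 5 bits of cross
     * are used and the int result is then sign-extended to a long.
     */
    static long maskWithIntOperand(int cross) {
        return -1 << cross;
    }

    public static void main(String[] args) {
        for (int cross = 0; cross < 256; cross++) {
            if (maskByModulus(cross) != maskByShift(cross)) {
                throw new AssertionError("Mismatch at " + cross);
            }
        }
        System.out.println(Long.toHexString(maskByShift(40)));        // ffffff0000000000
        System.out.println(Long.toHexString(maskWithIntOperand(40))); // ffffffffffffff00
    }
}
```

The last two lines differ because shifting an int by 40 is the same as shifting it by 8.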


> >> Do you think that allele sets other than binary would be useful to
> implement? [IIUC your document above, it seems not (?).]
> --The document only describes the data structure related to Binary
> genotype. We already have an implementation of RandomKey genotype in
> commons. We can think of adding other genotypes gradually.
>
>
> Thanks & Regards
> --Avijit Basak
>
>
>
> On Sat, 1 May 2021 at 22:18, Gilles Sadowski <gi...@gmail.com> wrote:
>
> > On Fri, 30 Apr 2021 at 17:40, Avijit Basak <av...@gmail.com>
> > wrote:
> > >
> > > Hi
> > >
> > >          >>lot of spurious references to "Commons Numbers"
> > >              --I have only created the basic project structure. Changes
> > > need to be made. Can anyone from the existing commons team help in
> > > doing this?
> >
> > Well, you should "search and replace":
> >   "Numbers" -> "Machine Learning"
> >   commons-numbers -> commons-machinelearning
> >
> > Other things (repository URL, JIRA project name and URL) require that
> > a component be created (vote is pending).
> > [As long as those files are not part of a PR, it is not urgent to fix
> > them.]
> >
> > >          >> For sure, populate it with the code extracted from CM's
> > > "genetics" package and proceed with the enhancements.
> > > At first, I'd suggest to refactor the layout of the package (i.e.
> > > create a "subpackage" for each component of a genetic algorithm).
> > >               -- I am working on it.
> >
> > Great!
> >
> > > Did not commit the code till now.
> >
> > OK.  When you do, please ask for review on the "dev" ML.
> >
> > >           >>  Then some examination of the data-structures is required
> > > (a binary chromosome is currently stored as a "List<Integer>").
> > >               -- I have recently done some work on this. Could you
> > > please check this article and share your thoughts:
> > >                   https://arxiv.org/abs/2103.04751
> >
> > Alex already provided a thorough response.
> > It's a pity that JDK's BitSet is missing a few methods (e.g. "append")
> > for a readily usable implementation of a "binary chromosome".
> >
> > Do you think that allele sets other than binary would be useful to
> > implement? [IIUC your document above, it seems not (?).]
> >
> > >           Are we thinking to use Spark for our parallelism
> >
> > No, if the code is to reside in Commons.
> >
> > > or a simple
> > > multi-threading of Java.
> >
> > Yes, we'd depend only on JDK classes.
> >
> > > I would prefer to use java multi-threading and
> > > avoid any other framework.
> > >           In java we don't have any library which can be used for AI/ML
> > > programming with a very minimal learning curve. Can we think of
> > > fulfilling this need?
> >
> > That would be nice. Don't hesitate to enlist fellow programmers. :-)
> >
> > Regards,
> > Gilles
> >
> > >           This will be helpful for many java developers to venture into
> > > AI/ML without learning a new language like Python.
> > >
> > >
> > >>> [...]
> >
> >
> >
>
> --
> Avijit Basak
>

Re: The case for a Commons component

Posted by Gilles Sadowski <gi...@gmail.com>.

Hello.

On Mon, 3 May 2021 at 08:53, Avijit Basak <av...@gmail.com> wrote:
>
> Hi
>
>           I would like to vote for *commons-ml*.

Wrong thread, again.

Sorry for the nit-picking, but whenever a vote is requested, it is
often the basis of an official decision that must be traceable by
other parties, such as the ASF's INFRAstructure people.
In this case (the eventual creation of a repository), they might not
need to be involved, so I've voted on your behalf in the proper
thread (but, please, confirm by acknowledging, in that *other*
thread, that the vote is according to your preference).

Thanks,
Gilles


>> [...]


Re: The case for a Commons component

Posted by Avijit Basak <av...@gmail.com>.

Hi

          I would like to vote for *commons-ml*.

Thanks & Regards
--Avijit Basak

On Mon, 3 May 2021 at 04:29, Gilles Sadowski <gi...@gmail.com> wrote:

> Hi.
>
> > [... Discussion about GA data-structures...]
>
> I'd suggest that we finalize the [Vote] before getting into the
> details...
>
> Currently, there have been votes by:
>   Emmanuel Bourg (-1)
>   Sebastian Bazley (-0)
>   Ralph Goers (+0)
>   Paul King (+1)
>
> So currently, the discussion should be focused on settling the
> issues put forward by the opponents of having this new component:
>   * Problem 1: Functionality should go somewhere else (Emmanuel, Sebb)
>   * Problem 2: Who will contribute? (Ralph)
>
> Partial answers have been given.
> We need more opinions (and votes).
>
> Regards,
> Gilles
>
>
>

-- 
Avijit Basak

Re: The case for a Commons component

Posted by Gilles Sadowski <gi...@gmail.com>.

> > [...]
> >>
> >> So a procedural vote requires a majority.
> >
> > There is a small majority (irrespective of the binding vs non-binding
> > categories).
>
> In votes ONLY PMC member votes are counted. Other votes are
> advisory. PMC members should take those votes into account
> when voting.

That's the point indeed: the "advisory" information was not taken
into account.

Last time the PMC turned down a contribution, the conversation
had made it clear that the donating people did not intend to
support it.
Here we have the "opposite" case: Code that is rotting here could
be taken back to life.  Yet it seems that sparing some bits on the
ASF servers is more important than having people feel welcome
to contribute here.

> If you don’t understand that concept you shouldn’t
> be on a PMC.

Sure. There is a "concept" for that nowadays: Cancel culture...

> Trying to justify creating a new Commons component by endlessly
> discussing the topic just isn’t going to work.
>
> I’ll not be responding to more emails on this thread

... exactly (see above).

> as I consider the
> matter closed.


Gilles

>
> Ralph


Re: The case for a Commons component

Posted by Ralph Goers <ra...@dslextreme.com>.


> On May 6, 2021, at 3:04 AM, Gilles Sadowski <gi...@gmail.com> wrote:
>> 
>> It looks like you didn’t read the page.
> 
> I did, of course. And my interpretation differs.
> 
>> For clarity I am copying it here
>> 
>> "Votes on procedural issues follow the common format of majority rule unless
>> otherwise stated. That is, if there are more favourable votes than unfavourable ones,
>> the issue is considered to have passed -- regardless of the number of votes in each
>> category. (If the number of votes seems too small to be representative of a community
>> consensus, the issue is typically not pursued. However, see the description of
>> lazy consensus <https://www.apache.org/foundation/voting.html#LazyConsensus> for a modifying factor.)"
>> 
>> 
>> So a procedural vote requires a majority.
> 
> There is a small majority (irrespective of the binding vs non-binding
> categories).

In votes ONLY PMC member votes are counted. Other votes are 
advisory. PMC members should take those votes into account 
when voting. If you don’t understand that concept you shouldn’t 
be on a PMC.

Trying to justify creating a new Commons component by endlessly
discussing the topic just isn’t going to work.

I’ll not be responding to more emails on this thread as I consider the 
matter closed.

Ralph

Re: The case for a Commons component

Posted by Gilles Sadowski <gi...@gmail.com>.

On Thu, 6 May 2021 at 07:53, Ralph Goers <ra...@dslextreme.com> wrote:
>
>
> > On May 5, 2021, at 11:13 AM, Gilles Sadowski <gi...@gmail.com> wrote:
> >
> > On Wed, 5 May 2021 at 17:44, Ralph Goers <ra...@dslextreme.com> wrote:
> >>
> >>
> >>
> >>> On May 5, 2021, at 6:38 AM, Gilles Sadowski <gi...@gmail.com> wrote:
> >>>
> >>> On Tue, 4 May 2021 at 02:49, Ralph Goers <ra...@dslextreme.com> wrote:
> >>>>
> >>>> I apologize. I started another thread regarding the vote before seeing this.
> >>>
> >>> No problem.
> >>>
> >>>> Maybe that will get more attention?
> >>>
> >>> It doesn't seem so. :-}
> >>>
> >>> IMHO, valid answers have been given to the statements/questions
> >>> from people who didn't vote +1.
> >>> The very low turnout makes the arithmetics of the result fairly subjective...
> >>>
> >>> The optimistic view is that
> >>> 1. most people don't care (that the repository is created),
> >>> 2. there is no reason to doubt the infos provided by actual users of
> >>> those codes,
> >>> 3. there is an embryo of a community (perhaps not viable, but only
> >>> the future can tell...),[1]
> >>> 4. the same kind of welcoming gestures should apply for the proposed
> >>> contributions, as for the attempt to resuscitate "Commons Graph"[2],
> >>> even if some of the PMC might arguably prefer another option.
> >>
> >> Regardless, following https://www.apache.org/foundation/voting.html indicates that this vote is not going to pass.
> >
> > How so?
> > [It's not about a code change; and no "technical argument" can be invoked.]
>
> It looks like you didn’t read the page.

I did, of course. And my interpretation differs.

> For clarity I am copying it here
>
> "Votes on procedural issues follow the common format of majority rule unless
> otherwise stated. That is, if there are more favourable votes than unfavourable ones,
> the issue is considered to have passed -- regardless of the number of votes in each
> category. (If the number of votes seems too small to be representative of a community
> consensus, the issue is typically not pursued. However, see the description of
> lazy consensus <https://www.apache.org/foundation/voting.html#LazyConsensus> for a modifying factor.)"
>
>
> So a procedural vote requires a majority.

There is a small majority (irrespective of the binding vs non-binding
categories).

> But note that it also calls out that if the number of voters
> seems too small then the issue is usually not pursued.

"usually"...
In Commons, the number of votes has always been low, in
proportion of the official number of committers.
No surprise that, for very specific functionalities, it is even
lower.
However the main point should rather have been whether
the perspective exists that someone will do the work for
getting a chance for a community to ever exist.
In the case of ML algorithms, a discussion started that has
involved 4 people (among them 2 PMC people); this is largely
more than the "usual" attendance about any one specific
component's issue.

>  Both of these describe this situation perfectly.
> The vote did not get a majority of binding votes (it was a tie) and the number of votes was very small.
>
>
> >
> >> You can’t assert lazy consensus on an explicit vote.  If you had started this as a lazy consensus vote it
> >> is likely it would have still gotten a -1 vote since both Sebb and Emmanuel have voiced opposition.
> >
>
>
> > A "veto" does not apply here.
> > Hence my remark on the "arithmetics" since the total tally is slightly
> > "pro" although the PMC tally is slightly "con”.
>
> Where did I use the word “veto”? I never used the word “veto”.

I was trying to figure out how you reached your conclusion from the
page which you referred to (i.e. how a "-1" vote would be sufficient).

> There are essentially 3 ways to vote,
> Yes, No, and Abstain. In a procedural vote +0 or -0 represent an abstention. Anything less than 0 is
> a No and anything greater is a Yes. So saying there were -1 votes implies there are “No” votes and
> therefore there is no consensus.

Oliver reminded us that "[...] every committer can start a new
component [in the sandbox]".
Your interpretation of the procedural vote seems to mean that
anyone else can prevent such an initiative.

Gilles


Re: The case for a Commons component

Posted by Ralph Goers <ra...@dslextreme.com>.

> On May 5, 2021, at 11:13 AM, Gilles Sadowski <gi...@gmail.com> wrote:
> 
> On Wed, 5 May 2021 at 17:44, Ralph Goers <ra...@dslextreme.com> wrote:
>> 
>> 
>> 
>>> On May 5, 2021, at 6:38 AM, Gilles Sadowski <gi...@gmail.com> wrote:
>>> 
>>> On Tue, 4 May 2021 at 02:49, Ralph Goers <ra...@dslextreme.com> wrote:
>>>> 
>>>> I apologize. I started another thread regarding the vote before seeing this.
>>> 
>>> No problem.
>>> 
>>>> Maybe that will get more attention?
>>> 
>>> It doesn't seem so. :-}
>>> 
>>> IMHO, valid answers have been given to the statements/questions
>>> from people who didn't vote +1.
>>> The very low turnout makes the arithmetics of the result fairly subjective...
>>> 
>>> The optimistic view is that
>>> 1. most people don't care (that the repository is created),
>>> 2. there is no reason to doubt the infos provided by actual users of
>>> those codes,
>>> 3. there is an embryo of a community (perhaps not viable, but only
>>> the future can tell...),[1]
>>> 4. the same kind of welcoming gestures should apply for the proposed
>>> contributions, as for the attempt to resuscitate "Commons Graph"[2],
>>> even if some of the PMC might arguably prefer another option.
>> 
>> Regardless, following https://www.apache.org/foundation/voting.html indicates that this vote is not going to pass.
> 
> How so?
> [It's not about a code change; and no "technical argument" can be invoked.]

It looks like you didn’t read the page. For clarity I am copying it here

"Votes on procedural issues follow the common format of majority rule unless
otherwise stated. That is, if there are more favourable votes than unfavourable ones,
the issue is considered to have passed -- regardless of the number of votes in each
category. (If the number of votes seems too small to be representative of a community
consensus, the issue is typically not pursued. However, see the description of
lazy consensus <https://www.apache.org/foundation/voting.html#LazyConsensus> for a modifying factor.)"

So a procedural vote requires a majority. But note that it also calls out that if the number of voters 
seems too small then the issue is usually not pursued.  Both of these describe this situation perfectly. 
The vote did not get a majority of binding votes (it was a tie) and the number of votes was very small.

> 
>> You can’t assert lazy consensus on an explicit vote.  If you had started this as a lazy consensus vote it
>> is likely it would have still gotten a -1 vote since both Sebb and Emmanuel have voiced opposition.
> 

> A "veto" does not apply here.
> Hence my remark on the "arithmetics" since the total tally is slightly
> "pro" although the PMC tally is slightly "con”.

Where did I use the word “veto”? I never used the word “veto”.  There are essentially 3 ways to vote,
Yes, No, and Abstain. In a procedural vote +0 or -0 represent an abstention. Anything less than 0 is
a No and anything greater is a Yes. So saying there were -1 votes implies there are “No” votes and
therefore there is no consensus.

Ralph

Re: The case for a Commons component

Posted by Gilles Sadowski <gi...@gmail.com>.

On Wed, 5 May 2021 at 17:44, Ralph Goers <ra...@dslextreme.com> wrote:
>
>
>
> > On May 5, 2021, at 6:38 AM, Gilles Sadowski <gi...@gmail.com> wrote:
> >
> > On Tue, 4 May 2021 at 02:49, Ralph Goers <ra...@dslextreme.com> wrote:
> >>
> >> I apologize. I started another thread regarding the vote before seeing this.
> >
> > No problem.
> >
> >> Maybe that will get more attention?
> >
> > It doesn't seem so. :-}
> >
> > IMHO, valid answers have been given to the statements/questions
> > from people who didn't vote +1.
> > The very low turnout makes the arithmetics of the result fairly subjective...
> >
> > The optimistic view is that
> >  1. most people don't care (that the repository is created),
> >  2. there is no reason to doubt the infos provided by actual users of
> > those codes,
> >  3. there is an embryo of a community (perhaps not viable, but only
> > the future can tell...),[1]
> >  4. the same kind of welcoming gestures should apply for the proposed
> > contributions, as for the attempt to resuscitate "Commons Graph"[2],
> > even if some of the PMC might arguably prefer another option.
>
> Regardless, following https://www.apache.org/foundation/voting.html indicates that this vote is not going to pass.

How so?
[It's not about a code change; and no "technical argument" can be invoked.]

> You can’t assert lazy consensus on an explicit vote.  If you had started this as a lazy consensus vote it
> is likely it would have still gotten a -1 vote since both Sebb and Emmanuel have voiced opposition.

A "veto" does not apply here.
Hence my remark on the "arithmetics" since the total tally is slightly
"pro" although the PMC tally is slightly "con".

Gilles

>
> Ralph
>

Re: The case for a Commons component

Posted by Ralph Goers <ra...@dslextreme.com>.


> On May 5, 2021, at 6:38 AM, Gilles Sadowski <gi...@gmail.com> wrote:
> 
> Le mar. 4 mai 2021 à 02:49, Ralph Goers <ra...@dslextreme.com> a écrit :
>> 
>> I apologize. I started another thread regarding the vote before seeing this.
> 
> No problem.
> 
>> Maybe that will get more attention?
> 
> It doesn't seem so. :-}
> 
> IMHO, valid answers have been given to the statements/questions
> from people who didn't vote +1.
> The very low turnout makes the arithmetics of the result fairly subjective...
> 
> The optimistic view is that
>  1. most people don't care (that the repository is created),
>  2. there is no reason to doubt the infos provided by actual users of
> those codes,
>  3. there is an embryo of a community (perhaps not viable, but only
> the future can tell...),[1]
>  4. the same kind of welcoming gestures should apply for the proposed
> contributions, as for the attempt to resuscitate "Commons Graph"[2],
> even if some of the PMC might arguably prefer another option.

Regardless, following https://www.apache.org/foundation/voting.html indicates that this vote is not going to pass.
You can’t assert lazy consensus on an explicit vote.  If you had started this as a lazy consensus vote it
is likely it would have still gotten a -1 vote since both Sebb and Emmanuel have voiced opposition.

Ralph

Re: The case for a Commons component

Posted by Gilles Sadowski <gi...@gmail.com>.

Le mer. 5 mai 2021 à 18:57, Gary Gregory <ga...@gmail.com> a écrit :
>
> IMO the lack of +1s shows the lack of appetite to manage another component

That's certainly true.
And nobody is forced to do anything.

When the other CM spin-offs started, there was only _one_ person
willing to do the work.

Gilles

> [...]

Re: The case for a Commons component

Posted by Gary Gregory <ga...@gmail.com>.

IMO the lack of +1s shows the lack of appetite to manage another component
that is not "common" to "most" Java apps, where I use quotes to acknowledge
that YMMV.

Personally, my plate is full with the current slate of components in which
I participate.

Gary

On Wed, May 5, 2021, 09:38 Gilles Sadowski <gi...@gmail.com> wrote:

> Le mar. 4 mai 2021 à 02:49, Ralph Goers <ra...@dslextreme.com> a
> écrit :
> >
> > I apologize. I started another thread regarding the vote before seeing
> this.
>
> No problem.
>
> > Maybe that will get more attention?
>
> It doesn't seem so. :-}
>
> IMHO, valid answers have been given to the statements/questions
> from people who didn't vote +1.
> The very low turnout makes the arithmetics of the result fairly
> subjective...
>
> The optimistic view is that
>   1. most people don't care (that the repository is created),
>   2. there is no reason to doubt the infos provided by actual users of
> those codes,
>   3. there is an embryo of a community (perhaps not viable, but only
> the future can tell...),[1]
>   4. the same kind of welcoming gestures should apply for the proposed
> contributions, as for the attempt to resuscitate "Commons Graph"[2],
> even if some of the PMC might arguably prefer another option.
>
> Regards,
> Gilles
>
> [1] Three Java implementations of the SOFM turned up as the top results
> of a web search; none seem to include multi-threading.
> [2] https://gitbox.apache.org/repos/asf?p=commons-graph.git
>
>
> >
> > Ralph
> >
> > > On May 2, 2021, at 3:59 PM, Gilles Sadowski <gi...@gmail.com>
> wrote:
> > >
> > > Hi.
> > >
> > >> [... Discussion about GA data-structures...]
> > >
> > > I'd suggest that we finalize the [Vote] before getting into the
> > > details...
> > >
> > > Currently, there have been votes by:
> > >  Emmanuel Bourg (-1)
> > >  Sebastian Bazley (-0)
> > >  Ralph Goers (+0)
> > >  Paul King (+1)
> > >
> > > So currently, the discussion should be focused on settling the
> > > issues put forward by the opponents to having this new component:
> > >  * Problem 1: Functionality should go somewhere else (Emmanuel, Sebb)
> > >  * Problem 2: Who will contribute? (Ralph)
> > >
> > > Partial answers have been given.
> > > We need more opinions (and votes).
> > >
> > > Regards,
> > > Gilles
>

Re: The case for a Commons component

Posted by Gilles Sadowski <gi...@gmail.com>.

Le mar. 4 mai 2021 à 02:49, Ralph Goers <ra...@dslextreme.com> a écrit :
>
> I apologize. I started another thread regarding the vote before seeing this.

No problem.

> Maybe that will get more attention?

It doesn't seem so. :-}

IMHO, valid answers have been given to the statements/questions
from people who didn't vote +1.
The very low turnout makes the arithmetics of the result fairly subjective...

The optimistic view is that
  1. most people don't care (that the repository is created),
  2. there is no reason to doubt the information provided by actual users of
that code,
  3. there is an embryo of a community (perhaps not viable, but only
the future can tell...),[1]
  4. the same kind of welcoming gestures should apply for the proposed
contributions, as for the attempt to resuscitate "Commons Graph"[2],
even if some of the PMC might arguably prefer another option.

Regards,
Gilles

[1] Three Java implementations of the SOFM turned up as the top results
of a web search; none seem to include multi-threading.
[2] https://gitbox.apache.org/repos/asf?p=commons-graph.git

>
> Ralph
>
> > On May 2, 2021, at 3:59 PM, Gilles Sadowski <gi...@gmail.com> wrote:
> >
> > Hi.
> >
> >> [... Discussion about GA data-structures...]
> >
> > I'd suggest that we finalize the [Vote] before getting into the
> > details...
> >
> > Currently, there have been votes by:
> >  Emmanuel Bourg (-1)
> >  Sebastian Bazley (-0)
> >  Ralph Goers (+0)
> >  Paul King (+1)
> >
> > So currently, the discussion should be focused on settling the
> > issues put forward by the opponents to having this new component:
> >  * Problem 1: Functionality should go somewhere else (Emmanuel, Sebb)
> >  * Problem 2: Who will contribute? (Ralph)
> >
> > Partial answers have been given.
> > We need more opinions (and votes).
> >
> > Regards,
> > Gilles

Re: The case for a Commons component

Posted by Ralph Goers <ra...@dslextreme.com>.

I apologize. I started another thread regarding the vote before seeing this. Maybe that will get more attention?

Ralph

> On May 2, 2021, at 3:59 PM, Gilles Sadowski <gi...@gmail.com> wrote:
> 
> Hi.
> 
>> [... Discussion about GA data-structures...]
> 
> I'd suggest that we finalize the [Vote] before getting into the
> details...
> 
> Currently, there have been votes by:
>  Emmanuel Bourg (-1)
>  Sebastian Bazley (-0)
>  Ralph Goers (+0)
>  Paul King (+1)
> 
> So currently, the discussion should be focused on settling the
> issues put forward by the opponents to having this new component:
>  * Problem 1: Functionality should go somewhere else (Emmanuel, Sebb)
>  * Problem 2: Who will contribute? (Ralph)
> 
> Partial answers have been given.
> We need more opinions (and votes).
> 
> Regards,
> Gilles
> 

Re: The case for a Commons component

Posted by Gilles Sadowski <gi...@gmail.com>.

Hi.

> [... Discussion about GA data-structures...]

I'd suggest that we finalize the [Vote] before getting into the
details...

Currently, there have been votes by:
  Emmanuel Bourg (-1)
  Sebastian Bazley (-0)
  Ralph Goers (+0)
  Paul King (+1)

So currently, the discussion should be focused on settling the
issues put forward by the opponents to having this new component:
  * Problem 1: Functionality should go somewhere else (Emmanuel, Sebb)
  * Problem 2: Who will contribute? (Ralph)

Partial answers have been given.
We need more opinions (and votes).

Regards,
Gilles

Re: The case for a Commons component

Posted by Avijit Basak <av...@gmail.com>.

Hi

>> Note: You cannot easily just use java.util.BitSet as you wish to have
>> access to the underlying long[] to store the chromosome to enable efficient
>> crossover.
--Thanks for pointing this out. However, I considered a few constraints
while doing the implementation.
     1) I extended the existing class AbstractListChromosome, which
requires a generic type. This is the reason for using a list of Long.
However, I can extend Chromosome instead and use an array of primitive long.
BitSet also uses a similar data structure.
     2) One problem with BitSet is its use of the MSB to retain bits. As a
result, we won't be able to use the static utility methods of the wrapper
class (Long) for conversion between the primitive type and strings. We would
have to write custom code for conversion between strings and integral types.
This is the only reason I have used a BLOCKSIZE of 63 instead of 64.
>> // This is not actually required...
>> // int bit = cross & 63; // i.e. cross % 64
--Do you mean that the bit index does not need to be calculated? How can we
handle crossover indexes which are not a multiple of 64?
>> Do you think that allele sets other than binary would be useful to
>> implement? [IIUC your document above, it seems not (?).]
--The document only describes the data structure related to the binary
genotype. We already have an implementation of the RandomKey genotype in
Commons. We can think of adding other genotypes gradually.
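
To make the question about crossover indexes concrete, here is a minimal
sketch of single-point crossover on chromosomes packed 64 bits per long. It
is an illustration under stated assumptions, not code from the proposed
contribution or from Commons Math: the class PackedCrossover, its method
names and the least-significant-bit-first layout are all hypothetical. The
string helpers rely on Long.toBinaryString and Long.parseUnsignedLong
(available since Java 8), which treat the sign bit as an ordinary bit, so a
64-bit block may be workable without custom parsing.

```java
import java.util.Arrays;

/** Hypothetical sketch: single-point crossover on bit strings packed into long[]. */
final class PackedCrossover {

    /**
     * Returns a child holding bits [0, cross) of {@code a} followed by
     * bits [cross, end) of {@code b}. Bit i is stored in word i / 64 at
     * position i % 64 (least-significant bit first; an assumed layout).
     */
    static long[] crossover(long[] a, long[] b, int cross) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("parents differ in length");
        }
        if (cross < 0 || cross > a.length * 64) {
            throw new IllegalArgumentException("crossover index out of range");
        }
        long[] child = Arrays.copyOf(b, b.length);
        int word = cross >>> 6; // cross / 64
        int bit = cross & 63;   // cross % 64
        // Whole words before the crossover point come from parent a.
        System.arraycopy(a, 0, child, 0, word);
        if (bit != 0) {
            // The crossover point falls inside a word: mix the two parents.
            long lowMask = (1L << bit) - 1; // selects the low 'bit' bits
            child[word] = (a[word] & lowMask) | (b[word] & ~lowMask);
        }
        return child;
    }

    /** Formats all 64 bits of a word, most-significant bit first. */
    static String toBits(long word) {
        String s = Long.toBinaryString(word); // unsigned, no leading zeros
        return String.format("%64s", s).replace(' ', '0');
    }

    /** Parses a string of up to 64 binary digits, including the sign bit. */
    static long fromBits(String bits) {
        return Long.parseUnsignedLong(bits, 2);
    }
}
```

In this layout the bit index is needed only when the crossover point is not a
multiple of 64; otherwise copying whole words is enough.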


Thanks & Regards
--Avijit Basak



On Sat, 1 May 2021 at 22:18, Gilles Sadowski <gi...@gmail.com> wrote:

> Le ven. 30 avr. 2021 à 17:40, Avijit Basak <av...@gmail.com> a
> écrit :
> >
> > Hi
> >
> >          >>lot of spurious references to "Commons Numbers"
> >              --I have only created the basic project structure. Changes
> > need to be made. Can anyone from the existing commons team help in doing
> > this.
>
> Well, you should "search and replace":
>   "Numbers" -> "Machine Learning"
>   commons-numbers -> commons-machinelearning
>
> Other things (repository URL, JIRA project name and URL) require that
> a component be created (vote is pending).
> [As long as those files are not part of a PR, it is not urgent to fix
> them.]
>
> >          >> For sure, populate it with the code extracted from CM's
> > "genetics"
> > package and proceed with the enhancements.
> > At first, I'd suggest to refactor the layout of the package (i.e. create
> > a "subpackage" for each component of a genetic algorithm).
> >               -- I am working on it.
>
> Great!
>
> > Did not commit the code till now.
>
> OK.  When you do, please ask for review on the "dev" ML.
>
> >           >>  Then some examination of the data-structures is required (a
> > binary chromosome is currently stored as a "List<Integer>").
> >               -- I have recently done some work on this. Could you please
> > check this article and share your thought.
> >                   "*https://arxiv.org/abs/2103.04751
> > <https://arxiv.org/abs/2103.04751>*"
>
> Alex already provided a thorough response.
> It's a pity that JDK's BitSet is missing a few methods (e.g. "append")
> for a readily usable implementation of a "binary chromosome".
>
> Do you think that allele sets other than binary would be useful to
> implement? [IIUC your document above, it seems not (?).]
>
> >           Are we thinking to use Spark for our parallelism
>
> No, if the code is to reside in Commons.
>
> > or a simple
> > multi-threading of Java.
>
> Yes, we'd depend only on JDK classes.
>
> > I would prefer to use java multi-threading and
> > avoid any other framework.
> >           In java we don't have any library which can be used for AI/ML
> > programming with a very minimal learning curve. Can we think of
> fulfilling
> > this need?
>
> That would be nice. Don't hesitate to enlist fellow programmers. :-)
>
> Regards,
> Gilles
>
> >           This will be helpful for many java developers to venture into
> > AI/ML without learning a new language like Python.
> >
> >
> >>> [...]
>

-- 
Avijit Basak

Re: The case for a Commons component

Posted by Gilles Sadowski <gi...@gmail.com>.

Le ven. 30 avr. 2021 à 17:40, Avijit Basak <av...@gmail.com> a écrit :
>
> Hi
>
>          >>lot of spurious references to "Commons Numbers"
>              --I have only created the basic project structure. Changes
> need to be made. Can anyone from the existing commons team help in doing
> this.

Well, you should "search and replace":
  "Numbers" -> "Machine Learning"
  commons-numbers -> commons-machinelearning

Other things (repository URL, JIRA project name and URL) require that
a component be created (vote is pending).
[As long as those files are not part of a PR, it is not urgent to fix them.]

>          >> For sure, populate it with the code extracted from CM's
> "genetics"
> package and proceed with the enhancements.
> At first, I'd suggest to refactor the layout of the package (i.e. create
> a "subpackage" for each component of a genetic algorithm).
>               -- I am working on it.

Great!

> Did not commit the code till now.

OK.  When you do, please ask for review on the "dev" ML.

>           >>  Then some examination of the data-structures is required (a
> binary chromosome is currently stored as a "List<Integer>").
>               -- I have recently done some work on this. Could you please
> check this article and share your thought.
>                   "*https://arxiv.org/abs/2103.04751
> <https://arxiv.org/abs/2103.04751>*"

Alex already provided a thorough response.
It's a pity that JDK's BitSet is missing a few methods (e.g. "append")
for a readily usable implementation of a "binary chromosome".

Do you think that allele sets other than binary would be useful to
implement? [IIUC your document above, it seems not (?).]

>           Are we thinking to use Spark for our parallelism

No, if the code is to reside in Commons.

> or a simple
> multi-threading of Java.

Yes, we'd depend only on JDK classes.
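
As a rough illustration of what "JDK classes only" could mean here, the
sketch below evaluates a fitness function over a population with a parallel
stream. It is a sketch under assumptions, not a design proposal: the class
ParallelFitness and its evaluate method are hypothetical names, and the
fitness function must be thread-safe for this to be valid.

```java
import java.util.Arrays;
import java.util.function.ToDoubleFunction;

/** Hypothetical sketch: parallel fitness evaluation using only the JDK. */
final class ParallelFitness {

    /**
     * Computes the fitness of every individual, possibly on several threads
     * of the common fork-join pool. The result keeps the population's order.
     */
    static <T> double[] evaluate(T[] population, ToDoubleFunction<T> fitness) {
        return Arrays.stream(population)
                     .parallel()
                     .mapToDouble(fitness)
                     .toArray();
    }
}
```

For example, with chromosomes stored as long[] and a "count the set bits"
fitness, the call would be
ParallelFitness.evaluate(population, c -> Long.bitCount(c[0])).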

> I would prefer to use java multi-threading and
> avoid any other framework.
>           In java we don't have any library which can be used for AI/ML
> programming with a very minimal learning curve. Can we think of fulfilling
> this need?

That would be nice. Don't hesitate to enlist fellow programmers. :-)

Regards,
Gilles

>           This will be helpful for many java developers to venture into
> AI/ML without learning a new language like Python.
>
>
>>> [...]

Re: The case for a Commons component

Posted by Avijit Basak <av...@gmail.com>.

Hi

>> lot of spurious references to "Commons Numbers"
--I have only created the basic project structure. Changes need to be made.
Can anyone from the existing Commons team help with this?

>> For sure, populate it with the code extracted from CM's "genetics"
>> package and proceed with the enhancements.
>> At first, I'd suggest to refactor the layout of the package (i.e. create
>> a "subpackage" for each component of a genetic algorithm).
--I am working on it. I have not committed the code yet.

>> Then some examination of the data-structures is required (a
>> binary chromosome is currently stored as a "List<Integer>").
--I have recently done some work on this. Could you please check this
article and share your thoughts?
    https://arxiv.org/abs/2103.04751

Are we thinking of using Spark for parallelism, or simple Java
multi-threading? I would prefer to use Java multi-threading and
avoid any other framework.
In Java we don't have any library which can be used for AI/ML
programming with a very minimal learning curve. Can we think of fulfilling
this need?
This would help many Java developers to venture into
AI/ML without learning a new language like Python.


Thanks & Regards
--Avijit Basak

On Wed, 28 Apr 2021 at 18:48, Gilles Sadowski <gi...@gmail.com> wrote:

> Le lun. 26 avr. 2021 à 16:18, Avijit Basak <av...@gmail.com> a
> écrit :
> >
> > Hi
> >
> >         As per previous discussions, I have created a temporary
> repository
> > in GitHub under my personal GitHub Id(avijitbasak). The artifacts have
> been
> > copied from commons-numbers. A preliminary structure has been created for
> > the proposed component.
> > Please let me know if we want to proceed with this format.
>
> There is no source code (and a lot of spurious references to
> "Commons Numbers").
> For sure, populate it with the code extracted from CM's "genetics"
> package and proceed with the enhancements.
> At first, I'd suggest to refactor the layout of the package (i.e. create
> a "subpackage" for each component of a genetic algorithm).
> Then some examination of the data-structures is required (a binary
> chromosome is currently stored as a "List<Integer>").
> Shouldn't the whole design be revised (based on interfaces and
> streams)?
>
> > We can copy the
> > same to any other team repository if required.
>
> That would be a repository on an ASF server, once the pending vote
> process is completed.  [By the way: You didn't vote...]
>
> Regards,
> Gilles
>
> >> [...]
>

-- 
Avijit Basak

Re: The case for a Commons component

Posted by Gilles Sadowski <gi...@gmail.com>.

Le lun. 26 avr. 2021 à 16:18, Avijit Basak <av...@gmail.com> a écrit :
>
> Hi
>
>         As per previous discussions, I have created a temporary repository
> in GitHub under my personal GitHub Id(avijitbasak). The artifacts have been
> copied from commons-numbers. A preliminary structure has been created for
> the proposed component.
> Please let me know if we want to proceed with this format.

There is no source code (and a lot of spurious references to
"Commons Numbers").
For sure, populate it with the code extracted from CM's "genetics"
package and proceed with the enhancements.
At first, I'd suggest to refactor the layout of the package (i.e. create
a "subpackage" for each component of a genetic algorithm).
Then some examination of the data-structures is required (a binary
chromosome is currently stored as a "List<Integer>").
Shouldn't the whole design be revised (based on interfaces and
streams)?

> We can copy the
> same to any other team repository if required.

That would be a repository on an ASF server, once the pending vote
process is completed.  [By the way: You didn't vote...]

Regards,
Gilles

>> [...]

Re: The case for a Commons component

Posted by Gilles Sadowski <gi...@gmail.com>.

Le lun. 26 avr. 2021 à 17:08, Ralph Goers <ra...@dslextreme.com> a écrit :
>
> How many committers will be active for this component?

No less than there were for [RNG], [Numbers] and [Geometry]. ;-)

Those new components have attracted high-quality contributions;
two of the people who provided them have become committers.

Gilles

> > [...]

Re: The case for a Commons component

Posted by Ralph Goers <ra...@dslextreme.com>.

How many committers will be active for this component?

Ralph

> On Apr 26, 2021, at 7:17 AM, Avijit Basak <av...@gmail.com> wrote:
> 
> Hi
> 
>        As per previous discussions, I have created a temporary repository
> in GitHub under my personal GitHub Id(avijitbasak). The artifacts have been
> copied from commons-numbers. A preliminary structure has been created for
> the proposed component.
> Please let me know if we want to proceed with this format. We can copy the
> same to any other team repository if required.
> 
>        Repo URL: https://github.com/avijitbasak/commons-machinelearning
> 
> Thanks & Regards
> --Avijit Basak
> 
> On Mon, 26 Apr 2021 at 04:49, Paul King <pa...@gmail.com> wrote:
> 
>> On Mon, Apr 26, 2021 at 12:27 AM sebb <se...@gmail.com> wrote:
>>> 
>>> I assume this thread is about the possible ML component.
>>> 
>>> If the code was developed by Commons, I assume it could be used as
>>> part of Spark.
>>> However Commons does not currently have many developers who are
>>> familiar with the field.
>>> So it would seem to me better to have development done by a project
>>> which does have relevant experience.
>>> 
>>> You say that Spark etc have lots of jars.
>>> Surely that allows for it to be implemented as a separate jar which
>>> can either be used as part of the Spark platform, or used
>>> independently?
>> 
>> The stats I gave were for the current minimal use of those algorithms.
>> Most algorithms are written in Scala, use RDD "dataframes" rather than
>> say double arrays, and assume you're running on "the platform" which
>> handles how you might get your data and return results and do logging
>> etc. in a potentially concurrent world. Some of those design choices
>> are key to scaling up but don't align with the goal of making the
>> algorithms runnable "independently".
>> 
>>> The only other option I see is for Commons to persuade some developers
>>> who are familiar with the field to join Commons to assist with the
>>> algorithms.
>> 
>> I agree that is the crux of the issue here. The "commons doesn't have
>> the bandwidth to absorb another algorithm" part of the discussion
>> seems perfectly legit to me. The "and there is an obvious home
>> elsewhere" part of the discussion seemed a little more dubious to me,
>> though obviously that is something which should be considered.
>> 
>>> Existing Commons developers can help manage the logistics of packaging
>>> and releasing the code, as this does not require in depth knowledge of
>>> the design.
>>> However this only makes sense if the developers skilled in the area are
>>> prepared to assist long-term.
>>> 
>>> 
>>> On Sat, 24 Apr 2021 at 23:32, Paul King <pa...@gmail.com>
>> wrote:
>>>> 
>>>> Thanks Gilles,
>>>> 
>>>> I can provide the same sort of stats across a clustering example
>>>> across commons-math (KMeans) vs Apache Ignite, Apache Spark and
>>>> Rheem/Apache Wayang (incubating) if anyone would find that useful. It
>>>> would no doubt lead to similar conclusions.
>>>> 
>>>> Cheers, Paul.
>>>> 
>>>> On Sun, Apr 25, 2021 at 8:15 AM Gilles Sadowski <gi...@gmail.com>
>> wrote:
>>>>> 
>>>>> Hello Paul.
>>>>> 
>>>>> Le sam. 24 avr. 2021 à 04:42, Paul King <pa...@gmail.com>
>> a écrit :
>>>>>> 
>>>>>> I added some more comments relevant to if the proposed algorithm
>>>>>> belongs somewhere in the commons "math" area back in the Jira:
>>>>>> 
>>>>>> https://issues.apache.org/jira/browse/MATH-1563
>>>>> 
>>>>> Thanks for a "real" user's testimony.
>>>>> 
>>>>> As the ML is still the official forum for such a discussion, I'm
>> quoting
>>>>> part of your post on JIRA:
>>>>> ---CUT---
>>>>> For linear regression, taking just one example dataset, commons-math
>>>>> is a couple of library calls for a single 2M library and solves the
>>>>> problem in 240ms. Both Ignite and Spark involve "firing up the
>>>>> platform" and the code is more complex for simple scenarios. Spark
>> has
>>>>> a 181M footprint across 210 jars and solves the problem in about 20s.
>>>>> Ignite has an 87M footprint across 85 jars and solves the problem in >
>>>>> 40s. But I can also find more complex scenarios which need to scale
>>>>> where Ignite and Spark really come into their own.
>>>>> ---CUT---
>>>>> 
>>>>> A similar rationale was behind my developing/using the SOFM
>>>>> functionality in the "o.a.c.m.ml.neuralnet" package: I needed a
>>>>> proof of concept, and taking the "lightweight" path seemed more
>>>>> effective than experimenting with those platforms.
>>>>> Admittedly, at that time, there were people around, who were
>>>>> maintaining the clustering and GA codes; hence, the prototyping
>>>>> of a machine-learning library didn't look strange to anyone.
>>>>> 
>>>>> Regards,
>>>>> Gilles
>>>>> 
>>>>>>>> [...]
>>>>> 
> 
> -- 
> Avijit Basak



Re: The case for a Commons component

Posted by Avijit Basak <av...@gmail.com>.

Hi

        As per previous discussions, I have created a temporary repository
in GitHub under my personal GitHub ID (avijitbasak). The artifacts have been
copied from commons-numbers. A preliminary structure has been created for
the proposed component.
Please let me know if we want to proceed with this format. We can copy it
to any other team repository if required.

        Repo URL: https://github.com/avijitbasak/commons-machinelearning

Thanks & Regards
--Avijit Basak

On Mon, 26 Apr 2021 at 04:49, Paul King <pa...@gmail.com> wrote:

> On Mon, Apr 26, 2021 at 12:27 AM sebb <se...@gmail.com> wrote:
> >
> > I assume this thread is about the possible ML component.
> >
> > If the code was developed by Commons, I assume it could be used as
> > part of Spark.
> > However Commons does not currently have many developers who are
> > familiar with the field.
> > So it would seem to me better to have development done by a project
> > which does have relevant experience.
> >
> > You say that Spark etc have lots of jars.
> > Surely that allows for it to be implemented as a separate jar which
> > can either be used as part of the Spark platform, or used
> > independently?
>
> The stats I gave were for the current minimal use of those algorithms.
> Most algorithms are written in Scala, use RDD "dataframes" rather than
> say double arrays, and assume you're running on "the platform" which
> handles how you might get your data and return results and do logging
> etc. in a potentially concurrent world. Some of those design choices
> are key to scaling up but don't align with the goal of making the
> algorithms runnable "independently".
>
> > The only other option I see is for Commons to persuade some developers
> > who are familiar with the field to join Commons to assist with the
> > algorithms.
>
> I agree that is the crux of the issue here. The "commons doesn't have
> the bandwidth to absorb another algorithm" part of the discussion
> seems perfectly legit to me. The "and there is an obvious home
> elsewhere" part of the discussion seemed a little more dubious to me,
> though obviously that is something which should be considered.
>
> > Existing Commons developers can help manage the logistics of packaging
> > and releasing the code, as this does not require in depth knowledge of
> > the design.
> > However this only makes sense if the developers skilled in the area are
> > prepared to assist long-term.
> >
> >
> > On Sat, 24 Apr 2021 at 23:32, Paul King <pa...@gmail.com>
> wrote:
> > >
> > > Thanks Gilles,
> > >
> > > I can provide the same sort of stats across a clustering example
> > > across commons-math (KMeans) vs Apache Ignite, Apache Spark and
> > > Rheem/Apache Wayang (incubating) if anyone would find that useful. It
> > > would no doubt lead to similar conclusions.
> > >
> > > Cheers, Paul.
> > >
> > > On Sun, Apr 25, 2021 at 8:15 AM Gilles Sadowski <gi...@gmail.com>
> wrote:
> > > >
> > > > Hello Paul.
> > > >
> > > > Le sam. 24 avr. 2021 à 04:42, Paul King <pa...@gmail.com>
> a écrit :
> > > > >
> > > > > I added some more comments relevant to if the proposed algorithm
> > > > > belongs somewhere in the commons "math" area back in the Jira:
> > > > >
> > > > > https://issues.apache.org/jira/browse/MATH-1563
> > > >
> > > > Thanks for a "real" user's testimony.
> > > >
> > > > As the ML is still the official forum for such a discussion, I'm
> quoting
> > > > part of your post on JIRA:
> > > > ---CUT---
> > > > For linear regression, taking just one example dataset, commons-math
> > > > is a couple of library calls for a single 2M library and solves the
> > > > problem in 240ms. Both Ignite and Spark involve "firing up the
> > > > platform" and the code is more complex for simple scenarios. Spark has
> > > > a 181M footprint across 210 jars and solves the problem in about 20s.
> > > > Ignite has a 87M footprint across 85 jars and solves the problem in >
> > > > 40s. But I can also find more complex scenarios which need to scale
> > > > where Ignite and Spark really come into their own.
> > > > ---CUT---
> > > >
> > > > A similar rationale was behind my developing/using the SOFM
> > > > functionality in the "o.a.c.m.ml.neuralnet" package: I needed a
> > > > proof of concept, and taking the "lightweight" path seemed more
> > > > effective than experimenting with those platforms.
> > > > Admittedly, at that time, there were people around who were
> > > > maintaining the clustering and GA codes; hence, the prototyping
> > > > of a machine-learning library didn't look strange to anyone.
> > > >
> > > > Regards,
> > > > Gilles
> > > >
> > > > >>> [...]

-- 
Avijit Basak

Re: The case for a Commons component

Posted by Paul King <pa...@gmail.com>.

On Mon, Apr 26, 2021 at 12:27 AM sebb <se...@gmail.com> wrote:
>
> I assume this thread is about the possible ML component.
>
> If the code was developed by Commons, I assume it could be used as
> part of Spark.
> However Commons does not currently have many developers who are
> familiar with the field.
> So it would seem to me better to have development done by a project
> which does have relevant experience.
>
> You say that Spark etc have lots of jars.
> Surely that allows for it to be implemented as a separate jar which
> can either be used as part of the Spark platform, or used
> independently?

The stats I gave were for the current minimal use of those algorithms.
Most algorithms are written in Scala, use RDD "dataframes" rather than
say double arrays, and assume you're running on "the platform" which
handles how you might get your data and return results and do logging
etc. in a potentially concurrent world. Some of those design choices
are key to scaling up but don't align with the goal of making the
algorithms runnable "independently".
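
To illustrate what working on plain double arrays means in practice, here
is a minimal sketch in plain Java of an ordinary least-squares line fit.
It is an illustration under stated assumptions only: the class
ArrayLineFit and its fit method are hypothetical names, and nothing here
is the Commons Math, Spark or Ignite API.

```java
// Hypothetical illustration only: not the Commons Math, Spark or Ignite API.
final class ArrayLineFit {
    private ArrayLineFit() {}

    /**
     * Ordinary least-squares fit of y = slope * x + intercept.
     *
     * @return a two-element array {slope, intercept}
     */
    static double[] fit(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two arrays of equal length >= 2");
        }
        final int n = x.length;

        // First pass: means of both coordinates.
        double meanX = 0;
        double meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        // Second pass: centred sums of squares and cross-products.
        double sxx = 0;
        double sxy = 0;
        for (int i = 0; i < n; i++) {
            final double dx = x[i] - meanX;
            sxx += dx * dx;
            sxy += dx * (y[i] - meanY);
        }
        if (sxx == 0) {
            throw new IllegalArgumentException("all x values are identical");
        }

        final double slope = sxy / sxx;
        final double intercept = meanY - slope * meanX;
        return new double[] {slope, intercept};
    }

    public static void main(String[] args) {
        final double[] p = fit(new double[] {0, 1, 2, 3}, new double[] {1, 3, 5, 7});
        System.out.println("slope=" + p[0] + ", intercept=" + p[1]);
    }
}
```

A caller passes two arrays and gets two numbers back; there is no session,
cluster or data-loading layer involved, which is the trade-off being
discussed here.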

> The only other option I see is for Commons to persuade some developers
> who are familiar with the field to join Commons to assist with the
> algorithms.

I agree that is the crux of the issue here. The "commons doesn't have
the bandwidth to absorb another algorithm" part of the discussion
seems perfectly legit to me. The "and there is an obvious home
elsewhere" part of the discussion seemed a little more dubious to me,
though obviously that is something which should be considered.

> Existing Commons developers can help manage the logistics of packaging
> and releasing the code, as this does not require in depth knowledge of
> the design.
> However this only makes sense if the developers skilled in the area are
> prepared to assist long-term.
>
>
> On Sat, 24 Apr 2021 at 23:32, Paul King <pa...@gmail.com> wrote:
> >
> > Thanks Gilles,
> >
> > I can provide the same sort of stats across a clustering example
> > across commons-math (KMeans) vs Apache Ignite, Apache Spark and
> > Rheem/Apache Wayang (incubating) if anyone would find that useful. It
> > would no doubt lead to similar conclusions.
> >
> > Cheers, Paul.
> >
> > On Sun, Apr 25, 2021 at 8:15 AM Gilles Sadowski <gi...@gmail.com> wrote:
> > >
> > > Hello Paul.
> > >
> > > On Sat, 24 Apr 2021 at 04:42, Paul King <pa...@gmail.com> wrote:
> > > >
> > > > I added some more comments relevant to if the proposed algorithm
> > > > belongs somewhere in the commons "math" area back in the Jira:
> > > >
> > > > https://issues.apache.org/jira/browse/MATH-1563
> > >
> > > Thanks for a "real" user's testimony.
> > >
> > > As the ML is still the official forum for such a discussion, I'm quoting
> > > part of your post on JIRA:
> > > ---CUT---
> > > For linear regression, taking just one example dataset, commons-math
> > > is a couple of library calls for a single 2M library and solves the
> > > problem in 240ms. Both Ignite and Spark involve "firing up the
> > > platform" and the code is more complex for simple scenarios. Spark has
> > > a 181M footprint across 210 jars and solves the problem in about 20s.
> > > Ignite has a 87M footprint across 85 jars and solves the problem in >
> > > 40s. But I can also find more complex scenarios which need to scale
> > > where Ignite and Spark really come into their own.
> > > ---CUT---
> > >
> > > A similar rationale was behind my developing/using the SOFM
> > > functionality in the "o.a.c.m.ml.neuralnet" package: I needed a
> > > proof of concept, and taking the "lightweight" path seemed more
> > > effective than experimenting with those platforms.
> > > Admittedly, at that time, there were people around who were
> > > maintaining the clustering and GA codes; hence, the prototyping
> > > of a machine-learning library didn't look strange to anyone.
> > >
> > > Regards,
> > > Gilles
> > >
> > > >>> [...]

Re: The case for a Commons component

Posted by Gilles Sadowski <gi...@gmail.com>.

On Sun, 25 Apr 2021 at 16:27, sebb <se...@gmail.com> wrote:
>
> I assume this thread is about the possible ML component.

I hesitated over using the subject "The case for *any* Commons component".

> If the code was developed by Commons, I assume it could be used as
> part of Spark.
> However Commons does not currently have many developers who are
> familiar with the field.
> So it would seem to me better to have development done by a project
> which does have relevant experience.

I expressed the same concern/opinion; in fact, if I were tempted
to implement something of the kind now, I would probably indeed
start experimenting with Spark. [CM's implementation of SOFM
dates from early 2014.]

On the other hand, several people (at different times) expressed
an interest in having such code free of the "high-level" features
that come with the "platforms".
My own current usage of the "neuralnet" package does not
warrant a move to Spark.
I'm also interested in refactoring the "clustering" package (but will
not pursue it alone).

> You say that Spark etc have lots of jars.
> Surely that allows for it to be implemented as a separate jar which
> can either be used as part of the Spark platform, or used
> independently?

https://spark.apache.org/docs/latest/spark-standalone.html

I have only skimmed it, but there are many references to a "cluster", so
that seems to be the common use-case, while code here could, for example,
focus on multi-thread-ready components, primarily targeting applications
that run on a single multi-core machine.
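
As a rough sketch of what "multi-thread-ready" could mean on a single
multi-core machine, the assignment step of a k-means style clustering can
be spread over the available cores with a parallel stream from the JDK.
The class and method names below (NearestCentroid, assign, nearest) are
hypothetical and are not taken from the "clustering" package.

```java
// Hypothetical illustration only: not code from Commons Math.
final class NearestCentroid {
    private NearestCentroid() {}

    /**
     * Assigns each point to the index of its nearest centroid.
     * Every task writes to a distinct element of the result array,
     * so no locking is needed.
     */
    static int[] assign(double[][] points, double[][] centroids) {
        final int[] labels = new int[points.length];
        java.util.stream.IntStream.range(0, points.length)
            .parallel()
            .forEach(i -> labels[i] = nearest(points[i], centroids));
        return labels;
    }

    /** Index of the centroid with the smallest squared Euclidean distance. */
    static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centroids.length; c++) {
            double distance = 0;
            for (int k = 0; k < point.length; k++) {
                final double diff = point[k] - centroids[c][k];
                distance += diff * diff;
            }
            if (distance < bestDistance) {
                bestDistance = distance;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        final double[][] points = {{0, 0}, {10, 10}, {1, 1}, {9, 8}};
        final double[][] centroids = {{0, 0}, {10, 10}};
        System.out.println(java.util.Arrays.toString(assign(points, centroids)));
    }
}
```

The input and output are plain arrays, and the only concurrency machinery
is what ships with the JDK; whether a real component would use streams, an
ExecutorService or something else is left open.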

> The only other option I see is for Commons to persuade some developers
> who are familiar with the field to join Commons to assist with the
> algorithms.
> Existing Commons developers can help manage the logistics of packaging
> and releasing the code, as this does not require in depth knowledge of
> the design.
> However this only makes sense if the developers skilled in the area are
> prepared to assist long-term.

I try to make that crystal-clear to every new contributor (cf. proposal to
revive "Commons Graph", the exchange on refactoring the "clustering"
package, the necessary features for a GA implementation that purports
to be more than a toy example, ...).

However, it is obviously impossible to enforce something like "prepared
to assist long-term"; it is rightfully a necessary condition for being
granted commit access, but it's up to the project to create a "place"
where people want to stay (and know what to expect).
For people interested in "ML" (not necessarily experts: They could be
developers willing to implement standard algorithms, as we did in CM),
it means that there should be global guidelines (like there were for CM)
such as e.g. "multi-thread-ready" (in addition to the usual "full doc",
"full coverage", etc.), and a repository for those codes.

We don't have much grasp on the arrival rate of contributors but I
contend that a component with a specific scope is much more
appealing (especially to newcomers) than a mixed bag à la CM
which nobody here is able (or willing) to maintain (and the reason
why I'll only merge bug-fixes).

Not creating the "place" will of course pave the way to a self-fulfilling
prophecy.

Gilles

> On Sat, 24 Apr 2021 at 23:32, Paul King <pa...@gmail.com> wrote:
> >
> > Thanks Gilles,
> >
> > I can provide the same sort of stats across a clustering example
> > across commons-math (KMeans) vs Apache Ignite, Apache Spark and
> > Rheem/Apache Wayang (incubating) if anyone would find that useful. It
> > would no doubt lead to similar conclusions.
> >
> > Cheers, Paul.
> >
> > On Sun, Apr 25, 2021 at 8:15 AM Gilles Sadowski <gi...@gmail.com> wrote:
> > >
> > > Hello Paul.
> > >
> > > On Sat, 24 Apr 2021 at 04:42, Paul King <pa...@gmail.com> wrote:
> > > >
> > > > I added some more comments relevant to if the proposed algorithm
> > > > belongs somewhere in the commons "math" area back in the Jira:
> > > >
> > > > https://issues.apache.org/jira/browse/MATH-1563
> > >
> > > Thanks for a "real" user's testimony.
> > >
> > > As the ML is still the official forum for such a discussion, I'm quoting
> > > part of your post on JIRA:
> > > ---CUT---
> > > For linear regression, taking just one example dataset, commons-math
> > > is a couple of library calls for a single 2M library and solves the
> > > problem in 240ms. Both Ignite and Spark involve "firing up the
> > > platform" and the code is more complex for simple scenarios. Spark has
> > > a 181M footprint across 210 jars and solves the problem in about 20s.
> > > Ignite has a 87M footprint across 85 jars and solves the problem in >
> > > 40s. But I can also find more complex scenarios which need to scale
> > > where Ignite and Spark really come into their own.
> > > ---CUT---
> > >
> > > A similar rationale was behind my developing/using the SOFM
> > > functionality in the "o.a.c.m.ml.neuralnet" package: I needed a
> > > proof of concept, and taking the "lightweight" path seemed more
> > > effective than experimenting with those platforms.
> > > Admittedly, at that time, there were people around who were
> > > maintaining the clustering and GA codes; hence, the prototyping
> > > of a machine-learning library didn't look strange to anyone.
> > >
> > > Regards,
> > > Gilles
> > >
> > > >>> [...]

Re: The case for a Commons component

Posted by sebb <se...@gmail.com>.

I assume this thread is about the possible ML component.

If the code was developed by Commons, I assume it could be used as
part of Spark.
However Commons does not currently have many developers who are
familiar with the field.
So it would seem to me better to have development done by a project
which does have relevant experience.

You say that Spark etc have lots of jars.
Surely that allows for it to be implemented as a separate jar which
can either be used as part of the Spark platform, or used
independently?

The only other option I see is for Commons to persuade some developers
who are familiar with the field to join Commons to assist with the
algorithms.
Existing Commons developers can help manage the logistics of packaging
and releasing the code, as this does not require in depth knowledge of
the design.
However this only makes sense if the developers skilled in the area are
prepared to assist long-term.


On Sat, 24 Apr 2021 at 23:32, Paul King <pa...@gmail.com> wrote:
>
> Thanks Gilles,
>
> I can provide the same sort of stats across a clustering example
> across commons-math (KMeans) vs Apache Ignite, Apache Spark and
> Rheem/Apache Wayang (incubating) if anyone would find that useful. It
> would no doubt lead to similar conclusions.
>
> Cheers, Paul.
>
> On Sun, Apr 25, 2021 at 8:15 AM Gilles Sadowski <gi...@gmail.com> wrote:
> >
> > Hello Paul.
> >
> > On Sat, 24 Apr 2021 at 04:42, Paul King <pa...@gmail.com> wrote:
> > >
> > > I added some more comments relevant to if the proposed algorithm
> > > belongs somewhere in the commons "math" area back in the Jira:
> > >
> > > https://issues.apache.org/jira/browse/MATH-1563
> >
> > Thanks for a "real" user's testimony.
> >
> > As the ML is still the official forum for such a discussion, I'm quoting
> > part of your post on JIRA:
> > ---CUT---
> > For linear regression, taking just one example dataset, commons-math
> > is a couple of library calls for a single 2M library and solves the
> > problem in 240ms. Both Ignite and Spark involve "firing up the
> > platform" and the code is more complex for simple scenarios. Spark has
> > a 181M footprint across 210 jars and solves the problem in about 20s.
> > Ignite has a 87M footprint across 85 jars and solves the problem in >
> > 40s. But I can also find more complex scenarios which need to scale
> > where Ignite and Spark really come into their own.
> > ---CUT---
> >
> > A similar rationale was behind my developing/using the SOFM
> > functionality in the "o.a.c.m.ml.neuralnet" package: I needed a
> > proof of concept, and taking the "lightweight" path seemed more
> > effective than experimenting with those platforms.
> > Admittedly, at that time, there were people around who were
> > maintaining the clustering and GA codes; hence, the prototyping
> > of a machine-learning library didn't look strange to anyone.
> >
> > Regards,
> > Gilles
> >
> > >>> [...]

Re: The case for a Commons component

Posted by Gilles Sadowski <gi...@gmail.com>.

On Sun, 25 Apr 2021 at 00:32, Paul King <pa...@gmail.com> wrote:
>
> Thanks Gilles,
>
> I can provide the same sort of stats across a clustering example
> across commons-math (KMeans) vs Apache Ignite, Apache Spark and
> Rheem/Apache Wayang (incubating) if anyone would find that useful. It
> would no doubt lead to similar conclusions.

There were also relatively recent discussions concerning the code in
the "o.a.c.m.ml.clustering" package.[1]
If it is useful as of the old CM v3.6.1, it can very probably be
improved upon in terms of flexibility[2] and performance through (among
other things) multi-threading (in much the same way as for GA, I guess).

Best regards,
Gilles

[1] https://issues.apache.org/jira/browse/MATH-1515
[2] Fixes and enhancements are already in CM "master" branch.

>
> Cheers, Paul.
>
> On Sun, Apr 25, 2021 at 8:15 AM Gilles Sadowski <gi...@gmail.com> wrote:
> >
> > Hello Paul.
> >
> > On Sat, 24 Apr 2021 at 04:42, Paul King <pa...@gmail.com> wrote:
> > >
> > > I added some more comments relevant to if the proposed algorithm
> > > belongs somewhere in the commons "math" area back in the Jira:
> > >
> > > https://issues.apache.org/jira/browse/MATH-1563
> >
> > Thanks for a "real" user's testimony.
> >
> > As the ML is still the official forum for such a discussion, I'm quoting
> > part of your post on JIRA:
> > ---CUT---
> > For linear regression, taking just one example dataset, commons-math
> > is a couple of library calls for a single 2M library and solves the
> > problem in 240ms. Both Ignite and Spark involve "firing up the
> > platform" and the code is more complex for simple scenarios. Spark has
> > a 181M footprint across 210 jars and solves the problem in about 20s.
> > Ignite has a 87M footprint across 85 jars and solves the problem in >
> > 40s. But I can also find more complex scenarios which need to scale
> > where Ignite and Spark really come into their own.
> > ---CUT---
> >
> > A similar rationale was behind my developing/using the SOFM
> > functionality in the "o.a.c.m.ml.neuralnet" package: I needed a
> > proof of concept, and taking the "lightweight" path seemed more
> > effective than experimenting with those platforms.
> > Admittedly, at that time, there were people around who were
> > maintaining the clustering and GA codes; hence, the prototyping
> > of a machine-learning library didn't look strange to anyone.
> >
> > Regards,
> > Gilles
> >
> > >>> [...]

Re: The case for a Commons component

Posted by Paul King <pa...@gmail.com>.

Thanks Gilles,

I can provide the same sort of stats across a clustering example
across commons-math (KMeans) vs Apache Ignite, Apache Spark and
Rheem/Apache Wayang (incubating) if anyone would find that useful. It
would no doubt lead to similar conclusions.

Cheers, Paul.

On Sun, Apr 25, 2021 at 8:15 AM Gilles Sadowski <gi...@gmail.com> wrote:
>
> Hello Paul.
>
> On Sat, 24 Apr 2021 at 04:42, Paul King <pa...@gmail.com> wrote:
> >
> > I added some more comments relevant to if the proposed algorithm
> > belongs somewhere in the commons "math" area back in the Jira:
> >
> > https://issues.apache.org/jira/browse/MATH-1563
>
> Thanks for a "real" user's testimony.
>
> As the ML is still the official forum for such a discussion, I'm quoting
> part of your post on JIRA:
> ---CUT---
> For linear regression, taking just one example dataset, commons-math
> is a couple of library calls for a single 2M library and solves the
> problem in 240ms. Both Ignite and Spark involve "firing up the
> platform" and the code is more complex for simple scenarios. Spark has
> a 181M footprint across 210 jars and solves the problem in about 20s.
> Ignite has a 87M footprint across 85 jars and solves the problem in >
> 40s. But I can also find more complex scenarios which need to scale
> where Ignite and Spark really come into their own.
> ---CUT---
>
> A similar rationale was behind my developing/using the SOFM
> functionality in the "o.a.c.m.ml.neuralnet" package: I needed a
> proof of concept, and taking the "lightweight" path seemed more
> effective than experimenting with those platforms.
> Admittedly, at that time, there were people around who were
> maintaining the clustering and GA codes; hence, the prototyping
> of a machine-learning library didn't look strange to anyone.
>
> Regards,
> Gilles
>
> >>> [...]

The case for a Commons component

Posted by Gilles Sadowski <gi...@gmail.com>.

Hello Paul.

On Sat, 24 Apr 2021 at 04:42, Paul King <pa...@gmail.com> wrote:
>
> I added some more comments relevant to if the proposed algorithm
> belongs somewhere in the commons "math" area back in the Jira:
>
> https://issues.apache.org/jira/browse/MATH-1563

Thanks for a "real" user's testimony.

As the mailing list is still the official forum for such a discussion,
I'm quoting part of your post on JIRA:
---CUT---
For linear regression, taking just one example dataset, commons-math
is a couple of library calls for a single 2M library and solves the
problem in 240ms. Both Ignite and Spark involve "firing up the
platform" and the code is more complex for simple scenarios. Spark has
a 181M footprint across 210 jars and solves the problem in about 20s.
Ignite has a 87M footprint across 85 jars and solves the problem in >
40s. But I can also find more complex scenarios which need to scale
where Ignite and Spark really come into their own.
---CUT---

A similar rationale was behind my developing/using the SOFM
functionality in the "o.a.c.m.ml.neuralnet" package: I needed a
proof of concept, and taking the "lightweight" path seemed more
effective than experimenting with those platforms.
Admittedly, at that time, there were people around who were
maintaining the clustering and GA codes; hence, the prototyping
of a machine-learning library didn't look strange to anyone.

Regards,
Gilles

>>> [...]

Re: [Vote] Create a "machine learning" component

Posted by Gilles Sadowski <gi...@gmail.com>.

On Fri, 30 Apr 2021 at 18:00, Avijit Basak <av...@gmail.com> wrote:
>
> Hi
>
>          I would like to vote for *commons-ml*.

Wrong thread (the vote on this one was cancelled because it had been
idle for too long). The new vote is here:
   https://markmail.org/message/g5gwof3qdkzyvedc

>>>  [...]

Re: [Vote] Create a "machine learning" component

Posted by Avijit Basak <av...@gmail.com>.

Hi

         I would like to vote for *commons-ml*.

Thanks & Regards
--Avijit Basak

On Sat, 24 Apr 2021 at 08:12, Paul King <pa...@gmail.com> wrote:

> I added some more comments relevant to if the proposed algorithm
> belongs somewhere in the commons "math" area back in the Jira:
>
> https://issues.apache.org/jira/browse/MATH-1563
>
> Cheers, Paul.
>
> On Wed, Apr 21, 2021 at 7:26 PM Gilles Sadowski <gi...@gmail.com> wrote:
> >
> > On Wed, 21 Apr 2021 at 08:56, Paul King <pa...@gmail.com> wrote:
> > >
> > > On Wed, Apr 21, 2021 at 4:12 PM Ralph Goers <ra...@dslextreme.com> wrote:
> > > >
> > > > Why are y’all having a long discussion on Vote thread?
> >
> > Paul King's comments are interesting information that could
> > bear on people's decision on the proposal (especially the
> > licence issue).
> > As for the question of whether the purported functionality would
> > find a better home elsewhere within the ASF, I'm not sure what would
> > be the conclusion (apart from Avijit Basak's plain preference (?) to
> > develop a standalone component, as per Commons' requirement).
> >
> > >
> > > Fair enough. I am +1 (non-binding).
> >
> > So currently, IIRC the tally (on creating a dedicated component) is
> >   Gilles Sadowski +1
> >   Avijit Basak +1
> >   Paul King +1
> > And several -1 on the initially suggested name; but the proposed
> > name has been changed early on to "commons-machinelearning"
> > (in order to comply with Commons' tradition of full words and
> > descriptive names).
> > [Please correct if it doesn't reflect what has been expressed.]
> >
> > Where does that lead us?
> >
> > Regards,
> > Gilles
> >
> > >>> [...]

-- 
Avijit Basak

Re: [Vote] Create a "machine learning" component

Posted by Paul King <pa...@gmail.com>.

I added some more comments relevant to if the proposed algorithm
belongs somewhere in the commons "math" area back in the Jira:

https://issues.apache.org/jira/browse/MATH-1563

Cheers, Paul.

On Wed, Apr 21, 2021 at 7:26 PM Gilles Sadowski <gi...@gmail.com> wrote:
>
> On Wed, 21 Apr 2021 at 08:56, Paul King <pa...@gmail.com> wrote:
> >
> > On Wed, Apr 21, 2021 at 4:12 PM Ralph Goers <ra...@dslextreme.com> wrote:
> > >
> > > Why are y’all having a long discussion on Vote thread?
>
> Paul King's comments are interesting information that could
> bear on people's decision on the proposal (especially the
> licence issue).
> As for the question of whether the purported functionality would
> find a better home elsewhere within the ASF, I'm not sure what would
> be the conclusion (apart from Avijit Basak's plain preference (?) to
> develop a standalone component, as per Commons' requirement).
>
> >
> > Fair enough. I am +1 (non-binding).
>
> So currently, IIRC the tally (on creating a dedicated component) is
>   Gilles Sadowski +1
>   Avijit Basak +1
>   Paul King +1
> And several -1 on the initially suggested name; but the proposed
> name has been changed early on to "commons-machinelearning"
> (in order to comply with Commons' tradition of full words and
> descriptive names).
> [Please correct if it doesn't reflect what has been expressed.]
>
> Where does that lead us?
>
> Regards,
> Gilles
>
> >>> [...]

[Cancel][Vote] Create a "machine learning" component

Posted by Gilles Sadowski <gi...@gmail.com>.

>>> [...]
> >
> > So currently, IIRC the tally (on creating a dedicated component) is
> >  Gilles Sadowski +1
> >  Avijit Basak +1
> >  Paul King +1
> > And several -1 on the initially suggested name; but the proposed
> > name has been changed early on to "commons-machinelearning"
> > (in order to comply with Commons' tradition of full words and
> > descriptive names).
> > [Please correct if it doesn't reflect what has been expressed.]
> >
> > Where does that lead us?
>
> With a vote thread that has been open for over 2 months that apparently should have been a discussion thread. I would suggest you cancel this vote and create a new Vote thread proposing commons-machinelearning.

Stopping this thread as a [vote].

Gilles

Re: [Vote] Create a "machine learning" component

Posted by Ralph Goers <ra...@dslextreme.com>.


> On Apr 21, 2021, at 2:25 AM, Gilles Sadowski <gi...@gmail.com> wrote:
> 
> On Wed, 21 Apr 2021 at 08:56, Paul King <pa...@gmail.com> wrote:
>> 
>> On Wed, Apr 21, 2021 at 4:12 PM Ralph Goers <ra...@dslextreme.com> wrote:
>>> 
>>> Why are y’all having a long discussion on Vote thread?
> 
> Paul King's comments are interesting information that could
> bear on people's decision on the proposal (especially the
> licence issue).

The point is that discussions shouldn’t happen on a vote thread. The thread should be forked into its own [DISCUSS][VOTE] thread.

> As for the question of whether the purported functionality would
> find a better home elsewhere within the ASF, I'm not sure what would
> be the conclusion (apart from Avijit Basak's plain preference (?) to
> develop a standalone component, as per Commons' requirement).
> 
>> 
>> Fair enough. I am +1 (non-binding).
> 
> So currently, IIRC the tally (on creating a dedicated component) is
>  Gilles Sadowski +1
>  Avijit Basak +1
>  Paul King +1
> And several -1 on the initially suggested name; but the proposed
> name has been changed early on to "commons-machinelearning"
> (in order to comply with Commons' tradition of full words and
> descriptive names).
> [Please correct if it doesn't reflect what has been expressed.]
> 
> Where does that lead us?

With a vote thread that has been open for over 2 months that apparently should have been a discussion thread.  I would suggest you cancel this vote and create a new Vote thread proposing commons-machinelearning.

Ralph

Re: [Vote] Create a "machine learning" component

Posted by Gilles Sadowski <gi...@gmail.com>.

On Wed, 21 Apr 2021 at 08:56, Paul King <pa...@gmail.com> wrote:
>
> On Wed, Apr 21, 2021 at 4:12 PM Ralph Goers <ra...@dslextreme.com> wrote:
> >
> > Why are y’all having a long discussion on Vote thread?

Paul King's comments are interesting information that could
bear on people's decision on the proposal (especially the
licence issue).
As for the question of whether the purported functionality would
find a better home elsewhere within the ASF, I'm not sure what would
be the conclusion (apart from Avijit Basak's plain preference (?) to
develop a standalone component, as per Commons' requirement).

>
> Fair enough. I am +1 (non-binding).

So currently, IIRC the tally (on creating a dedicated component) is
  Gilles Sadowski +1
  Avijit Basak +1
  Paul King +1
And several -1 on the initially suggested name; but the proposed
name has been changed early on to "commons-machinelearning"
(in order to comply with Commons' tradition of full words and
descriptive names).
[Please correct if it doesn't reflect what has been expressed.]

Where does that lead us?

Regards,
Gilles

>>> [...]

Re: [Vote] Create a "machine learning" component

Posted by Paul King <pa...@gmail.com>.

On Wed, Apr 21, 2021 at 4:12 PM Ralph Goers <ra...@dslextreme.com> wrote:
>
> Why are y’all having a long discussion on Vote thread?

Fair enough. I am +1 (non-binding).

Cheers, Paul.

> > On Apr 20, 2021, at 10:33 PM, Paul King <pa...@gmail.com> wrote:
> > [...]

Re: [Vote] Create a "machine learning" component

Posted by Ralph Goers <ra...@dslextreme.com>.

Why are y’all having a long discussion on Vote thread?

Ralph

> On Apr 20, 2021, at 10:33 PM, Paul King <pa...@gmail.com> wrote:
> [...]

Re: [Vote] Create a "machine learning" component

Posted by Paul King <pa...@gmail.com>.

Hi Avijit Basak,

+1 to thanking you for your offer. Just a couple of comments from
someone who is only a marginal contributor to the commons project.

I would be keen to see a new commons component incorporating various
machine learning/data science components. The other main contenders
that seem to be reasonably actively developed are Smile[1] and Weka[2],
which are licensed under GPL or LGPL. Such a component would be a
natural fit for the algorithm you propose. If you look at Apache
Spark[3] and Apache Ignite[4], they both have some "machine learning"
offerings but they tend to only support algorithms which are either
"embarrassingly" parallel or inherently parallel. They tend not to
include algorithms that are sequential by nature. Even "embarrassingly"
parallel algorithms are often not included since they can typically
already be used with Spark, Ignite, Beam, Wayang, or home-grown
threads/fibres.

There has been previous research into PGA with Hadoop, Spark and
Ignite[5][6] but so far, none of that has made it into those
distributions as far as I know. I don't know how customisable the
Ignite GA algorithm[7] is but it might be worth looking into.

With respect to component naming, you either go very broad with "math"
or something like "datascience", or potentially too narrow with
something like "ml" or "machinelearning". Of the latter two, "ml" is
most common when bundled into some other framework. The other
alternative is to simply come up with another name, but the typical
convention within commons is to use a name that describes the purpose.
Numerous "ml" libraries also bundle things like regression into them,
so there is precedent for such libraries to cover algorithms broadly in
the topic space. On the commons math front, I think regression is
currently earmarked for statistics but I'm not sure it has made the jump
as of yet. An "ml" home would be equally suitable in my mind.

Having said all of that, as others have pointed out, the volunteer
space in commons is somewhat lean at the moment. I would be happy to
help a little from the ASF side of things but machine learning/data
science isn't my principal area of expertise nor a major aspect of my
"day job" activities; it probably takes others with interest to fully
give this the effort it deserves. But sometimes someone has to get the
ball rolling before other interested parties show up.

Cheers, Paul

[1] https://haifengl.github.io/
[2] https://www.cs.waikato.ac.nz/ml/weka/
[3] https://spark.apache.org/mllib/
[4] https://ignite.apache.org/docs/latest/machine-learning/machine-learning
[5] https://hajirajabeen.github.io/publications/SGA.pdf
[6] https://dzone.com/articles/genetic-algorithms-with-apache-ignite
[7] https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/ml/util/genetic/GeneticAlgorithm.html

On Sun, Feb 14, 2021 at 6:06 PM Avijit Basak <av...@gmail.com> wrote:
> [...]

Re: [Vote] Create a "machine learning" component

Posted by Avijit Basak <av...@gmail.com>.

Hi

       I would like to mention a few points here. Genetic algorithms have a
vast range of applications in optimization and search problems. Machine
learning is only one of those.
       If we couple the new GA library with a specific domain like ML, it
would be meaningless for people working in other domains. They would have to
incorporate the entire ML library, which may be completely unrelated to
their project. Coupling it with a specific technology like Spark might also
limit its usability.
       If a separate component is not approved for this change, then we can
incorporate the changes as part of the *commons.math* library.
       The same library can be reused in ML or neural network libraries as
a dependency.
       Kindly share further views on this.
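
To make the first point concrete, here is a minimal sketch of a GA used
as a plain optimizer, with no machine learning involved. It is not the
Commons Math implementation nor the proposed contribution: the class
name OneMaxGa, its methods and all parameter values are hypothetical,
and the "OneMax" toy problem (maximize the number of bits set) is only
an assumption chosen for brevity.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;

/** Hypothetical, self-contained GA sketch; not the Commons Math API. */
final class OneMaxGa {
    /** Fitness: number of genes set to true (the "OneMax" toy problem). */
    static int fitness(boolean[] chromosome) {
        int count = 0;
        for (boolean gene : chromosome) {
            if (gene) {
                count++;
            }
        }
        return count;
    }

    /** Binary tournament selection: draw two at random, keep the fitter. */
    static boolean[] select(boolean[][] population, Random rng) {
        boolean[] a = population[rng.nextInt(population.length)];
        boolean[] b = population[rng.nextInt(population.length)];
        return fitness(a) >= fitness(b) ? a : b;
    }

    /** One-point crossover followed by per-gene bit-flip mutation. */
    static boolean[] breed(boolean[] mother, boolean[] father,
                           double mutationRate, Random rng) {
        int cut = rng.nextInt(mother.length);
        boolean[] child = new boolean[mother.length];
        for (int i = 0; i < child.length; i++) {
            child[i] = i < cut ? mother[i] : father[i];
            if (rng.nextDouble() < mutationRate) {
                child[i] = !child[i];
            }
        }
        return child;
    }

    /** Returns the fittest chromosome of the given population. */
    static boolean[] best(boolean[][] population) {
        return Arrays.stream(population)
                     .max(Comparator.comparingInt(OneMaxGa::fitness))
                     .get();
    }

    /** Runs the GA and returns the fittest chromosome found. */
    static boolean[] evolve(int genes, int populationSize,
                            int generations, long seed) {
        Random rng = new Random(seed);
        boolean[][] population = new boolean[populationSize][genes];
        for (boolean[] chromosome : population) {
            for (int i = 0; i < genes; i++) {
                chromosome[i] = rng.nextBoolean();
            }
        }
        for (int g = 0; g < generations; g++) {
            boolean[][] next = new boolean[populationSize][];
            // Elitism: the current best is carried over unchanged.
            next[0] = best(population);
            for (int i = 1; i < populationSize; i++) {
                next[i] = breed(select(population, rng),
                                select(population, rng),
                                1.0 / genes, rng);
            }
            population = next;
        }
        return best(population);
    }

    public static void main(String[] args) {
        boolean[] result = evolve(32, 40, 200, 42L);
        System.out.println("best fitness: " + fitness(result) + " / 32");
    }
}
```

Only the fitness function is specific to the problem; selection,
crossover and mutation know nothing about the domain, which is the
reason for not tying such a library to an ML component.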

Thanks & Regards
--Avijit Basak

On Wed, 10 Feb 2021 at 19:49, Gilles Sadowski <gi...@gmail.com> wrote:

> On Wed, Feb 10, 2021 at 13:19, sebb <se...@gmail.com> wrote:
> >
> > Likewise, commons-ml is too cryptic.
> >
> > Also, the Spark project has a machine-learning library:
> >
> > https://spark.apache.org/mllib/
>
> Thanks for the pointer.
>
> >
> > Maybe that would be a better home?
>
> On the face of it, probably.
> [For sure, Avijit should comment on the suggestion.]
>
> On the other hand, "Commons" is the place where one can pick "bare
> bone" implementations, and add the functionality to one's application
> without necessarily complying with an overarching framework.
> [I don't mean that framework compliance is bad; quite the contrary, it is
> hopefully the result of a thorough reflection by experts.  But ... cf. the
> numerous "no-dependency" discussions ...]
>
> Actually, concerning Avijit's proposed contribution, didn't I say:[1]
> ---CUT---
> Thus, I think that we must assess whether the "genetic algorithms"
> functionality has a reasonable future within "Apache Commons" (i.e.
> potential users and contributors) while there exist other libraries that
> seem much more advanced for any serious usage.
> ---CUT---
>
> > I'm also a bit concerned as to whether there are sufficient developers
> > here with knowledge of the ML domain to be able to support the code in
> > the future.
>
> An interesting point; by no means a new one (see e.g. [2]).
>
> Isn't it the same point I've been making about "Commons Math" (CM)?
> There have been no releases because nobody here is able (or willing)
> to support it.
>
> Concerning the support of the purported "machinelearning" component:
> 1. Package
>         org.apache.commons.math4.ml.neuralnet
>     * I've written it entirely and I have applications that depend on it
> (and I
>       cannot assume that I could easily switch to, or port it to, Spark),
> so I
>       can reasonably ensure that it would be supported.
> 2. Package
>         org.apache.commons.math4.ml.clustering
>     * Functionality is mentioned in Spark's "mllib" user guide.
>     * When a new feature was last contributed[3], it was noticed[4][5][6]
>       that improvements were needed (but there was no follow-up).
>     * I've an application that depends on it (from CM v3.6.1) but I wouldn't
>       support it if shipped in CM v4.0.
> 3. Package
>         org.apache.commons.math4.genetics
>     * Part of my "end-of-study" project consisted of a GA implementation.
>       I've never used the CM implementation, and I don't deny that there
>       could be perfectly fine uses of it but, just looking at the code, it seems
>       obvious that it cannot compete feature-wise with other libraries
>       out there.
>     * I've suggested long ago that, without anyone supporting it actively (and
>       no known user community), it should be dropped from CM.
>     * Avijit expressed a willingness to improve the functionality:  Is this
>       enough for the PMC to create a new component?  From the experience with
>       the "clustering" package mentioned above, I'd tend to think (unfortunately)
>       that it isn't.  He should first explore whether the Spark community is
>       interested in having the GA functionality moved over there.
>
> Gilles
>
> [1] https://issues.apache.org/jira/browse/MATH-1563
> [2] https://markmail.org/message/26yxj5vhysdsoety
> [3] https://issues.apache.org/jira/projects/MATH/issues/MATH-1509
> [4] https://issues.apache.org/jira/projects/MATH/issues/MATH-1524
> [5] https://issues.apache.org/jira/projects/MATH/issues/MATH-1528
> [6] https://issues.apache.org/jira/projects/MATH/issues/MATH-1526
>
> >
> > On Wed, 10 Feb 2021 at 08:27, Emmanuel Bourg <eb...@apache.org> wrote:
> > >
> > > -1 for commons-ml for the same reasons.
> > >
> > > What about commons-machine-learning or commons-math-learning? The latter
> > > is as long as commons-configuration.
> > >
> > > Emmanuel Bourg
> > >
> > >
> > > On 2021-02-10 03:27, Ralph Goers wrote:
> > > > -1 on commons-ml as the name. My first thought is such a repo would
> > > > hold stuff related to mailing lists. Then again maybe it contains
> > > > stuff relating to markup languages. Maybe it is Apache’s version of
> > > > the ML Programming Language [1].
> > > >
> > > > However, I wouldn’t be -1 on commons-math-ml, although at best I would
> > > > be +0 since it is still not obvious what it would contain.
> > > >
> > > > Ralph
>

-- 
Avijit Basak

Re: [Vote] Create a "machine learning" component

Posted by Gilles Sadowski <gi...@gmail.com>.

On Wed, 10 Feb 2021 at 13:19, sebb <se...@gmail.com> wrote:
>
> Likewise, commons-ml is too cryptic.
>
> Also, the Spark project has a machine-learning library:
>
> https://spark.apache.org/mllib/

Thanks for the pointer.

>
> Maybe that would be a better home?

On the face of it, probably.
[For sure, Avijit should comment on the suggestion.]

On the other hand, "Commons" is the place where one can pick "bare-bones"
implementations and add the functionality to one's application
without necessarily complying with an overarching framework.
[I don't mean that framework compliance is bad; quite the contrary, it is
hopefully the result of a thorough reflection by experts.  But ... cf. the
numerous "no-dependency" discussions ...]

Actually, concerning Avijit's proposed contribution, didn't I say:[1]
---CUT---
Thus, I think that we must assess whether the "genetic algorithms"
functionality has a reasonable future within "Apache Commons" (i.e.
potential users and contributors) while there exist other libraries that
seem much more advanced for any serious usage.
---CUT---

> I'm also a bit concerned as to whether there are sufficient developers
> here with knowledge of the ML domain to be able to support the code in
> the future.

An interesting point; by all means not a new one (see e.g. [2]).

Isn't it the same point I've been making about "Commons Math" (CM)?
There have been no releases because nobody here is able (or willing)
to support it.

Concerning the support of the purported "machinelearning" component:
1. Package
        org.apache.commons.math4.ml.neuralnet
    * I've written it entirely and I have applications that depend on it (and I
      cannot assume that I could easily switch to, or port it to, Spark), so I
      can reasonably ensure that it would be supported.
2. Package
        org.apache.commons.math4.ml.clustering
    * Functionality is mentioned in Spark's "mllib" user guide.
    * When a new feature was last contributed[3], it was noticed[4][5][6]
      that improvements were needed (but there was no follow-up).
    * I've an application that depends on it (from CM v3.6.1) but I wouldn't
      support it if shipped in CM v4.0.
3. Package
        org.apache.commons.math4.genetics
    * Part of my "end-of-study" project consisted of a GA implementation.
      I've never used the CM implementation, and I don't deny that there
      could be perfectly fine uses of it but, just looking at the code, it seems
      obvious that it cannot compete feature-wise with other libraries
      out there.
    * I've suggested long ago that, without anyone supporting it actively (and
      no known user community), it should be dropped from CM.
    * Avijit expressed a willingness to improve the functionality:  Is this
      enough for the PMC to create a new component?  From the experience with
      the "clustering" package mentioned above, I'd tend to think (unfortunately)
      that it isn't.  He should first explore whether the Spark community is
      interested in having the GA functionality moved over there.

Gilles

[1] https://issues.apache.org/jira/browse/MATH-1563
[2] https://markmail.org/message/26yxj5vhysdsoety
[3] https://issues.apache.org/jira/projects/MATH/issues/MATH-1509
[4] https://issues.apache.org/jira/projects/MATH/issues/MATH-1524
[5] https://issues.apache.org/jira/projects/MATH/issues/MATH-1528
[6] https://issues.apache.org/jira/projects/MATH/issues/MATH-1526

>
> On Wed, 10 Feb 2021 at 08:27, Emmanuel Bourg <eb...@apache.org> wrote:
> >
> > -1 for commons-ml for the same reasons.
> >
> > What about commons-machine-learning or commons-math-learning? The latter
> > is as long as commons-configuration.
> >
> > Emmanuel Bourg
> >
> >
> > On 2021-02-10 03:27, Ralph Goers wrote:
> > > -1 on commons-ml as the name. My first thought is such a repo would
> > > hold stuff related to mailing lists. Then again maybe it contains
> > > stuff relating to markup languages. Maybe it is Apache’s version of
> > > the ML Programming Language [1].
> > >
> > > However, I wouldn’t be -1 on commons-math-ml, although at best I would
> > > be +0 since it is still not obvious what it would contain.
> > >
> > > Ralph

Re: [Vote] Create a "machine learning" component

Posted by sebb <se...@gmail.com>.

Likewise, commons-ml is too cryptic.

Also, the Spark project has a machine-learning library:

https://spark.apache.org/mllib/

Maybe that would be a better home?

I'm also a bit concerned as to whether there are sufficient developers
here with knowledge of the ML domain to be able to support the code in
the future.

On Wed, 10 Feb 2021 at 08:27, Emmanuel Bourg <eb...@apache.org> wrote:
>
> -1 for commons-ml for the same reasons.
>
> What about commons-machine-learning or commons-math-learning? The latter
> is as long as commons-configuration.
>
> Emmanuel Bourg
>
>
> On 2021-02-10 03:27, Ralph Goers wrote:
> > -1 on commons-ml as the name. My first thought is such a repo would
> > hold stuff related to mailing lists. Then again maybe it contains
> > stuff relating to markup languages. Maybe it is Apache’s version of
> > the ML Programming Language [1].
> >
> > However, I wouldn’t be -1 on commons-math-ml, although at best I would
> > be +0 since it is still not obvious what it would contain.
> >
> > Ralph

Re: [Vote] Create a "machine learning" component

Posted by Gilles Sadowski <gi...@gmail.com>.

On Wed, 10 Feb 2021 at 09:27, Emmanuel Bourg <eb...@apache.org> wrote:
>
> -1 for commons-ml for the same reasons.
>
> What about commons-machine-learning or commons-math-learning? The latter
> is as long as commons-configuration.

Java users should be used to lengthy names.
It should thus be "commons-machinelearning" as hyphens, by convention,
separate items that become sub-packages in the Java code.
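
To illustrate the convention, here is a minimal sketch; the class
"ArtifactNames" and the method "toPackage" are hypothetical names, and the
mapping is only an assumption drawn from the convention described above
(it is not an official rule, and it ignores the version suffix, as in
"math4", that some components put in their package names):

```java
/**
 * Hypothetical helper: maps a "commons-..." artifact name to the Java
 * package name it would suggest if every hyphen after the "commons-"
 * prefix introduced a sub-package.
 */
public final class ArtifactNames {
    private static final String PREFIX = "commons-";
    private static final String BASE_PACKAGE = "org.apache.commons";

    private ArtifactNames() {
        // Utility class; not meant to be instantiated.
    }

    public static String toPackage(String artifactId) {
        if (!artifactId.startsWith(PREFIX)) {
            throw new IllegalArgumentException("Not a Commons artifact: " + artifactId);
        }
        // Each remaining hyphen becomes a package separator.
        String rest = artifactId.substring(PREFIX.length());
        return BASE_PACKAGE + "." + rest.replace('-', '.');
    }

    public static void main(String[] args) {
        // A single word after the prefix gives a single top-level package.
        System.out.println(toPackage("commons-machinelearning"));
        // A hyphenated name implies a "learning" sub-package of "math".
        System.out.println(toPackage("commons-math-learning"));
    }
}
```

Under that assumption, "commons-machine-learning" would suggest a "learning"
sub-package of a "machine" package, which is why the unhyphenated form is
preferable.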

>
> Emmanuel Bourg
>
>
> On 2021-02-10 03:27, Ralph Goers wrote:
> > -1 on commons-ml as the name. My first thought is such a repo would
> > hold stuff related to mailing lists. Then again maybe it contains
> > stuff relating to markup languages. Maybe it is Apache’s version of
> > the ML Programming Language [1].

Strange rationale.  As if someone would not read the full name of a
library before deciding whether it provides what he needs...

> >
> > However, I wouldn’t be -1 on commons-math-ml, although at best I would
> > be +0 since it is still not obvious what it would contain.

As explained, this is not a useful or descriptive name: ML is not
something that mathematicians would consider a part of mathematics.
ML is an area of computer science, inspired by biological processes.

Gilles

> >
> > Ralph

Re: [Vote] Create a "machine learning" component

Posted by Emmanuel Bourg <eb...@apache.org>.

-1 for commons-ml for the same reasons.

What about commons-machine-learning or commons-math-learning? The latter 
is as long as commons-configuration.

Emmanuel Bourg


On 2021-02-10 03:27, Ralph Goers wrote:
> -1 on commons-ml as the name. My first thought is such a repo would
> hold stuff related to mailing lists. Then again maybe it contains
> stuff relating to markup languages. Maybe it is Apache’s version of
> the ML Programming Language [1].
> 
> However, I wouldn’t be -1 on commons-math-ml, although at best I would
> be +0 since it is still not obvious what it would contain.
> 
> Ralph

Re: [Vote] Create a "machine learning" component

Posted by Ralph Goers <ra...@dslextreme.com>.

-1 on commons-ml as the name. My first thought is such a repo would hold stuff related to mailing lists. Then again maybe it contains stuff relating to markup languages. Maybe it is Apache’s version of the ML Programming Language [1].

However, I wouldn’t be -1 on commons-math-ml, although at best I would be +0 since it is still not obvious what it would contain.

Ralph

1. http://web.cecs.pdx.edu/~black/CS311/ML.html

> On Feb 9, 2021, at 3:43 PM, Gilles Sadowski <gi...@gmail.com> wrote:
> 
> Hi.
> 
> Because of an offered contribution, a discussion happened on
> JIRA[1] and in another thread[2] about improving the genetic
> algorithm (GA) implementation currently in the
>   org.apache.commons.math4.genetics
> package of the "Commons Math" component.
> It would make sense to group "machine learning" algorithms[3]
> (to which GA belongs) within a single component, where the code from
>  org.apache.commons.math4.ml.neuralnet
>  org.apache.commons.math4.ml.clustering
> would be moved too.
> This would be the fifth (and last) component resulting from my proposal
> (see e.g. [4] among other threads) for the reorganization of the "Commons
> Math"[5] code base into more maintainable components[6][7][8][9], each
> focused on actually related functionalities (thus *not* the wide expertise
> necessary for the maintenance of a full-fledged math library).
> 
> I suggest "ML" for the name of the component.
> 
> Regards,
> Gilles
> 
> [1] https://issues.apache.org/jira/projects/MATH/issues/MATH-1563
> [2] https://markmail.org/message/dnujdcxuaq5bwuwe
> [3] https://en.wikipedia.org/wiki/Machine_learning
> [4] https://markmail.org/message/75vuyhzblfadc5op
> [5] http://commons.apache.org/proper/commons-math/
> [6] http://commons.apache.org/proper/commons-rng/
> [7] http://commons.apache.org/proper/commons-numbers/
> [8] http://commons.apache.org/proper/commons-geometry/
> [9] http://commons.apache.org/proper/commons-statistics/
> 