Posted to dev@commons.apache.org by Gilles Sadowski <gi...@gmail.com> on 2021/04/24 22:14:27 UTC

The case for a Commons component

Hello Paul.

On Sat, 24 Apr 2021 at 04:42, Paul King <pa...@gmail.com> wrote:
>
> I added some more comments on whether the proposed algorithm
> belongs somewhere in the commons "math" area back in the Jira:
>
> https://issues.apache.org/jira/browse/MATH-1563

Thanks for a "real" user's testimony.

As the ML is still the official forum for such a discussion, I'm quoting
part of your post on JIRA:
---CUT---
For linear regression, taking just one example dataset, commons-math
is a couple of library calls for a single 2M library and solves the
problem in 240ms. Both Ignite and Spark involve "firing up the
platform" and the code is more complex for simple scenarios. Spark has
a 181M footprint across 210 jars and solves the problem in about 20s.
Ignite has a 87M footprint across 85 jars and solves the problem in >
40s. But I can also find more complex scenarios which need to scale
where Ignite and Spark really come into their own.
---CUT---

A similar rationale was behind my developing/using the SOFM
functionality in the "o.a.c.m.ml.neuralnet" package: I needed a
proof of concept, and taking the "lightweight" path seemed more
effective than experimenting with those platforms.
Admittedly, at that time, there were people around who were
maintaining the clustering and GA codes; hence, the prototyping
of a machine-learning library didn't look strange to anyone.

Regards,
Gilles

>>> [...]

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: The case for a Commons component

Posted by Gilles Sadowski <gi...@gmail.com>.
On Thu, 6 May 2021 at 20:29, Oliver Heger
<ol...@oliver-heger.de> wrote:
>
>
>
> On 05.05.21 at 21:54, Gilles Sadowski wrote:
> > On Wed, 5 May 2021 at 20:33, Oliver Heger
> > <ol...@oliver-heger.de> wrote:
> >>
> >>
> >>
> >> On 05.05.21 at 20:26, Gilles Sadowski wrote:
> >>> On Wed, 5 May 2021 at 18:57, Gary Gregory <ga...@gmail.com> wrote:
> >>>>
> >>>> IMO the lack of +1s shows the lack of appetite to manage another component
> >>>
> >>> That's certainly true.
> >>> And nobody is forced to do anything.
> >>>
> >>> When the other CM spin-offs started, there was only _one_ person
> >>> willing to do the work.
> >>
> >> What about the sandbox? IIUC, every committer can start a new component
> >> there. If then a community forms around this component, it can move to
> >> proper (which would then require a vote).
> >>
> >> Would this be an option to get started?
> >
> > [Graph] is listed in the sandbox[1], yet when someone expressed a willingness
> > to contribute, we had a "git" repository created[2] (even though the
> > web site has
> > remained outdated[3], probably because the attempt was short-lived).
> >
> > So indeed, I could have already created the repository a few weeks ago...
> >
> > However in this instance, what would it mean to have codes that have lived
> > within a "proper" component for 6 years and more be moved to "sandbox"?
>
> A way to move forward?

Thanks for trying to be constructive (and for the decent tone).

I've been told that I should learn to count; that the vote (to
create a repository) has failed.
Hence that option has also been ruled out.  [What was OK for
[Graph] in sandbox, somehow is not anymore.  Go figure...]

Gilles

>
> Oliver
>
> >
> > Regards,
> > Gilles
> >
> > [1] http://commons.apache.org/sandbox/commons-graph/
> > [2] https://gitbox.apache.org/repos/asf?p=commons-graph.git
> > [3] http://commons.apache.org/sandbox/commons-graph/source-repository.html



Re: The case for a Commons component

Posted by Oliver Heger <ol...@oliver-heger.de>.

On 05.05.21 at 21:54, Gilles Sadowski wrote:
> On Wed, 5 May 2021 at 20:33, Oliver Heger
> <ol...@oliver-heger.de> wrote:
>>
>>
>>
>> On 05.05.21 at 20:26, Gilles Sadowski wrote:
>>> On Wed, 5 May 2021 at 18:57, Gary Gregory <ga...@gmail.com> wrote:
>>>>
>>>> IMO the lack of +1s shows the lack of appetite to manage another component
>>>
>>> That's certainly true.
>>> And nobody is forced to do anything.
>>>
>>> When the other CM spin-offs started, there was only _one_ person
>>> willing to do the work.
>>
>> What about the sandbox? IIUC, every committer can start a new component
>> there. If then a community forms around this component, it can move to
>> proper (which would then require a vote).
>>
>> Would this be an option to get started?
> 
> [Graph] is listed in the sandbox[1], yet when someone expressed a willingness
> to contribute, we had a "git" repository created[2] (even though the
> web site has
> remained outdated[3], probably because the attempt was short-lived).
> 
> So indeed, I could have already created the repository a few weeks ago...
> 
> However in this instance, what would it mean to have codes that have lived
> within a "proper" component for 6 years and more be moved to "sandbox"?

A way to move forward?

Oliver

> 
> Regards,
> Gilles
> 
> [1] http://commons.apache.org/sandbox/commons-graph/
> [2] https://gitbox.apache.org/repos/asf?p=commons-graph.git
> [3] http://commons.apache.org/sandbox/commons-graph/source-repository.html
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
> 



Re: The case for a Commons component

Posted by Ralph Goers <ra...@dslextreme.com>.
I’ll be nice and summarize. Gilles started two vote threads. The first was polluted with discussion and eventually closed. The second has not passed and is effectively dead, but Gilles hasn’t closed the vote.

So nothing has been approved.

Ralph

> On May 14, 2021, at 5:48 AM, Gary Gregory <ga...@gmail.com> wrote:
> 
> Are seriously asking someone else to read through 40 emails and summarize
> for you? Perhaps part of your contribution might be to do this yourself?
> 
> Gary
> 
> On Fri, May 14, 2021, 08:15 Avijit Basak <av...@gmail.com> wrote:
> 
>> Hi All
>> 
>>        This has been a long mail thread. It will be really helpful if
>> anyone can summarize the decisions.
>>        Is the proposal of developing the new machine learning component
>> approved?
>>        If the team repository is not provided is there any way to go
>> ahead?
>>        Waiting for a response.
>> 
>> Thanks & Regards
>> --Avijit Basak
>> 
>> On Fri, 7 May 2021 at 02:26, sebb <se...@gmail.com> wrote:
>> 
>>> On Thu, 6 May 2021 at 21:13, Gary Gregory <ga...@gmail.com>
>> wrote:
>>>> 
>>>> It is true that there much less friction these days to get a repository
>>>> going with GitHub, GitLab, and BitBucket, but, for now, the Commons
>>> Sandbox
>>>> is still available. If we want to do away with the sandbox, then let's
>>>> talk about that separately.
>>>> 
>>> 
>>> There is no need for a Sandbox component to use SVN, and it's easy to
>>> create a new Commons git repo.
>>> 
>>> A non-ASF code repo would require code to be checked for license
>>> compliance etc before it could become a Commons component.
>>> A Sandbox component does not require that.
>>> 
>>>> Gary
>>>> 
>>>> On Thu, May 6, 2021, 11:26 Ralph Goers <ra...@dslextreme.com>
>>> wrote:
>>>> 
>>>>> 
>>>>> 
>>>>>> On May 6, 2021, at 8:06 AM, Gary Gregory <ga...@gmail.com>
>>> wrote:
>>>>>> 
>>>>>> What about the Commons Sandox? Would that be a good place to start?
>>>>>> 
>>>>> 
>>>>> Emmanuel just sort of proposed doing away with it. As he put it,
>> anyone
>>>>> can create a
>>>>> GitHub repo so why does it need to be under the apache user.  He
>> hasn’t
>>>>> formally
>>>>> made a proposal for that and I’m not sure how I would vote on it if
>> he
>>>>> did. He does
>>>>> have a point. At the same time I’m not sure I’d close off doing
>>>>> experimental or
>>>>> early development within the ASF space.
>>>>> 
>>>>> Ralph
>>>>> 
>>>>> 
>>>>> 
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>>> 
>>>>> 
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>> For additional commands, e-mail: dev-help@commons.apache.org
>>> 
>>> 
>> 
>> --
>> Avijit Basak
>> 





Re: The case for a Commons component

Posted by Gary Gregory <ga...@gmail.com>.
Are you seriously asking someone else to read through 40 emails and summarize
for you? Perhaps part of your contribution might be to do this yourself?

Gary

On Fri, May 14, 2021, 08:15 Avijit Basak <av...@gmail.com> wrote:

> Hi All
>
>         This has been a long mail thread. It will be really helpful if
> anyone can summarize the decisions.
>         Is the proposal of developing the new machine learning component
> approved?
>         If the team repository is not provided is there any way to go
> ahead?
>         Waiting for a response.
>
> Thanks & Regards
> --Avijit Basak
>
> On Fri, 7 May 2021 at 02:26, sebb <se...@gmail.com> wrote:
>
> > On Thu, 6 May 2021 at 21:13, Gary Gregory <ga...@gmail.com>
> wrote:
> > >
> > > It is true that there much less friction these days to get a repository
> > > going with GitHub, GitLab, and BitBucket, but, for now, the Commons
> > Sandbox
> > > is still available. If we want to do away with the sandbox, then let's
> > > talk about that separately.
> > >
> >
> > There is no need for a Sandbox component to use SVN, and it's easy to
> > create a new Commons git repo.
> >
> > A non-ASF code repo would require code to be checked for license
> > compliance etc before it could become a Commons component.
> > A Sandbox component does not require that.
> >
> > > Gary
> > >
> > > On Thu, May 6, 2021, 11:26 Ralph Goers <ra...@dslextreme.com>
> > wrote:
> > >
> > > >
> > > >
> > > > > On May 6, 2021, at 8:06 AM, Gary Gregory <ga...@gmail.com>
> > wrote:
> > > > >
> > > > > What about the Commons Sandox? Would that be a good place to start?
> > > > >
> > > >
> > > > Emmanuel just sort of proposed doing away with it. As he put it,
> anyone
> > > > can create a
> > > > GitHub repo so why does it need to be under the apache user.  He
> hasn’t
> > > > formally
> > > > made a proposal for that and I’m not sure how I would vote on it if
> he
> > > > did. He does
> > > > have a point. At the same time I’m not sure I’d close off doing
> > > > experimental or
> > > > early development within the ASF space.
> > > >
> > > > Ralph
> > > >
> > > >
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> > > > For additional commands, e-mail: dev-help@commons.apache.org
> > > >
> > > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> > For additional commands, e-mail: dev-help@commons.apache.org
> >
> >
>
> --
> Avijit Basak
>

Re: The case for a Commons component

Posted by Avijit Basak <av...@gmail.com>.
Hi All

        This has been a long mail thread. It will be really helpful if
anyone can summarize the decisions.
        Is the proposal of developing the new machine learning component
approved?
        If the team repository is not provided is there any way to go ahead?
        Waiting for a response.

Thanks & Regards
--Avijit Basak

On Fri, 7 May 2021 at 02:26, sebb <se...@gmail.com> wrote:

> On Thu, 6 May 2021 at 21:13, Gary Gregory <ga...@gmail.com> wrote:
> >
> > It is true that there much less friction these days to get a repository
> > going with GitHub, GitLab, and BitBucket, but, for now, the Commons
> Sandbox
> > is still available. If we want to do away with the sandbox, then let's
> > talk about that separately.
> >
>
> There is no need for a Sandbox component to use SVN, and it's easy to
> create a new Commons git repo.
>
> A non-ASF code repo would require code to be checked for license
> compliance etc before it could become a Commons component.
> A Sandbox component does not require that.
>
> > Gary
> >
> > On Thu, May 6, 2021, 11:26 Ralph Goers <ra...@dslextreme.com>
> wrote:
> >
> > >
> > >
> > > > On May 6, 2021, at 8:06 AM, Gary Gregory <ga...@gmail.com>
> wrote:
> > > >
> > > > What about the Commons Sandox? Would that be a good place to start?
> > > >
> > >
> > > Emmanuel just sort of proposed doing away with it. As he put it, anyone
> > > can create a
> > > GitHub repo so why does it need to be under the apache user.  He hasn’t
> > > formally
> > > made a proposal for that and I’m not sure how I would vote on it if he
> > > did. He does
> > > have a point. At the same time I’m not sure I’d close off doing
> > > experimental or
> > > early development within the ASF space.
> > >
> > > Ralph
> > >
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> > > For additional commands, e-mail: dev-help@commons.apache.org
> > >
> > >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>

-- 
Avijit Basak

Re: The case for a Commons component

Posted by sebb <se...@gmail.com>.
On Thu, 6 May 2021 at 21:13, Gary Gregory <ga...@gmail.com> wrote:
>
> It is true that there much less friction these days to get a repository
> going with GitHub, GitLab, and BitBucket, but, for now, the Commons Sandbox
> is still available. If we want to do away with the sandbox, then let's
> talk about that separately.
>

There is no need for a Sandbox component to use SVN, and it's easy to
create a new Commons git repo.

A non-ASF code repo would require code to be checked for license
compliance etc before it could become a Commons component.
A Sandbox component does not require that.

> Gary
>
> On Thu, May 6, 2021, 11:26 Ralph Goers <ra...@dslextreme.com> wrote:
>
> >
> >
> > > On May 6, 2021, at 8:06 AM, Gary Gregory <ga...@gmail.com> wrote:
> > >
> > > What about the Commons Sandox? Would that be a good place to start?
> > >
> >
> > Emmanuel just sort of proposed doing away with it. As he put it, anyone
> > can create a
> > GitHub repo so why does it need to be under the apache user.  He hasn’t
> > formally
> > made a proposal for that and I’m not sure how I would vote on it if he
> > did. He does
> > have a point. At the same time I’m not sure I’d close off doing
> > experimental or
> > early development within the ASF space.
> >
> > Ralph
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> > For additional commands, e-mail: dev-help@commons.apache.org
> >
> >



Re: The case for a Commons component

Posted by Gary Gregory <ga...@gmail.com>.
It is true that there is much less friction these days to get a repository
going with GitHub, GitLab, and BitBucket, but, for now, the Commons Sandbox
is still available. If we want to do away with the sandbox, then let's
talk about that separately.

Gary

On Thu, May 6, 2021, 11:26 Ralph Goers <ra...@dslextreme.com> wrote:

>
>
> > On May 6, 2021, at 8:06 AM, Gary Gregory <ga...@gmail.com> wrote:
> >
> > What about the Commons Sandox? Would that be a good place to start?
> >
>
> Emmanuel just sort of proposed doing away with it. As he put it, anyone
> can create a
> GitHub repo so why does it need to be under the apache user.  He hasn’t
> formally
> made a proposal for that and I’m not sure how I would vote on it if he
> did. He does
> have a point. At the same time I’m not sure I’d close off doing
> experimental or
> early development within the ASF space.
>
> Ralph
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>

Re: The case for a Commons component

Posted by Ralph Goers <ra...@dslextreme.com>.

> On May 6, 2021, at 8:06 AM, Gary Gregory <ga...@gmail.com> wrote:
> 
> What about the Commons Sandox? Would that be a good place to start?
> 

Emmanuel just sort of proposed doing away with it. As he put it, anyone can create a 
GitHub repo so why does it need to be under the apache user.  He hasn’t formally 
made a proposal for that and I’m not sure how I would vote on it if he did. He does 
have a point. At the same time I’m not sure I’d close off doing experimental or 
early development within the ASF space.

Ralph





Re: The case for a Commons component

Posted by Gary Gregory <ga...@gmail.com>.
What about the Commons Sandbox? Would that be a good place to start?

Gary

On Thu, May 6, 2021, 09:37 Gilles Sadowski <gi...@gmail.com> wrote:

> On Thu, 6 May 2021 at 14:48, Emmanuel Bourg <eb...@apache.org> wrote:
> >
> > On 2021-05-06 13:06, Gilles Sadowski wrote:
> >
> > > It is not nice to decide for others what they may need.
> >
> > It is not nice to suggest I shouldn't voice my opinions.
>
> Your argued opinion is welcome.
> In the text which you cut, you *explicitly* said that I should
> go somewhere else (GitHub or whatever).
>
> >
> > > It would have been courteous to acknowledge the answers to
> > > your argument against having a dedicated component
> >
> > I've little appetite for lengthy debate with you again.
>
> There is/was no debate (as in: "an exchange of arguments" or
> "trying to get consensus" or "not forcing me to do what I think is
> bad"), you state your opinion (as mentioned above) and that's it.
>
> > > My rationale, for whether a specific component is needed, has
> > > always been the same: Define a scope (and stick to it).
> > > You seem to find this acceptable for any Commons project except
> > > those which you tagged as "math-related".
> >
> > The machine learning scope is too wide, it doesn't belong here.
>
> I agree that it is wide, but much less so than "math", yet you never
> voiced such an opinion against CM (while I did).
>
> > > So I'm asking: Will it make any difference if the "machine learning"
> > > codes are further developed within [Math]?  Concretely:
> > >  * Would you vote to release CM v4.0?
> > >  * Would you help (more than if the ML codes were in a
> > >    specific component) to review/merge the PRs?
> >
> > I'd would vote favorably for a modularized CM 4.0 release,
>
> I really (really, really) can't figure out how you can reconcile that a
> library (CM) that *contains* a ML subset which you deem too big
> to be a Commons component, is not too big to be a Commons
> component!
>
> The spin-offs from CM do solve the issue of "too wide scope" that
> doomed CM.
> And again: I agree that "machine learning" may be too wide a
> scope itself; grouping all such algorithms in a single component
> was already a compromise wrt to having each ML field in its own,
> especially if we aimed at some common goal (multi-threading) that
> could lead to shared code (not the math algorithms but, o.a. things,
> the threads management).
>
> > but I still
> > think that the math related components would be best served in their own
> > TLP with a dedicated community
>
> When this was brought up somewhat seriously, most of the
> PMC voted against.
> Then last time (IIRC) the idea was floated, there wasn't the
> minimum of people required to support a TLP.  [FTR, that was
> the practical reason these codes are here (as is the for all the
> other Commons components): a place where more people can
> contribute to otherwise orphaned libraries.]
>
> OK, then let's move on; thus I'm asking who in this PMC, is
> now willing to provide the necessary clearance for an internal
> fork of the math-related codes for which it is deemed that they
> are not a good fit for Commons?
>
> > free of the Apache Commons rules and
> > constraints.
>
> I'm still to be shown what rules I'd be asking to be free of.
>
> Gilles
>
> >
> > Emmanuel Bourg
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>

Re: The case for a Commons component

Posted by Gilles Sadowski <gi...@gmail.com>.
On Thu, 6 May 2021 at 14:48, Emmanuel Bourg <eb...@apache.org> wrote:
>
> On 2021-05-06 13:06, Gilles Sadowski wrote:
>
> > It is not nice to decide for others what they may need.
>
> It is not nice to suggest I shouldn't voice my opinions.

Your argued opinion is welcome.
In the text which you cut, you *explicitly* said that I should
go somewhere else (GitHub or whatever).

>
> > It would have been courteous to acknowledge the answers to
> > your argument against having a dedicated component
>
> I've little appetite for lengthy debate with you again.

There is/was no debate (as in: "an exchange of arguments" or
"trying to get consensus" or "not forcing me to do what I think is
bad"), you state your opinion (as mentioned above) and that's it.

> > My rationale, for whether a specific component is needed, has
> > always been the same: Define a scope (and stick to it).
> > You seem to find this acceptable for any Commons project except
> > those which you tagged as "math-related".
>
> The machine learning scope is too wide, it doesn't belong here.

I agree that it is wide, but much less so than "math", yet you never
voiced such an opinion against CM (while I did).

> > So I'm asking: Will it make any difference if the "machine learning"
> > codes are further developed within [Math]?  Concretely:
> >  * Would you vote to release CM v4.0?
> >  * Would you help (more than if the ML codes were in a
> >    specific component) to review/merge the PRs?
>
> I'd would vote favorably for a modularized CM 4.0 release,

I really (really, really) can't figure out how you can reconcile that a
library (CM) that *contains* a ML subset which you deem too big
to be a Commons component, is not too big to be a Commons
component!

The spin-offs from CM do solve the issue of "too wide scope" that
doomed CM.
And again: I agree that "machine learning" may be too wide a
scope itself; grouping all such algorithms in a single component
was already a compromise wrt having each ML field in its own,
especially if we aimed at some common goal (multi-threading) that
could lead to shared code (not the math algorithms but, among other
things, the threads management).

> but I still
> think that the math related components would be best served in their own
> TLP with a dedicated community

When this was brought up somewhat seriously, most of the
PMC voted against.
Then last time (IIRC) the idea was floated, there wasn't the
minimum of people required to support a TLP.  [FTR, that was
the practical reason these codes are here (as is the case for all
the other Commons components): a place where more people can
contribute to otherwise orphaned libraries.]

OK, then let's move on; thus I'm asking who in this PMC, is
now willing to provide the necessary clearance for an internal
fork of the math-related codes for which it is deemed that they
are not a good fit for Commons?

> free of the Apache Commons rules and
> constraints.

I have yet to be shown what rules I'd be asking to be free of.

Gilles

>
> Emmanuel Bourg
>



Re: The case for a Commons component

Posted by Emmanuel Bourg <eb...@apache.org>.
On 2021-05-06 13:06, Gilles Sadowski wrote:

> It is not nice to decide for others what they may need.

It is not nice to suggest I shouldn't voice my opinions.


> It would have been courteous to acknowledge the answers to
> your argument against having a dedicated component

I've little appetite for lengthy debate with you again.


> My rationale, for whether a specific component is needed, has
> always been the same: Define a scope (and stick to it).
> You seem to find this acceptable for any Commons project except
> those which you tagged as "math-related".

The machine learning scope is too wide, it doesn't belong here.


> So I'm asking: Will it make any difference if the "machine learning"
> codes are further developed within [Math]?  Concretely:
>  * Would you vote to release CM v4.0?
>  * Would you help (more than if the ML codes were in a
>    specific component) to review/merge the PRs?

I'd vote favorably for a modularized CM 4.0 release, but I still 
think that the math related components would be best served in their own 
TLP with a dedicated community free of the Apache Commons rules and 
constraints.

Emmanuel Bourg



Re: The case for a Commons component

Posted by Gilles Sadowski <gi...@gmail.com>.
On Thu, 6 May 2021 at 02:24, Emmanuel Bourg <eb...@apache.org> wrote:
>
> On 2021-05-05 20:31, Oliver Heger wrote:
>
> > What about the sandbox? IIUC, every committer can start a new
> > component there. If then a community forms around this component, it
> > can move to proper (which would then require a vote).
>
> With the various source hosting solutions available today we no longer
> need the sandbox, and I think we should discontinue this practice. The
> machine learning library could as well start its life on GitHub, it
> doesn't need Apache Commons.

It is not nice to decide for others what they may need.

It would have been courteous to acknowledge the answers to
your argument against having a dedicated component (to more
efficiently manage codes that have already been accepted within
the "Commons" project, as part of CM), and explain
 * why those answers would not make you withdraw your -1,
 * why the ASF would be better off without the offered contribution,
 * why some initiatives in Commons deserve a worse treatment
   than others.

My rationale, for whether a specific component is needed, has
always been the same: Define a scope (and stick to it).
You seem to find this acceptable for any Commons project except
those which you tagged as "math-related".

So I'm asking: Will it make any difference if the "machine learning"
codes are further developed within [Math]?  Concretely:
 * Would you vote to release CM v4.0?
 * Would you help (more than if the ML codes were in a
   specific component) to review/merge the PRs?

Gilles

>
> Emmanuel Bourg
>



Re: The case for a Commons component

Posted by Emmanuel Bourg <eb...@apache.org>.
On 2021-05-05 20:31, Oliver Heger wrote:

> What about the sandbox? IIUC, every committer can start a new
> component there. If then a community forms around this component, it
> can move to proper (which would then require a vote).

With the various source hosting solutions available today we no longer 
need the sandbox, and I think we should discontinue this practice. The 
machine learning library could as well start its life on GitHub, it 
doesn't need Apache Commons.

Emmanuel Bourg



Re: The case for a Commons component

Posted by Gilles Sadowski <gi...@gmail.com>.
On Wed, 5 May 2021 at 20:33, Oliver Heger
<ol...@oliver-heger.de> wrote:
>
>
>
> On 05.05.21 at 20:26, Gilles Sadowski wrote:
> > On Wed, 5 May 2021 at 18:57, Gary Gregory <ga...@gmail.com> wrote:
> >>
> >> IMO the lack of +1s shows the lack of appetite to manage another component
> >
> > That's certainly true.
> > And nobody is forced to do anything.
> >
> > When the other CM spin-offs started, there was only _one_ person
> > willing to do the work.
>
> What about the sandbox? IIUC, every committer can start a new component
> there. If then a community forms around this component, it can move to
> proper (which would then require a vote).
>
> Would this be an option to get started?

[Graph] is listed in the sandbox[1], yet when someone expressed a willingness
to contribute, we had a "git" repository created[2] (even though the
web site has
remained outdated[3], probably because the attempt was short-lived).

So indeed, I could have already created the repository a few weeks ago...

However in this instance, what would it mean to have codes that have lived
within a "proper" component for 6 years and more be moved to "sandbox"?

Regards,
Gilles

[1] http://commons.apache.org/sandbox/commons-graph/
[2] https://gitbox.apache.org/repos/asf?p=commons-graph.git
[3] http://commons.apache.org/sandbox/commons-graph/source-repository.html



Re: The case for a Commons component

Posted by Oliver Heger <ol...@oliver-heger.de>.

On 05.05.21 at 20:26, Gilles Sadowski wrote:
> On Wed, 5 May 2021 at 18:57, Gary Gregory <ga...@gmail.com> wrote:
>>
>> IMO the lack of +1s shows the lack of appetite to manage another component
> 
> That's certainly true.
> And nobody is forced to do anything.
> 
> When the other CM spin-offs started, there was only _one_ person
> willing to do the work.

What about the sandbox? IIUC, every committer can start a new component 
there. If then a community forms around this component, it can move to 
proper (which would then require a vote).

Would this be an option to get started?

Oliver

> 
> Gilles
> 
>> [...]
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
> 



Re: The case for a Commons component

Posted by Alex Herbert <al...@gmail.com>.
On Fri, 30 Apr 2021 at 16:40, Avijit Basak <av...@gmail.com> wrote:

>
> >>  Then some examination of the data-structures is required (a
> >>  binary chromosome is currently stored as a "List<Integer>").
>
> I have recently done some work on this. Could you please check this
> article and share your thoughts:
> https://arxiv.org/abs/2103.04751
>

The paper relates to the efficiency of storing binary values for many
indexes. Its conclusion is that you should use each bit of a byte to store
one binary value, i.e. a bitset. In the example repository the binary
chromosome is stored using a List<Long> with each long representing
64 alleles. This is basically an unoptimised BitSet. So I would look at how
this is done in java.util.BitSet and write a custom version optimised for
genetic algorithm operations. It would also be faster than the List<Long>
and, by avoiding the boxing of each long in a Long object wrapper, would
save a lot of memory.

Note: You cannot easily just use java.util.BitSet as you wish to have
access to the underlying long[] to store the chromosome to enable efficient
crossover. This can be done with bit manipulation of the longs containing
the crossover point and then a System.arraycopy via a temp array:

For a single point crossover of two long[] chromosomes:

long[] c1 = ...
long[] c2 = ...
// The chosen allele for the crossover
int cross = ...

// Find the index and bit in the 64-bit per long representation
int index = cross >> 6; // i.e. cross / 64
// This is not actually required...
// int bit = cross & 63; // i.e. cross % 64

// The following will create the mask for all bits from the target bit upwards
// long mask = -1L << bit;
// A shift of a long only uses the lowest 6 bits of the distance,
// so the crossover position can be used directly
long mask = -1L << cross;

// Swap the bits before/after the crossover at the target index
long tmp = c1[index];
c1[index] = (tmp & mask) | (c2[index] & ~mask);
c2[index] = (tmp & ~mask) | (c2[index] & mask);

// Copy the rest
long[] data = new long[index];
System.arraycopy(c2, 0, data, 0, index);
System.arraycopy(c1, 0, c2, 0, index);
System.arraycopy(data, 0, c1, 0, index);

This is untested code but contains the main idea.
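
The idea above can be put into a self-contained form as follows. This is only
a minimal sketch under stated assumptions: both chromosomes have the same
length, allele i is stored in bit (i % 64) of word (i / 64) as in
java.util.BitSet, and 0 <= cross <= 64 * length. The names BinaryCrossover
and singlePoint are hypothetical and not part of any existing API.

```java
import java.util.Arrays;

/** Hypothetical helper: single-point crossover on long[] chromosomes. */
public final class BinaryCrossover {
    private BinaryCrossover() {}

    /**
     * Swaps, in place, all alleles below position {@code cross} between the
     * two chromosomes. Assumes equal lengths and 0 <= cross <= 64 * length.
     */
    public static void singlePoint(long[] c1, long[] c2, int cross) {
        // Word that contains the crossover point
        final int index = cross >>> 6;

        if (index < c1.length) {
            // Bits at and above the crossover point within that word.
            // A long shift only uses the lowest 6 bits of 'cross'.
            final long mask = -1L << cross;
            final long w1 = c1[index];
            final long w2 = c2[index];
            // Keep the upper bits, exchange the lower bits
            c1[index] = (w1 & mask) | (w2 & ~mask);
            c2[index] = (w2 & mask) | (w1 & ~mask);
        }

        // Exchange all whole words below the crossover word
        final long[] tmp = Arrays.copyOf(c1, index);
        System.arraycopy(c2, 0, c1, 0, index);
        System.arraycopy(tmp, 0, c2, 0, index);
    }
}
```

Reading both words into locals before writing avoids any dependence on the
order of the two assignments.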

Setting and unsetting bits in the binary chromosome for mutation is much
easier: you just pick the mutation point, find the index and the bit, and
then set or unset it as appropriate, or flip it using an XOR operation on
the bit (see the source code for BitSet.flip).
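
As a minimal sketch of such a mutation, assuming the same layout as
java.util.BitSet (allele i in bit (i % 64) of word (i / 64)); the class and
method names are hypothetical:

```java
/** Hypothetical helper: bit-flip mutation on a long[] chromosome. */
public final class BinaryMutation {
    private BinaryMutation() {}

    /** Flips the allele at the given position using XOR. */
    public static void flip(long[] chromosome, int allele) {
        // The long shift only uses the lowest 6 bits of 'allele'
        chromosome[allele >>> 6] ^= 1L << allele;
    }

    /** Returns true if the allele at the given position is set. */
    public static boolean get(long[] chromosome, int allele) {
        return (chromosome[allele >>> 6] & (1L << allele)) != 0;
    }
}
```

Calling flip twice on the same position restores the original chromosome.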

Alex

Re: The case for a Commons component

Posted by Alex Herbert <al...@gmail.com>.
On Sun, 2 May 2021 at 16:51, Avijit Basak <av...@gmail.com> wrote:

> Hi
>
> >>        Note: You cannot easily just use java.util.BitSet as you wish to
> have
> access to the underlying long[] to store the chromosome to enable efficient
> crossover.
> --Thanks for pointing this. However, I have considered few constraints
> while doing the implementation.
>      1) I extended the existing class AbstractListChromosome, which
> requires a Generic type. This is the reason for using a list of Long.
> However, I can extend the Chromosome and use an array of primitive long.
> BitSet also uses a similar data structure.
>      2) One problem of BitSet is the use of MSB to retain bits. As a
> result, we won't be able to use the static utility methods of wrapper
> classes(Long) for conversion between primitive type and string. We will
> have to write custom code for conversion between string and integral types.
> This is the only reason I have used BLOCKSIZE as 63 instead of 64.
>

I did state you cannot use BitSet as there are requirements to access the
underlying long[] for certain operations such as crossover. Thus you have
to build a custom implementation that uses a long[] representation with the
operations you need. You can then store the bits using big or little endian
as you require. The BitSet is using LSB for bit 0 to MSB for bit 63 of each
word.

Writing custom code for toString() would be simple. You can use a 256 entry
look-up table and output 8 blocks per long:

String[] OUTPUT = { "00000000", "00000001", "00000010", "00000011", etc. };
long[] alleles = ...;
StringBuilder sb = new StringBuilder(alleles.length * 64);
for (long bits : alleles) {
    // The order of this depends on the endianness of the representation
    sb.append(OUTPUT[(int)(bits & 0xff)])
       .append(OUTPUT[(int)((bits >> 8) & 0xff)])
       .append(OUTPUT[(int)((bits >> 16) & 0xff)])
       // etc ...
}

There would be extra work for the final block of 64 if it is not complete
(i.e. less than 64 bits are used) to avoid extra zeros in the output.
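
A runnable sketch of the look-up table approach, including the handling of
an incomplete final block. It assumes one particular layout that you may
well not want: words are written in array order, each word is written most
significant bit first, and an incomplete final word keeps its alleles in the
low-order bits. The names BinaryChromosomeFormat and toBinaryString, and the
'length' parameter, are hypothetical.

```java
/** Hypothetical helper: binary string output using a 256-entry table. */
public final class BinaryChromosomeFormat {
    /** BYTE_TO_BITS[b] is the 8-character binary string of byte value b. */
    private static final String[] BYTE_TO_BITS = new String[256];

    static {
        for (int i = 0; i < 256; i++) {
            String s = Integer.toBinaryString(i);
            // Left-pad with zeros to 8 characters
            BYTE_TO_BITS[i] = "00000000".substring(s.length()) + s;
        }
    }

    private BinaryChromosomeFormat() {}

    /**
     * @param alleles packed alleles, 64 per word
     * @param length  number of alleles in use; assumed to lie in
     *                (64 * (alleles.length - 1), 64 * alleles.length]
     */
    public static String toBinaryString(long[] alleles, int length) {
        StringBuilder sb = new StringBuilder(alleles.length * 64);
        for (long bits : alleles) {
            // Most significant byte first (an assumption of this sketch)
            for (int shift = 56; shift >= 0; shift -= 8) {
                sb.append(BYTE_TO_BITS[(int) ((bits >>> shift) & 0xff)]);
            }
        }
        // Drop the unused high-order positions of the final word
        int unused = alleles.length * 64 - length;
        if (unused > 0) {
            int start = (alleles.length - 1) * 64;
            sb.delete(start, start + unused);
        }
        return sb.toString();
    }
}
```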

Writing fromString input code could use Long.parseUnsignedLong(String, int)
with a radix of 2 if you have the correct endianness per block of 64. This
allows you to read 64 characters at a time to create the long[].
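
A minimal sketch of that parsing step, assuming the input length is a
multiple of 64 and that each block of 64 characters is written most
significant bit first. The names BinaryChromosomeParser and parse are
hypothetical; a real implementation would also need to handle an incomplete
final block.

```java
/** Hypothetical helper: parses a binary string into packed long words. */
public final class BinaryChromosomeParser {
    private BinaryChromosomeParser() {}

    public static long[] parse(String bits) {
        if (bits.length() % 64 != 0) {
            throw new IllegalArgumentException(
                "Length must be a multiple of 64: " + bits.length());
        }
        long[] words = new long[bits.length() / 64];
        for (int i = 0; i < words.length; i++) {
            // 64 binary digits fit exactly into an unsigned 64-bit value
            String block = bits.substring(i * 64, (i + 1) * 64);
            words[i] = Long.parseUnsignedLong(block, 2);
        }
        return words;
    }
}
```

Long.parseUnsignedLong throws a NumberFormatException for characters other
than '0' and '1' (apart from a leading '+'), so malformed input is rejected.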

I do not see it as a problem to write custom code based around long[] if
the result is a large gain in speed and memory efficiency for the
implementation.

Restricting functionality to the current CM AbstractListChromosome
or Chromosome is not necessary for a new package. This is the opportunity
to build new data structures that are appropriate for the intended use.


> >>// This is not actually required...
> // int bit = cross & 63; // i.e. cross % 64
> --Do you mean bit index is not required to calculate? How can we handle
> crossover indexes which are not multiple of 64.
>

Sorry for not being clear. You need to create the mask to determine where
in the 64-bit long to perform the crossover. What I meant was you do not
need to identify the bit with a modulus operator. This:

int cross = ...

int index = cross / 64;
int bit = cross % 64;
long mask = 0xffff_ffff_ffff_ffffL << bit;

Is the same as:

int index = cross >>> 6;
long mask = -1L << cross;

This is because the left shift operator, applied to a long, only uses the
lowest 6 bits of the shift distance. Note that the left operand must be a
long (-1L); with an int (-1) only the lowest 5 bits are used. These are
all the same:

-1L << 1
-1L << (1 + 64)
-1L << (1 + 128)
-1L << (1 + 256)
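
The shift behaviour can be checked with a small sketch like the following
(class and method names are hypothetical). It also shows why the literal
matters: shifting the int -1 masks the distance to 5 bits rather than 6.

```java
/** Hypothetical demo of how Java masks the shift distance. */
public final class ShiftDistance {
    private ShiftDistance() {}

    /** Left operand is a long: only the lowest 6 bits of the distance are used. */
    public static long longMask(int distance) {
        return -1L << distance;
    }

    /** Left operand is an int: only the lowest 5 bits of the distance are used. */
    public static int intMask(int distance) {
        return -1 << distance;
    }
}
```

For example, longMask(1), longMask(65) and longMask(129) all give the same
value, whereas intMask(33) equals intMask(1) and so differs from
longMask(33).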


> >> Do you think that allele sets other than binary would be useful to
> implement? [IIUC your document above, it seems not (?).]
> --The document only describes the data structure related to Binary
> genotype. We already have an implementation of RandomKey genotype in
> commons. We can think of adding other genotypes gradually.
>
>
> Thanks & Regards
> --Avijit Basak
>
>
>
> On Sat, 1 May 2021 at 22:18, Gilles Sadowski <gi...@gmail.com> wrote:
>
> > On Fri, 30 Apr 2021 at 17:40, Avijit Basak <av...@gmail.com> wrote:
> > >
> > > Hi
> > >
> > >          >>lot of spurious references to "Commons Numbers"
> > >              --I have only created the basic project structure. Changes
> > > need to be made. Can anyone from the existing commons team help in
> doing
> > > this.
> >
> > Well, you should "search and replace":
> >   "Numbers" -> "Machine Learning"
> >   commons-numbers -> commons-machinelearning
> >
> > Other things (repository URL, JIRA project name and URL) require that
> > a component be created (vote is pending).
> > [As long as those files are not part of a PR, it is not urgent to fix
> > them.]
> >
> > >          >> For sure, populate it with the code extracted from CM's
> > > "genetics"
> > > package and proceed with the enhancements.
> > > At first, I'd suggest to refactor the layout of the package (i.e.
> create
> > > a "subpackage" for each component of a genetic algorithm).
> > >               -- I am working on it.
> >
> > Great!
> >
> > > Did not commit the code till now.
> >
> > OK.  When you do, please ask for review on the "dev" ML.
> >
> > >           >>  Then some examination of the data-structures is required
> (a
> > > binary chromosome is currently stored as a "List<Integer>").
> > >               -- I have recently done some work on this. Could you
> please
> > > check this article and share your thought.
> > >                   "*https://arxiv.org/abs/2103.04751
> > > <https://arxiv.org/abs/2103.04751>*"
> >
> > Alex already provided a thorough response.
> > It's a pity that JDK's BitSet is missing a few methods (e.g. "append")
> > for a readily usable implementation of a "binary chromosome".
> >
> > Do you think that allele sets other than binary would be useful to
> > implement? [IIUC your document above, it seems not (?).]
> >
> > >           Are we thinking to use Spark for our parallelism
> >
> > No, if the code is to reside in Commons.
> >
> > > or a simple
> > > multi-threading of Java.
> >
> > Yes, we'd depend only on JDK classes.
> >
> > > I would prefer to use java multi-threading and
> > > avoid any other framework.
> > >           In java we don't have any library which can be used for AI/ML
> > > programming with a very minimal learning curve. Can we think of
> > fulfilling
> > > this need?
> >
> > That would be nice. Don't hesitate to enlist fellow programmers. :-)
> >
> > Regards,
> > Gilles
> >
> > >           This will be helpful for many java developers to venture into
> > > AI/ML without learning a new language like Python.
> > >
> > >
> > >>> [...]
> >
> >
> >
>
> --
> Avijit Basak
>

Re: The case for a Commons component

Posted by Gilles Sadowski <gi...@gmail.com>.
Hello.

On Mon, 3 May 2021 at 08:53, Avijit Basak <av...@gmail.com> wrote:
>
> Hi
>
>           I would like to vote for *commons-ml*.

Wrong thread, again.

Sorry for the nit-picking, but whenever a vote is requested, it is
often the basis of an official decision that must be traceable by
other parties, such as the ASF's INFRAstructure people.
In this case (the eventual creation of a repository), they might not
need to be involved, so I've voted on your behalf in the proper
thread (but please confirm, by acknowledging in that *other*
thread, that the vote is according to your preference).

Thanks,
Gilles


>> [...]



Re: The case for a Commons component

Posted by Avijit Basak <av...@gmail.com>.
Hi

          I would like to vote for *commons-ml*.

Thanks & Regards
--Avijit Basak

On Mon, 3 May 2021 at 04:29, Gilles Sadowski <gi...@gmail.com> wrote:

> Hi.
>
> > [... Discussion about GA data-structures...]
>
> I'd suggest that we finalize the [Vote] before getting into the
> details...
>
> Currently, there have been votes by:
>   Emmanuel Bourg (-1)
>   Sebastian Bazley (-0)
>   Ralph Goers (+0)
>   Paul King (+1)
>
> So currently, the discussion should be focused on settling to the
> issues put forward by the opponents to having this new component:
>   * Problem 1: Functionality should go somewhere else (Emmanuel, Sebb)
>   * Problem 2: Who will contribute? (Ralph)
>
> Partial answers have been given.
> We need more opinions (and votes).
>
> Regards,
> Gilles
>
>
>

-- 
Avijit Basak

Re: The case for a Commons component

Posted by Gilles Sadowski <gi...@gmail.com>.
> > [...]
> >>
> >> So a procedural vote requires a majority.
> >
> > There is a small majority (irrespective of the binding vs non-binding
> > categories).
>
> In votes ONLY PMC member votes are counted. Other votes are
> advisory. PMC members should take those votes into account
> when voting.

That's the point indeed: the "advisory" information was not taken
into account.

Last time the PMC turned down a contribution, the conversation
had made it clear that the donating people did not intend to
support it.
Here we have the "opposite" case: Code that is rotting here could
be brought back to life.  Yet it seems that sparing some bits on the
ASF servers is more important than having people feel welcome
to contribute here.

> If you don’t understand that concept you shouldn’t
> be on a PMC.

Sure. There is a "concept" for that nowadays: Cancel culture...

> Trying to justify creating a new Commons component by endlessly
> discussing the topic just isn’t going to work.
>
> I’ll not be responding to more emails on this thread

... exactly (see above).

> as I consider the
> matter closed.


Gilles

>
> Ralph



Re: The case for a Commons component

Posted by Ralph Goers <ra...@dslextreme.com>.

> On May 6, 2021, at 3:04 AM, Gilles Sadowski <gi...@gmail.com> wrote:
>> 
>> It looks like you didn’t read the page.
> 
> I did, of course. And my interpretation differs.
> 
>> For clarity I am copying it here
>> 
>> "Votes on procedural issues follow the common format of majority rule unless
>> otherwise stated. That is, if there are more favourable votes than unfavourable ones,
>> the issue is considered to have passed -- regardless of the number of votes in each
>> category. (If the number of votes seems too small to be representative of a community
>> consensus, the issue is typically not pursued. However, see the description of
>> lazy consensus <https://www.apache.org/foundation/voting.html#LazyConsensus> for a modifying factor.)"
>> 
>> 
>> So a procedural vote requires a majority.
> 
> There is a small majority (irrespective of the binding vs non-binding
> categories).

In votes ONLY PMC member votes are counted. Other votes are 
advisory. PMC members should take those votes into account 
when voting. If you don’t understand that concept you shouldn’t 
be on a PMC.

Trying to justify creating a new Commons component by endlessly
discussing the topic just isn’t going to work.

I’ll not be responding to more emails on this thread as I consider the 
matter closed.

Ralph

Re: The case for a Commons component

Posted by Gilles Sadowski <gi...@gmail.com>.
On Thu, 6 May 2021 at 07:53, Ralph Goers <ra...@dslextreme.com> wrote:
>
>
> > On May 5, 2021, at 11:13 AM, Gilles Sadowski <gi...@gmail.com> wrote:
> >
> > On Wed, 5 May 2021 at 17:44, Ralph Goers <ra...@dslextreme.com> wrote:
> >>
> >>
> >>
> >>> On May 5, 2021, at 6:38 AM, Gilles Sadowski <gi...@gmail.com> wrote:
> >>>
> >>> On Tue, 4 May 2021 at 02:49, Ralph Goers <ra...@dslextreme.com> wrote:
> >>>>
> >>>> I apologize. I started another thread regarding the vote before seeing this.
> >>>
> >>> No problem.
> >>>
> >>>> Maybe that will get more attention?
> >>>
> >>> It doesn't seem so. :-}
> >>>
> >>> IMHO, valid answers have been given to the statements/questions
> >>> from people who didn't vote +1.
> >>> The very low turnout makes the arithmetics of the result fairly subjective...
> >>>
> >>> The optimistic view is that
> >>> 1. most people don't care (that the repository is created),
> >>> 2. there is no reason to doubt the infos provided by actual users of
> >>> those codes,
> >>> 3. there is an embryo of a community (perhaps not viable, but only
> >>> the future can tell...),[1]
> >>> 4. the same kind of welcoming gestures should apply for the proposed
> >>> contributions, as for the attempt to resuscitate "Commons Graph"[2],
> >>> even if some of the PMC might arguably prefer another option.
> >>
> >> Regardless, following https://www.apache.org/foundation/voting.html <https://www.apache.org/foundation/voting.html> indicates that this vote is not going to pass.
> >
> > How so?
> > [It's not about a code change; and no "technical argument" can be invoked.]
>
> It looks like you didn’t read the page.

I did, of course. And my interpretation differs.

> For clarity I am copying it here
>
> "Votes on procedural issues follow the common format of majority rule unless
> otherwise stated. That is, if there are more favourable votes than unfavourable ones,
> the issue is considered to have passed -- regardless of the number of votes in each
> category. (If the number of votes seems too small to be representative of a community
> consensus, the issue is typically not pursued. However, see the description of
> lazy consensus <https://www.apache.org/foundation/voting.html#LazyConsensus> for a modifying factor.)"
>
>
> So a procedural vote requires a majority.

There is a small majority (irrespective of the binding vs non-binding
categories).

> But note that it also calls out that if the number of voters
> seems too small then the issue is usually not pursued.

"usually"...
In Commons, the number of votes has always been low in
proportion to the official number of committers.
No surprise that, for very specific functionalities, it is even
lower.
However, the main point should rather have been whether
there is a prospect that someone will do the work, so that
a community gets a chance to exist at all.
In the case of ML algorithms, a discussion started that has
involved 4 people (among them 2 PMC members); this is considerably
more than the "usual" attendance for any one specific
component's issue.

>  Both of these describe this situation perfectly.
> The vote did not get a majority of binding votes (it was a tie) and the number of votes was very small.
>
>
> >
> >> You can’t assert lazy consensus on an explicit vote.  If you had started this as a lazy consensus vote it
> >> is likely it would have still gotten a -1 vote since both Sebb and Emmanuel have voice opposition.
> >
>
>
> > A "veto" does not apply here.
> > Hence my remark on the "arithmetics" since the total tally is slightly
> > "pro" although the PMC tally is slightly "con”.
>
> Where did I use the word “veto”? I never used the word “veto”.

I was trying to figure out how you reached your conclusion from the
page which you referred to (i.e. how a "-1" vote would be sufficient).

> There are essentially 3 ways to vote,
> Yes, No, and Abstain. In a procedural vote + or -1 represent an abstention. Anything less than 0 is
> a No and anything greater is a Yes. So saying there were -1 votes implies there are “No” votes and
> therefore there is no consensus.

Oliver reminded us that "[...] every committer can start a new
component [in the sandbox]".
Your interpretation of the procedural vote seems to mean that
anyone else can prevent such an initiative.

Gilles



Re: The case for a Commons component

Posted by Ralph Goers <ra...@dslextreme.com>.
> On May 5, 2021, at 11:13 AM, Gilles Sadowski <gi...@gmail.com> wrote:
> 
> On Wed, 5 May 2021 at 17:44, Ralph Goers <ra...@dslextreme.com> wrote:
>> 
>> 
>> 
>>> On May 5, 2021, at 6:38 AM, Gilles Sadowski <gi...@gmail.com> wrote:
>>> 
>>> On Tue, 4 May 2021 at 02:49, Ralph Goers <ra...@dslextreme.com> wrote:
>>>> 
>>>> I apologize. I started another thread regarding the vote before seeing this.
>>> 
>>> No problem.
>>> 
>>>> Maybe that will get more attention?
>>> 
>>> It doesn't seem so. :-}
>>> 
>>> IMHO, valid answers have been given to the statements/questions
>>> from people who didn't vote +1.
>>> The very low turnout makes the arithmetics of the result fairly subjective...
>>> 
>>> The optimistic view is that
>>> 1. most people don't care (that the repository is created),
>>> 2. there is no reason to doubt the infos provided by actual users of
>>> those codes,
>>> 3. there is an embryo of a community (perhaps not viable, but only
>>> the future can tell...),[1]
>>> 4. the same kind of welcoming gestures should apply for the proposed
>>> contributions, as for the attempt to resuscitate "Commons Graph"[2],
>>> even if some of the PMC might arguably prefer another option.
>> 
>> Regardless, following https://www.apache.org/foundation/voting.html <https://www.apache.org/foundation/voting.html> indicates that this vote is not going to pass.
> 
> How so?
> [It's not about a code change; and no "technical argument" can be invoked.]

It looks like you didn’t read the page. For clarity I am copying it here

"Votes on procedural issues follow the common format of majority rule unless
otherwise stated. That is, if there are more favourable votes than unfavourable ones,
the issue is considered to have passed -- regardless of the number of votes in each
category. (If the number of votes seems too small to be representative of a community
consensus, the issue is typically not pursued. However, see the description of
lazy consensus <https://www.apache.org/foundation/voting.html#LazyConsensus> for a modifying factor.)"


So a procedural vote requires a majority. But note that it also calls out that if the number of voters 
seems too small then the issue is usually not pursued.  Both of these describe this situation perfectly. 
The vote did not get a majority of binding votes (it was a tie) and the number of votes was very small.


> 
>> You can’t assert lazy consensus on an explicit vote.  If you had started this as a lazy consensus vote it
>> is likely it would have still gotten a -1 vote since both Sebb and Emmanuel have voice opposition.
> 


> A "veto" does not apply here.
> Hence my remark on the "arithmetics" since the total tally is slightly
> "pro" although the PMC tally is slightly "con”.

Where did I use the word “veto”? I never used the word “veto”.  There are essentially 3 ways to vote:
Yes, No, and Abstain. In a procedural vote, +0 or -0 represents an abstention. Anything less than 0 is
a No and anything greater is a Yes. So saying there were -1 votes implies there are “No” votes and
therefore there is no consensus.

Ralph



Re: The case for a Commons component

Posted by Gilles Sadowski <gi...@gmail.com>.
On Wed, 5 May 2021 at 17:44, Ralph Goers <ra...@dslextreme.com> wrote:
>
>
>
> > On May 5, 2021, at 6:38 AM, Gilles Sadowski <gi...@gmail.com> wrote:
> >
> > On Tue, 4 May 2021 at 02:49, Ralph Goers <ra...@dslextreme.com> wrote:
> >>
> >> I apologize. I started another thread regarding the vote before seeing this.
> >
> > No problem.
> >
> >> Maybe that will get more attention?
> >
> > It doesn't seem so. :-}
> >
> > IMHO, valid answers have been given to the statements/questions
> > from people who didn't vote +1.
> > The very low turnout makes the arithmetics of the result fairly subjective...
> >
> > The optimistic view is that
> >  1. most people don't care (that the repository is created),
> >  2. there is no reason to doubt the infos provided by actual users of
> > those codes,
> >  3. there is an embryo of a community (perhaps not viable, but only
> > the future can tell...),[1]
> >  4. the same kind of welcoming gestures should apply for the proposed
> > contributions, as for the attempt to resuscitate "Commons Graph"[2],
> > even if some of the PMC might arguably prefer another option.
>
> Regardless, following https://www.apache.org/foundation/voting.html <https://www.apache.org/foundation/voting.html> indicates that this vote is not going to pass.

How so?
[It's not about a code change; and no "technical argument" can be invoked.]

> You can’t assert lazy consensus on an explicit vote.  If you had started this as a lazy consensus vote it
> is likely it would have still gotten a -1 vote since both Sebb and Emmanuel have voice opposition.

A "veto" does not apply here.
Hence my remark on the "arithmetic", since the total tally is slightly
"pro" although the PMC tally is slightly "con".

Gilles

>
> Ralph
>



Re: The case for a Commons component

Posted by Ralph Goers <ra...@dslextreme.com>.

> On May 5, 2021, at 6:38 AM, Gilles Sadowski <gi...@gmail.com> wrote:
> 
> On Tue, 4 May 2021 at 02:49, Ralph Goers <ra...@dslextreme.com> wrote:
>> 
>> I apologize. I started another thread regarding the vote before seeing this.
> 
> No problem.
> 
>> Maybe that will get more attention?
> 
> It doesn't seem so. :-}
> 
> IMHO, valid answers have been given to the statements/questions
> from people who didn't vote +1.
> The very low turnout makes the arithmetics of the result fairly subjective...
> 
> The optimistic view is that
>  1. most people don't care (that the repository is created),
>  2. there is no reason to doubt the infos provided by actual users of
> those codes,
>  3. there is an embryo of a community (perhaps not viable, but only
> the future can tell...),[1]
>  4. the same kind of welcoming gestures should apply for the proposed
> contributions, as for the attempt to resuscitate "Commons Graph"[2],
> even if some of the PMC might arguably prefer another option.

Regardless, following https://www.apache.org/foundation/voting.html indicates that this vote is not going to pass.
You can’t assert lazy consensus on an explicit vote.  If you had started this as a lazy consensus vote, it
is likely it would still have gotten a -1 vote, since both Sebb and Emmanuel have voiced opposition.

Ralph


Re: The case for a Commons component

Posted by Gilles Sadowski <gi...@gmail.com>.
On Wed, 5 May 2021 at 18:57, Gary Gregory <ga...@gmail.com> wrote:
>
> IMO the lack of +1s shows the lack of appetite to manage another component

That's certainly true.
And nobody is forced to do anything.

When the other CM spin-offs started, there was only _one_ person
willing to do the work.

Gilles

> [...]



Re: The case for a Commons component

Posted by Gary Gregory <ga...@gmail.com>.
IMO the lack of +1s shows the lack of appetite to manage another component
that is not "common" to "most" Java apps, where I use quotes to acknowledge
that YMMV.

Personally, my plate is full with the current slate of components in which
I participate.

Gary

On Wed, May 5, 2021, 09:38 Gilles Sadowski <gi...@gmail.com> wrote:

> On Tue, 4 May 2021 at 02:49, Ralph Goers <ra...@dslextreme.com> wrote:
> >
> > I apologize. I started another thread regarding the vote before seeing
> this.
>
> No problem.
>
> > Maybe that will get more attention?
>
> It doesn't seem so. :-}
>
> IMHO, valid answers have been given to the statements/questions
> from people who didn't vote +1.
> The very low turnout makes the arithmetics of the result fairly
> subjective...
>
> The optimistic view is that
>   1. most people don't care (that the repository is created),
>   2. there is no reason to doubt the infos provided by actual users of
> those codes,
>   3. there is an embryo of a community (perhaps not viable, but only
> the future can tell...),[1]
>   4. the same kind of welcoming gestures should apply for the proposed
> contributions, as for the attempt to resuscitate "Commons Graph"[2],
> even if some of the PMC might arguably prefer another option.
>
> Regards,
> Gilles
>
> [1] Three Java implementations of the SOFM turned up as the top results
> of a web search; none seem to include multi-threading.
> [2] https://gitbox.apache.org/repos/asf?p=commons-graph.git
>
>
> >
> > Ralph
> >
> > > On May 2, 2021, at 3:59 PM, Gilles Sadowski <gi...@gmail.com>
> wrote:
> > >
> > > Hi.
> > >
> > >> [... Discussion about GA data-structures...]
> > >
> > > I'd suggest that we finalize the [Vote] before getting into the
> > > details...
> > >
> > > Currently, there have been votes by:
> > >  Emmanuel Bourg (-1)
> > >  Sebastian Bazley (-0)
> > >  Ralph Goers (+0)
> > >  Paul King (+1)
> > >
> > > So currently, the discussion should be focused on settling to the
> > > issues put forward by the opponents to having this new component:
> > >  * Problem 1: Functionality should go somewhere else (Emmanuel, Sebb)
> > >  * Problem 2: Who will contribute? (Ralph)
> > >
> > > Partial answers have been given.
> > > We need more opinions (and votes).
> > >
> > > Regards,
> > > Gilles
>
>
>

Re: The case for a Commons component

Posted by Gilles Sadowski <gi...@gmail.com>.
On Tue, 4 May 2021 at 02:49, Ralph Goers <ra...@dslextreme.com> wrote:
>
> I apologize. I started another thread regarding the vote before seeing this.

No problem.

> Maybe that will get more attention?

It doesn't seem so. :-}

IMHO, valid answers have been given to the statements/questions
from people who didn't vote +1.
The very low turnout makes the arithmetic of the result fairly subjective...

The optimistic view is that
  1. most people don't care (that the repository is created),
  2. there is no reason to doubt the information provided by actual users of
that code,
  3. there is an embryo of a community (perhaps not viable, but only
the future can tell...),[1]
  4. the same kind of welcoming gestures should apply for the proposed
contributions, as for the attempt to resuscitate "Commons Graph"[2],
even if some of the PMC might arguably prefer another option.

Regards,
Gilles

[1] Three Java implementations of the SOFM turned up as the top results
of a web search; none seem to include multi-threading.
[2] https://gitbox.apache.org/repos/asf?p=commons-graph.git


>
> Ralph
>
> > On May 2, 2021, at 3:59 PM, Gilles Sadowski <gi...@gmail.com> wrote:
> >
> > Hi.
> >
> >> [... Discussion about GA data-structures...]
> >
> > I'd suggest that we finalize the [Vote] before getting into the
> > details...
> >
> > Currently, there have been votes by:
> >  Emmanuel Bourg (-1)
> >  Sebastian Bazley (-0)
> >  Ralph Goers (+0)
> >  Paul King (+1)
> >
> > So currently, the discussion should be focused on settling to the
> > issues put forward by the opponents to having this new component:
> >  * Problem 1: Functionality should go somewhere else (Emmanuel, Sebb)
> >  * Problem 2: Who will contribute? (Ralph)
> >
> > Partial answers have been given.
> > We need more opinions (and votes).
> >
> > Regards,
> > Gilles



Re: The case for a Commons component

Posted by Ralph Goers <ra...@dslextreme.com>.
I apologize. I started another thread regarding the vote before seeing this. Maybe that will get more attention?

Ralph

> On May 2, 2021, at 3:59 PM, Gilles Sadowski <gi...@gmail.com> wrote:
> 
> Hi.
> 
>> [... Discussion about GA data-structures...]
> 
> I'd suggest that we finalize the [Vote] before getting into the
> details...
> 
> Currently, there have been votes by:
>  Emmanuel Bourg (-1)
>  Sebastian Bazley (-0)
>  Ralph Goers (+0)
>  Paul King (+1)
> 
> So currently, the discussion should be focused on settling to the
> issues put forward by the opponents to having this new component:
>  * Problem 1: Functionality should go somewhere else (Emmanuel, Sebb)
>  * Problem 2: Who will contribute? (Ralph)
> 
> Partial answers have been given.
> We need more opinions (and votes).
> 
> Regards,
> Gilles
> 
> 
> 





Re: The case for a Commons component

Posted by Gilles Sadowski <gi...@gmail.com>.
Hi.

> [... Discussion about GA data-structures...]

I'd suggest that we finalize the [Vote] before getting into the
details...

Currently, there have been votes by:
  Emmanuel Bourg (-1)
  Sebastian Bazley (-0)
  Ralph Goers (+0)
  Paul King (+1)

So currently, the discussion should be focused on settling the
issues put forward by the opponents to having this new component:
  * Problem 1: Functionality should go somewhere else (Emmanuel, Sebb)
  * Problem 2: Who will contribute? (Ralph)

Partial answers have been given.
We need more opinions (and votes).

Regards,
Gilles



Re: The case for a Commons component

Posted by Avijit Basak <av...@gmail.com>.
Hi

>> Note: You cannot easily just use java.util.BitSet as you wish to have
>> access to the underlying long[] to store the chromosome to enable
>> efficient crossover.
-- Thanks for pointing this out. However, I considered a few constraints
while doing the implementation.
     1) I extended the existing class AbstractListChromosome, which
requires a generic type. This is the reason for using a list of Long.
However, I can extend Chromosome instead and use an array of primitive
long. BitSet also uses a similar data structure.
     2) One problem with BitSet is that it uses the MSB of each long to
store a bit. As a result, we won't be able to use the static utility
methods of the wrapper class (Long) for conversion between the primitive
type and a string; we would have to write custom code for that conversion.
This is the only reason I have used a BLOCKSIZE of 63 instead of 64.
>> // This is not actually required...
>> // int bit = cross & 63; // i.e. cross % 64
-- Do you mean that the bit index does not need to be calculated? How can
we handle crossover indexes which are not a multiple of 64?
>> Do you think that allele sets other than binary would be useful to
>> implement? [IIUC your document above, it seems not (?).]
-- The document only describes the data structure related to the binary
genotype. We already have an implementation of the RandomKey genotype in
Commons. We can think of adding other genotypes gradually.
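
To make the question about crossover indexes concrete, here is a minimal
sketch of one-point crossover on chromosomes packed into long[]. It is an
illustration only, not the code from the article nor anything that exists
in Commons Math: the class and method names are hypothetical, both parents
are assumed to have the same length, and "cross" is assumed to lie in
[0, 64 * length]. The helper methods also hint that, on Java 8 or later,
Long.toBinaryString and Long.parseUnsignedLong may cover the string
conversion for full 64-bit blocks.

```java
import java.util.Arrays;

/** Illustrative only: one-point crossover on chromosomes packed into long[]. */
class BinaryCrossoverSketch {

    /**
     * Builds a child with bits [0, cross) taken from parent a and bits
     * [cross, 64 * a.length) taken from parent b. Bit i lives in word i / 64
     * at position i % 64 (least significant bit first), as in java.util.BitSet.
     */
    static long[] onePointCrossover(long[] a, long[] b, int cross) {
        long[] child = Arrays.copyOf(b, b.length);
        int word = cross >>> 6; // cross / 64: number of whole words taken from a
        int bit = cross & 63;   // cross % 64: bits taken from a in the boundary word
        System.arraycopy(a, 0, child, 0, word);
        if (bit != 0) {
            long mask = (1L << bit) - 1; // ones in the low 'bit' positions
            child[word] = (a[word] & mask) | (b[word] & ~mask);
        }
        return child;
    }

    /** Renders one 64-bit word as exactly 64 binary digits, most significant first. */
    static String toBits(long word) {
        String s = Long.toBinaryString(word); // unsigned view, no leading zeros
        StringBuilder sb = new StringBuilder();
        for (int i = s.length(); i < 64; i++) {
            sb.append('0');
        }
        return sb.append(s).toString();
    }

    /** Inverse of toBits: parses up to 64 binary digits, including a set top bit. */
    static long fromBits(String bits) {
        return Long.parseUnsignedLong(bits, 2);
    }

    public static void main(String[] args) {
        long[] a = {-1L, -1L}; // all ones
        long[] b = {0L, 0L};   // all zeros
        long[] child = onePointCrossover(a, b, 70);
        System.out.println(toBits(child[1]) + toBits(child[0]));
    }
}
```

When the crossover point is a multiple of 64, the boundary word needs no
masking and whole words are simply copied; otherwise a single mask splits
the boundary word between the two parents.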


Thanks & Regards
--Avijit Basak



On Sat, 1 May 2021 at 22:18, Gilles Sadowski <gi...@gmail.com> wrote:

> Le ven. 30 avr. 2021 à 17:40, Avijit Basak <av...@gmail.com> a
> écrit :
> >
> > Hi
> >
> >          >>lot of spurious references to "Commons Numbers"
> >              --I have only created the basic project structure. Changes
> > need to be made. Can anyone from the existing commons team help in doing
> > this.
>
> Well, you should "search and replace":
>   "Numbers" -> "Machine Learning"
>   commons-numbers -> commons-machinelearning
>
> Other things (repository URL, JIRA project name and URL) require that
> a component be created (vote is pending).
> [As long as those files are not part of a PR, it is not urgent to fix
> them.]
>
> >          >> For sure, populate it with the code extracted from CM's
> > "genetics"
> > package and proceed with the enhancements.
> > At first, I'd suggest to refactor the layout of the package (i.e. create
> > a "subpackage" for each component of a genetic algorithm).
> >               -- I am working on it.
>
> Great!
>
> > Did not commit the code till now.
>
> OK.  When you do, please ask for review on the "dev" ML.
>
> >           >>  Then some examination of the data-structures is required (a
> > binary chromosome is currently stored as a "List<Integer>").
> >               -- I have recently done some work on this. Could you please
> > check this article and share your thought.
> >                   https://arxiv.org/abs/2103.04751
>
> Alex already provided a thorough response.
> It's a pity that JDK's BitSet is missing a few methods (e.g. "append")
> for a readily usable implementation of a "binary chromosome".
>
> Do you think that allele sets other than binary would be useful to
> implement? [IIUC your document above, it seems not (?).]
>
> >           Are we thinking to use Spark for our parallelism
>
> No, if the code is to reside in Commons.
>
> > or a simple
> > multi-threading of Java.
>
> Yes, we'd depend only on JDK classes.
>
> > I would prefer to use java multi-threading and
> > avoid any other framework.
> >           In java we don't have any library which can be used for AI/ML
> > programming with a very minimal learning curve. Can we think of
> fulfilling
> > this need?
>
> That would be nice. Don't hesitate to enlist fellow programmers. :-)
>
> Regards,
> Gilles
>
> >           This will be helpful for many java developers to venture into
> > AI/ML without learning a new language like Python.
> >
> >
> >>> [...]
>

-- 
Avijit Basak

Re: The case for a Commons component

Posted by Gilles Sadowski <gi...@gmail.com>.
Le ven. 30 avr. 2021 à 17:40, Avijit Basak <av...@gmail.com> a écrit :
>
> Hi
>
>          >>lot of spurious references to "Commons Numbers"
>              --I have only created the basic project structure. Changes
> need to be made. Can anyone from the existing commons team help in doing
> this.

Well, you should "search and replace":
  "Numbers" -> "Machine Learning"
  commons-numbers -> commons-machinelearning

Other things (repository URL, JIRA project name and URL) require that
a component be created (vote is pending).
[As long as those files are not part of a PR, it is not urgent to fix them.]

>          >> For sure, populate it with the code extracted from CM's
> "genetics"
> package and proceed with the enhancements.
> At first, I'd suggest to refactor the layout of the package (i.e. create
> a "subpackage" for each component of a genetic algorithm).
>               -- I am working on it.

Great!

> Did not commit the code till now.

OK.  When you do, please ask for review on the "dev" ML.

>           >>  Then some examination of the data-structures is required (a
> binary chromosome is currently stored as a "List<Integer>").
>               -- I have recently done some work on this. Could you please
> check this article and share your thought.
>                   https://arxiv.org/abs/2103.04751

Alex already provided a thorough response.
It's a pity that JDK's BitSet is missing a few methods (e.g. "append")
for a readily usable implementation of a "binary chromosome".

Do you think that allele sets other than binary would be useful to
implement? [IIUC your document above, it seems not (?).]

>           Are we thinking to use Spark for our parallelism

No, if the code is to reside in Commons.

> or a simple
> multi-threading of Java.

Yes, we'd depend only on JDK classes.
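
As an illustration of what "only JDK classes" could look like for a GA,
here is a minimal sketch that evaluates the fitness of a population on a
fixed thread pool from java.util.concurrent. The class name, the method
name and the use of ToDoubleFunction as a fitness function are assumptions
made for the example; nothing of the sort has been agreed upon.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.ToDoubleFunction;

/** Illustrative only: fitness evaluation spread over a fixed JDK thread pool. */
class ParallelFitnessSketch {

    /** Returns the fitness of each individual, in population order. */
    static <T> double[] evaluate(List<T> population,
                                 ToDoubleFunction<T> fitness,
                                 int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Double>> pending = new ArrayList<>();
            for (T individual : population) {
                Callable<Double> task = () -> fitness.applyAsDouble(individual);
                pending.add(pool.submit(task));
            }
            double[] scores = new double[population.size()];
            for (int i = 0; i < scores.length; i++) {
                scores[i] = pending.get(i).get(); // blocks until task i is done
            }
            return scores;
        } finally {
            pool.shutdown(); // no new tasks; submitted ones have already completed
        }
    }

    public static void main(String[] args) throws Exception {
        // Toy "chromosomes": the fitness is the number of '1' characters.
        List<String> population = Arrays.asList("0110", "1111", "0001");
        double[] scores = evaluate(population,
            s -> (double) s.chars().filter(c -> c == '1').count(), 2);
        System.out.println(Arrays.toString(scores));
    }
}
```

The fitness function must be safe to call from several threads at once,
which is the kind of "multi-thread-ready" requirement a guideline would
have to spell out.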

> I would prefer to use java multi-threading and
> avoid any other framework.
>           In java we don't have any library which can be used for AI/ML
> programming with a very minimal learning curve. Can we think of fulfilling
> this need?

That would be nice. Don't hesitate to enlist fellow programmers. :-)

Regards,
Gilles

>           This will be helpful for many java developers to venture into
> AI/ML without learning a new language like Python.
>
>
>>> [...]



Re: The case for a Commons component

Posted by Avijit Basak <av...@gmail.com>.
Hi

>> lot of spurious references to "Commons Numbers"
-- I have only created the basic project structure. Changes need to be
made. Can anyone from the existing Commons team help with this?

>> For sure, populate it with the code extracted from CM's "genetics"
>> package and proceed with the enhancements.
>> At first, I'd suggest to refactor the layout of the package (i.e. create
>> a "subpackage" for each component of a genetic algorithm).
-- I am working on it. I have not committed the code yet.

>> Then some examination of the data-structures is required (a
>> binary chromosome is currently stored as a "List<Integer>").
-- I have recently done some work on this. Could you please check this
article and share your thoughts?
     https://arxiv.org/abs/2103.04751

Are we thinking of using Spark for parallelism, or plain Java
multi-threading? I would prefer to use Java multi-threading and avoid any
other framework.
In Java, we don't have any library that can be used for AI/ML programming
with a minimal learning curve. Can we think of fulfilling this need?
It would help many Java developers venture into AI/ML without learning a
new language like Python.


Thanks & Regards
--Avijit Basak

On Wed, 28 Apr 2021 at 18:48, Gilles Sadowski <gi...@gmail.com> wrote:

> Le lun. 26 avr. 2021 à 16:18, Avijit Basak <av...@gmail.com> a
> écrit :
> >
> > Hi
> >
> >         As per previous discussions, I have created a temporary
> repository
> > in GitHub under my personal GitHub Id(avijitbasak). The artifacts have
> been
> > copied from commons-numbers. A preliminary structure has been created for
> > the proposed component.
> > Please let me know if we want to proceed with this format.
>
> There is no source code (and a lot of spurious references to
> "Commons Numbers").
> For sure, populate it with the code extracted from CM's "genetics"
> package and proceed with the enhancements.
> At first, I'd suggest to refactor the layout of the package (i.e. create
> a "subpackage" for each component of a genetic algorithm).
> Then some examination of the data-structures is required (a binary
> chromosome is currently stored as a "List<Integer>").
> Shouldn't the whole design be revised (based on interfaces and
> streams)?
>
> > We can copy the
> > same to any other team repository if required.
>
> That would be a repository on an ASF server, once the pending vote
> process is completed.  [By the way: You didn't vote...]
>
> Regards,
> Gilles
>
> >> [...]
>

-- 
Avijit Basak

Re: The case for a Commons component

Posted by Gilles Sadowski <gi...@gmail.com>.
Le lun. 26 avr. 2021 à 16:18, Avijit Basak <av...@gmail.com> a écrit :
>
> Hi
>
>         As per previous discussions, I have created a temporary repository
> in GitHub under my personal GitHub Id(avijitbasak). The artifacts have been
> copied from commons-numbers. A preliminary structure has been created for
> the proposed component.
> Please let me know if we want to proceed with this format.

There is no source code (and a lot of spurious references to
"Commons Numbers").
By all means, populate it with the code extracted from CM's "genetics"
package and proceed with the enhancements.
As a first step, I'd suggest refactoring the layout of the package (i.e.
creating a "subpackage" for each component of a genetic algorithm).
Then some examination of the data-structures is required (a binary
chromosome is currently stored as a "List<Integer>").
Shouldn't the whole design be revised (based on interfaces and
streams)?
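
To give an idea of what "based on interfaces and streams" might mean, here
is a minimal sketch. Every name in it (Chromosome, Selection, truncation)
is hypothetical and chosen for the example only; it does not describe the
existing "genetics" package nor a design that has been decided.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleFunction;
import java.util.stream.Collectors;

/** Illustrative only: small interfaces combined through the Stream API. */
class GaDesignSketch {

    /** Hypothetical: a chromosome exposes its representation, e.g. a long[]. */
    interface Chromosome<R> {
        R representation();
    }

    /** Hypothetical: picks the individuals kept for the next generation. */
    interface Selection<C> {
        List<C> select(List<C> population, ToDoubleFunction<C> fitness, int count);
    }

    /** Truncation selection: keep the "count" fittest individuals. */
    static <C> Selection<C> truncation() {
        // Note: the fitness may be recomputed several times while sorting.
        return (population, fitness, count) -> population.stream()
            .sorted(Comparator.comparingDouble(fitness).reversed())
            .limit(count)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Chromosome<long[]>> population = new ArrayList<>();
        population.add(() -> new long[] {0b0001L});
        population.add(() -> new long[] {0b0111L});
        population.add(() -> new long[] {0b0011L});

        // Toy fitness: the number of bits set in the packed representation.
        ToDoubleFunction<Chromosome<long[]>> ones =
            c -> Arrays.stream(c.representation()).map(Long::bitCount).sum();

        List<Chromosome<long[]>> best =
            GaDesignSketch.<Chromosome<long[]>>truncation().select(population, ones, 2);
        for (Chromosome<long[]> c : best) {
            System.out.println(Long.toBinaryString(c.representation()[0]));
        }
    }
}
```

With such a split, the storage of a binary chromosome (List<Integer>,
long[], ...) can change without touching the selection code.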

> We can copy the
> same to any other team repository if required.

That would be a repository on an ASF server, once the pending vote
process is completed.  [By the way: You didn't vote...]

Regards,
Gilles

>> [...]



Re: The case for a Commons component

Posted by Gilles Sadowski <gi...@gmail.com>.
Le lun. 26 avr. 2021 à 17:08, Ralph Goers <ra...@dslextreme.com> a écrit :
>
> How many committers will be active for this component?

No fewer than there were for [RNG], [Numbers] and [Geometry]. ;-)

Those new components have attracted high-quality contributions;
two of the people who provided them have become committers.

Gilles

> > [...]



Re: The case for a Commons component

Posted by Ralph Goers <ra...@dslextreme.com>.
How many committers will be active for this component?

Ralph

> On Apr 26, 2021, at 7:17 AM, Avijit Basak <av...@gmail.com> wrote:
> 
> Hi
> 
>        As per previous discussions, I have created a temporary repository
> in GitHub under my personal GitHub Id(avijitbasak). The artifacts have been
> copied from commons-numbers. A preliminary structure has been created for
> the proposed component.
> Please let me know if we want to proceed with this format. We can copy the
> same to any other team repository if required.
> 
>        Repo URL: https://github.com/avijitbasak/commons-machinelearning
> 
> Thanks & Regards
> --Avijit Basak
> 
> On Mon, 26 Apr 2021 at 04:49, Paul King <pa...@gmail.com> wrote:
> 
>> On Mon, Apr 26, 2021 at 12:27 AM sebb <se...@gmail.com> wrote:
>>> 
>>> I assume this thread is about the possible ML component.
>>> 
>>> If the code was developed by Commons, I assume it could be used as
>>> part of Spark.
>>> However Commons does not currently have many developers who are
>>> familiar with the field.
>>> So it would seem to me better to have development done by a project
>>> which does have relevant experience.
>>> 
>>> You say that Spark etc have lots of jars.
>>> Surely that allows for it to be implemented as a separate jar which
>>> can either be used as part of the Spark platform, or used
>>> independently?
>> 
>> The stats I gave were for the current minimal use of those algorithms.
>> Most algorithms are written in Scala, use RDD "dataframes" rather than
>> say double arrays, and assume you're running on "the platform" which
>> handles how you might get your data and return results and do logging
>> etc. in a potentially concurrent world. Some of those design choices
>> are key to scaling up but don't align with the goal of making the
>> algorithms runnable "independently".
>> 
>>> The only other option I see is for Commons to persuade some developers
>>> who are familiar with the field to join Commons to assist with the
>>> algorithms.
>> 
>> I agree that is the crux of the issue here. The "commons doesn't have
>> the bandwidth to absorb another algorithm" part of the discussion
>> seems perfectly legit to me. The "and there is an obvious home
>> elsewhere" part of the discussion seemed a little more dubious to me,
>> though obviously that is something which should be considered.
>> 
>>> Existing Commons developers can help manage the logistics of packaging
>>> and releasing the code, as this does not require in depth knowledge of
>>> the design.
>>> However this only makes sense if the developers skilled in the area are
>>> prepared to assist long-term.
>>> 
>>> 
>>> On Sat, 24 Apr 2021 at 23:32, Paul King <pa...@gmail.com>
>> wrote:
>>>> 
>>>> Thanks Gilles,
>>>> 
>>>> I can provide the same sort of stats across a clustering example
>>>> across commons-math (KMeans) vs Apache Ignite, Apache Spark and
>>>> Rheem/Apache Wayang (incubating) if anyone would find that useful. It
>>>> would no doubt lead to similar conclusions.
>>>> 
>>>> Cheers, Paul.
>>>> 
>>>> On Sun, Apr 25, 2021 at 8:15 AM Gilles Sadowski <gi...@gmail.com>
>> wrote:
>>>>> 
>>>>> Hello Paul.
>>>>> 
>>>>> Le sam. 24 avr. 2021 à 04:42, Paul King <pa...@gmail.com>
>> a écrit :
>>>>>> 
>>>>>> I added some more comments relevant to if the proposed algorithm
>>>>>> belongs somewhere in the commons "math" area back in the Jira:
>>>>>> 
>>>>>> https://issues.apache.org/jira/browse/MATH-1563
>>>>> 
>>>>> Thanks for a "real" user's testimony.
>>>>> 
>>>>> As the ML is still the official forum for such a discussion, I'm
>> quoting
>>>>> part of your post on JIRA:
>>>>> ---CUT---
>>>>> For linear regression, taking just one example dataset, commons-math
>>>>> is a couple of library calls for a single 2M library and solves the
>>>>> problem in 240ms. Both Ignite and Spark involve "firing up the
>>>>> platform" and the code is more complex for simple scenarios. Spark
>> has
>>>>> a 181M footprint across 210 jars and solves the problem in about 20s.
>>>>> Ignite has a 87M footprint across 85 jars and solves the problem in >
>>>>> 40s. But I can also find more complex scenarios which need to scale
>>>>> where Ignite and Spark really come into their own.
>>>>> ---CUT---
>>>>> 
>>>>> A similar rationale was behind my developing/using the SOFM
>>>>> functionality in the "o.a.c.m.ml.neuralnet" package: I needed a
>>>>> proof of concept, and taking the "lightweight" path seemed more
>>>>> effective than experimenting with those platforms.
>>>>> Admittedly, at that time, there were people around who were
>>>>> maintaining the clustering and GA codes; hence, the prototyping
>>>>> of a machine-learning library didn't look strange to anyone.
>>>>> 
>>>>> Regards,
>>>>> Gilles
>>>>> 
>>>>>>>> [...]
>>>>> 
> 
> -- 
> Avijit Basak





Re: The case for a Commons component

Posted by Avijit Basak <av...@gmail.com>.
Hi

        As per previous discussions, I have created a temporary repository
in GitHub under my personal GitHub Id(avijitbasak). The artifacts have been
copied from commons-numbers. A preliminary structure has been created for
the proposed component.
Please let me know if we want to proceed with this format. We can copy the
same to any other team repository if required.

        Repo URL: https://github.com/avijitbasak/commons-machinelearning

Thanks & Regards
--Avijit Basak

On Mon, 26 Apr 2021 at 04:49, Paul King <pa...@gmail.com> wrote:

> On Mon, Apr 26, 2021 at 12:27 AM sebb <se...@gmail.com> wrote:
> >
> > I assume this thread is about the possible ML component.
> >
> > If the code was developed by Commons, I assume it could be used as
> > part of Spark.
> > However Commons does not currently have many developers who are
> > familiar with the field.
> > So it would seem to me better to have development done by a project
> > which does have relevant experience.
> >
> > You say that Spark etc have lots of jars.
> > Surely that allows for it to be implemented as a separate jar which
> > can either be used as part of the Spark platform, or used
> > independently?
>
> The stats I gave were for the current minimal use of those algorithms.
> Most algorithms are written in Scala, use RDD "dataframes" rather than
> say double arrays, and assume you're running on "the platform" which
> handles how you might get your data and return results and do logging
> etc. in a potentially concurrent world. Some of those design choices
> are key to scaling up but don't align with the goal of making the
> algorithms runnable "independently".
>
> > The only other option I see is for Commons to persuade some developers
> > who are familiar with the field to join Commons to assist with the
> > algorithms.
>
> I agree that is the crux of the issue here. The "commons doesn't have
> the bandwidth to absorb another algorithm" part of the discussion
> seems perfectly legit to me. The "and there is an obvious home
> elsewhere" part of the discussion seemed a little more dubious to me,
> though obviously that is something which should be considered.
>
> > Existing Commons developers can help manage the logistics of packaging
> > and releasing the code, as this does not require in depth knowledge of
> > the design.
> > However this only makes sense if the developers skilled in the area are
> > prepared to assist long-term.
> >
> >
> > On Sat, 24 Apr 2021 at 23:32, Paul King <pa...@gmail.com>
> wrote:
> > >
> > > Thanks Gilles,
> > >
> > > I can provide the same sort of stats across a clustering example
> > > across commons-math (KMeans) vs Apache Ignite, Apache Spark and
> > > Rheem/Apache Wayang (incubating) if anyone would find that useful. It
> > > would no doubt lead to similar conclusions.
> > >
> > > Cheers, Paul.
> > >
> > > On Sun, Apr 25, 2021 at 8:15 AM Gilles Sadowski <gi...@gmail.com>
> wrote:
> > > >
> > > > Hello Paul.
> > > >
> > > > Le sam. 24 avr. 2021 à 04:42, Paul King <pa...@gmail.com>
> a écrit :
> > > > >
> > > > > I added some more comments relevant to if the proposed algorithm
> > > > > belongs somewhere in the commons "math" area back in the Jira:
> > > > >
> > > > > https://issues.apache.org/jira/browse/MATH-1563
> > > >
> > > > Thanks for a "real" user's testimony.
> > > >
> > > > As the ML is still the official forum for such a discussion, I'm
> quoting
> > > > part of your post on JIRA:
> > > > ---CUT---
> > > > For linear regression, taking just one example dataset, commons-math
> > > > is a couple of library calls for a single 2M library and solves the
> > > > problem in 240ms. Both Ignite and Spark involve "firing up the
> > > > platform" and the code is more complex for simple scenarios. Spark
> has
> > > > a 181M footprint across 210 jars and solves the problem in about 20s.
> > > > Ignite has a 87M footprint across 85 jars and solves the problem in >
> > > > 40s. But I can also find more complex scenarios which need to scale
> > > > where Ignite and Spark really come into their own.
> > > > ---CUT---
> > > >
> > > > A similar rationale was behind my developing/using the SOFM
> > > > functionality in the "o.a.c.m.ml.neuralnet" package: I needed a
> > > > proof of concept, and taking the "lightweight" path seemed more
> > > > effective than experimenting with those platforms.
> > > > Admittedly, at that time, there were people around who were
> > > > maintaining the clustering and GA codes; hence, the prototyping
> > > > of a machine-learning library didn't look strange to anyone.
> > > >
> > > > Regards,
> > > > Gilles
> > > >
> > > > >>> [...]
> > > >

-- 
Avijit Basak

Re: The case for a Commons component

Posted by Paul King <pa...@gmail.com>.
On Mon, Apr 26, 2021 at 12:27 AM sebb <se...@gmail.com> wrote:
>
> I assume this thread is about the possible ML component.
>
> If the code was developed by Commons, I assume it could be used as
> part of Spark.
> However Commons does not currently have many developers who are
> familiar with the field.
> So it would seem to me better to have development done by a project
> which does have relevant experience.
>
> You say that Spark etc have lots of jars.
> Surely that allows for it to be implemented as a separate jar which
> can either be used as part of the Spark platform, or used
> independently?

The stats I gave were for the current minimal use of those algorithms.
Most algorithms are written in Scala, use RDD "dataframes" rather than
say double arrays, and assume you're running on "the platform" which
handles how you might get your data and return results and do logging
etc. in a potentially concurrent world. Some of those design choices
are key to scaling up but don't align with the goal of making the
algorithms runnable "independently".

> The only other option I see is for Commons to persuade some developers
> who are familiar with the field to join Commons to assist with the
> algorithms.

I agree that is the crux of the issue here. The "commons doesn't have
the bandwidth to absorb another algorithm" part of the discussion
seems perfectly legit to me. The "and there is an obvious home
elsewhere" part of the discussion seemed a little more dubious to me,
though obviously that is something which should be considered.

> Existing Commons developers can help manage the logistics of packaging
> and releasing the code, as this does not require in depth knowledge of
> the design.
> However this only makes sense if the developers skilled in the area are
> prepared to assist long-term.
>
>
> On Sat, 24 Apr 2021 at 23:32, Paul King <pa...@gmail.com> wrote:
> >
> > Thanks Gilles,
> >
> > I can provide the same sort of stats across a clustering example
> > across commons-math (KMeans) vs Apache Ignite, Apache Spark and
> > Rheem/Apache Wayang (incubating) if anyone would find that useful. It
> > would no doubt lead to similar conclusions.
> >
> > Cheers, Paul.
> >
> > On Sun, Apr 25, 2021 at 8:15 AM Gilles Sadowski <gi...@gmail.com> wrote:
> > >
> > > Hello Paul.
> > >
> > > Le sam. 24 avr. 2021 à 04:42, Paul King <pa...@gmail.com> a écrit :
> > > >
> > > > I added some more comments relevant to if the proposed algorithm
> > > > belongs somewhere in the commons "math" area back in the Jira:
> > > >
> > > > https://issues.apache.org/jira/browse/MATH-1563
> > >
> > > Thanks for a "real" user's testimony.
> > >
> > > As the ML is still the official forum for such a discussion, I'm quoting
> > > part of your post on JIRA:
> > > ---CUT---
> > > For linear regression, taking just one example dataset, commons-math
> > > is a couple of library calls for a single 2M library and solves the
> > > problem in 240ms. Both Ignite and Spark involve "firing up the
> > > platform" and the code is more complex for simple scenarios. Spark has
> > > a 181M footprint across 210 jars and solves the problem in about 20s.
> > > Ignite has a 87M footprint across 85 jars and solves the problem in >
> > > 40s. But I can also find more complex scenarios which need to scale
> > > where Ignite and Spark really come into their own.
> > > ---CUT---
> > >
> > > A similar rationale was behind my developing/using the SOFM
> > > functionality in the "o.a.c.m.ml.neuralnet" package: I needed a
> > > proof of concept, and taking the "lightweight" path seemed more
> > > effective than experimenting with those platforms.
> > > Admittedly, at that time, there were people around who were
> > > maintaining the clustering and GA codes; hence, the prototyping
> > > of a machine-learning library didn't look strange to anyone.
> > >
> > > Regards,
> > > Gilles
> > >
> > > >>> [...]
> > >



Re: The case for a Commons component

Posted by Gilles Sadowski <gi...@gmail.com>.
Le dim. 25 avr. 2021 à 16:27, sebb <se...@gmail.com> a écrit :
>
> I assume this thread is about the possible ML component.

I hesitated over using the Subject: "The case for *any* Commons component".

> If the code was developed by Commons, I assume it could be used as
> part of Spark.
> However Commons does not currently have many developers who are
> familiar with the field.
> So it would seem to me better to have development done by a project
> which does have relevant experience.

I expressed the same concern/opinion; in fact, if I were tempted
to implement something of the kind now, I would probably indeed
start experimenting with Spark. [CM's implementation of SOFM
dates from early 2014.]

On the other hand, several people (at different times) expressed
an interest in having such code free of the "high-level" features
that come with the "platforms".
My own current usage of the "neuralnet" package does not
warrant a move to Spark.
I'm also interested in refactoring the "clustering" package (but will
not pursue it alone).

> You say that Spark etc have lots of jars.
> Surely that allows for it to be implemented as a separate jar which
> can either be used as part of the Spark platform, or used
> independently?

https://spark.apache.org/docs/latest/spark-standalone.html

TL;DR, but there are many references to a "cluster", so that seems to be
the common use-case, whereas code here could, for example, focus on
multi-thread-ready components, primarily targeting applications that
run on a single multi-core machine.

> The only other option I see is for Commons to persuade some developers
> who are familiar with the field to join Commons to assist with the
> algorithms.
> Existing Commons developers can help manage the logistics of packaging
> and releasing the code, as this does not require in depth knowledge of
> the design.
> However this only makes sense if the developers skilled in the area are
> prepared to assist long-term.

I try to make that crystal-clear to every new contributor (cf. the proposal
to revive "Commons Graph", the exchange on refactoring the "clustering"
package, the necessary features for a GA implementation that purports
to be more than a toy example, ...).

However, it is obviously impossible to enforce something like "prepared
to assist long-term"; it is rightfully a necessary condition for being
granted commit access, but it's up to the project to create a "place"
where people want to stay (and know what to expect).
For people interested in "ML" (not necessarily experts: They could be
developers willing to implement standard algorithms, as we did in CM),
it means that there should be global guidelines (like there were for CM)
such as e.g. "multi-thread-ready" (in addition to the usual "full doc",
"full coverage", etc.), and a repository for those codes.

We don't have much control over the arrival rate of contributors, but I
contend that a component with a specific scope is much more
appealing (especially to newcomers) than a mixed bag à la CM,
which nobody here is able (or willing) to maintain (which is why
I'll only merge bug-fixes there).

Not creating the "place" will of course pave the way to a self-fulfilling
prophecy.

Gilles



Re: The case for a Commons component

Posted by sebb <se...@gmail.com>.
I assume this thread is about the possible ML component.

If the code was developed by Commons, I assume it could be used as
part of Spark.
However Commons does not currently have many developers who are
familiar with the field.
So it would seem to me better to have development done by a project
which does have relevant experience.

You say that Spark etc. have lots of jars.
Surely that allows for it to be implemented as a separate jar which
can either be used as part of the Spark platform, or used
independently?

The only other option I see is for Commons to persuade some developers
who are familiar with the field to join Commons to assist with the
algorithms.
Existing Commons developers can help manage the logistics of packaging
and releasing the code, as this does not require in-depth knowledge of
the design.
However this only makes sense if the developers skilled in the area are
prepared to assist long-term.




Re: The case for a Commons component

Posted by Gilles Sadowski <gi...@gmail.com>.
Le dim. 25 avr. 2021 à 00:32, Paul King <pa...@gmail.com> a écrit :
>
> Thanks Gilles,
>
> I can provide the same sort of stats across a clustering example
> across commons-math (KMeans) vs Apache Ignite, Apache Spark and
> Rheem/Apache Wayang (incubating) if anyone would find that useful. It
> would no doubt lead to similar conclusions.

There also were relatively recent discussions concerning the codes in
the "o.a.c.m.ml.clustering" package.[1]
If they are useful as of the old CM v3.6.1, they can very probably be
improved upon in terms of flexibility[2] and performance through (among
other things) multi-threading (in much the same way as for GA, I guess).
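
To make the multi-threading remark concrete, here is a minimal sketch
(plain Java, not the CM code) of how the assignment step of a
k-means-style algorithm could be spread over the cores of a single
machine with a parallel stream. The class and method names
("ParallelAssignment", "assign", "nearest") are made up for this
illustration and are not part of any Commons API.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical illustration only; not taken from Commons Math.
public class ParallelAssignment {
    /** Squared Euclidean distance between two points of equal dimension. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Index of the center closest to the given point. */
    static int nearest(double[] point, double[][] centers) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int k = 0; k < centers.length; k++) {
            double dist = squaredDistance(point, centers[k]);
            if (dist < bestDist) {
                bestDist = dist;
                best = k;
            }
        }
        return best;
    }

    /**
     * Assigns every point to its nearest center.
     * Points are independent of each other, so the work is split
     * across threads by the parallel stream; the result keeps the
     * order of the input points.
     */
    static int[] assign(double[][] points, double[][] centers) {
        return IntStream.range(0, points.length)
                .parallel()
                .map(i -> nearest(points[i], centers))
                .toArray();
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {0.5, 0.2}, {9, 9}, {10, 8.5}};
        double[][] centers = {{0, 0}, {10, 10}};
        System.out.println(Arrays.toString(assign(points, centers)));
    }
}
```

Since the inputs are only read and each point is handled
independently, no locking is needed in this step. Whether the
parallel version actually pays off for a given dataset size would
have to be benchmarked.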

Best regards,
Gilles

[1] https://issues.apache.org/jira/browse/MATH-1515
[2] Fixes and enhancements are already in CM "master" branch.



Re: The case for a Commons component

Posted by Paul King <pa...@gmail.com>.
Thanks Gilles,

I can provide the same sort of stats across a clustering example
across commons-math (KMeans) vs Apache Ignite, Apache Spark and
Rheem/Apache Wayang (incubating) if anyone would find that useful. It
would no doubt lead to similar conclusions.

Cheers, Paul.
