Posted to dev@commons.apache.org by Avijit Basak <av...@gmail.com> on 2021/04/12 15:19:55 UTC

Re: [Vote] Create a "machine learning" component

Hi

         Sorry for the delayed response. Thanks for your patience. Please
find my comments below:

 (1) Why not Spark?  [At least post over there (?).]
      --We can move to Spark. But it will be very much useful if the things
can also run without Spark. The use of Spark would make more sense in a
production environment. But the portability of the library will be more
useful for the non-prod environment. Definitely, we can reach the Spark
team and query.
 (2) Further develop a monolithic CM?  [Who will do it?]
       --I can help with the upgrade of the existing library related to GA
functionality.
 (3) Modularize CM? [Who will do it?]
       --I can help with the upgrade of the existing library related to GA
functionality.
 (4) New component (with another name) with the proposed contents?
       --This is the best option if permitted.

      The code which I have written can be reused with minor modifications.
So it won't take too much effort for this activity.
      Kindly share further thoughts.

Thanks & Regards
--Avijit Basak


On Sun, 14 Feb 2021 at 19:56, Gilles Sadowski <gi...@gmail.com> wrote:

> On Sun, 14 Feb 2021 at 09:06, Avijit Basak <av...@gmail.com> wrote:
> >
> > Hi
> >
> >        I would like to mention a few points here. Genetic Algorithm has a
> > vast range of applications in optimization and search problems. Machine
> > learning is only one of those.
> >        If we couple the new GA library with any specific domain like ml
> > it would be meaningless for people working in other domains.
>
> Isn't "meaningless" a slight overstatement?
> We might have an issue of terminology: There is no necessary "coupling"
> but maybe "acquaintance" (for lack of a better word), as a set of tools
> that might come in handy for solving certain types of problems.  [For
> example, the Traveling Salesman Problem can be tackled by GA and SOFM, both
> of which are candidates for inclusion in the new component, although they
> don't share any code.]
>
> If the name "machine learning" is not the most appropriate one to convey
> the intended scope, do you have another idea?
> ["AI" would perhaps be more correct if we consider a strict hierarchy, but
> would obviously be far too presumptuous.]
>
> > They have to
> > incorporate the entire ml library
>
> No, they won't.  Given the stated goal of "modularity": the "ga" module
> will be available as a dedicated JAR (possibly with a dependency to
> codes that can be reused in other modules provided by the component).
>
> > which may be completely unrelated to
> > their project. Coupling it with any technology like Spark might also
> > limit its usability.
>
> You may be right; I have no idea about the "restrictions" imposed by
> Spark.  [It seems that in this case, one would have to indeed depend
> on Spark's "mllib" (?).  This would be one reason, as I already stated,
> for having something in "Commons".]
>
> Could you elaborate on a concrete use-case where one would be
> starting to develop an application with the specific requirement that
> Spark could not be used?
> In particular, IIRC Spark has multi-threading built in.  Don't you see
> it as a huge problem that CM would not provide such a feature?
>
> >        If a separate component is not approved for this change then we
> > can incorporate the changes as part of *commons.math* library.
>
> Of course, if somebody wants to do that, he's welcome.
> [That will not be me, for all the reasons which I've explained.  In the
> last 5 years I've been pretty much alone in handling bug reports about CM;
> I'm unwilling to assume implicit support for even more codes.]
>
> Also, with this solution, you'd now be willing to accept what you weren't
> above: Anyone wanting to use the GA functionality would indeed have to
> "incorporate" the whole of "Commons Math" (CM).
> Of course, the latter could be modularized, but this will only mitigate the
> issue, as any release of the GA functionality will potentially be then held
> off by potential issues in other parts of CM (which nobody has been able
> to consistently support for more than 5 years now).
>
> >        The same library can be reused in ml or neural network libraries
> > as a dependency.
>
> It is the other way around:  The development version of CM currently
> depends on "lower-level" components.
> Furthermore, right now its (embryonic) "machine learning" functionality
> hasn't any substantial dependency on codes outside the "o.a.c.math4.ml"
> package.
>
> >        Kindly share further views on this.
>
> In summary, to be clarified:
>  (1) Why not Spark?  [At least post over there (?).]
>  (2) Further develop a monolithic CM?  [Who will do it?]
>  (3) Modularize CM? [Who will do it?]
>  (4) New component (with another name) with the proposed contents?
>
> To make things clear from my side:  As a *user*, I've currently some
> stake at having a clean, independent "ml" component or an independent
> "sofm" module.  So I could do (4).  Or help with (3), on the condition that
> *other* people get things moving.
>
> Regards,
> Gilles
>
> >
> > Thanks & Regards
> > --Avijit Basak
> >
> > On Wed, 10 Feb 2021 at 19:49, Gilles Sadowski <gi...@gmail.com> wrote:
> >
> > > On Wed, 10 Feb 2021 at 13:19, sebb <se...@gmail.com> wrote:
> > > >
> > > > Likewise, commons-ml is too cryptic.
> > > >
> > > > Also, the Spark project has a machine-learning library:
> > > >
> > > > https://spark.apache.org/mllib/
> > >
> > > Thanks for the pointer.
> > >
> > > >
> > > > Maybe that would be a better home?
> > >
> > > On the face of it, probably.
> > > [For sure, Avijit should comment on the suggestion.]
> > >
> > > On the other hand, "Commons" is the place where one can pick "bare
> > > bone" implementations, and add the functionality to one's application
> > > without necessarily complying with an overarching framework.
> > > [I don't mean that framework compliance is bad; quite the contrary, it
> > > is hopefully the result of a thorough reflection by experts.  But ...
> > > cf. the numerous "no-dependency" discussions ...]
> > >
> > > Actually, concerning Avijit's proposed contribution, didn't I say:[1]
> > > ---CUT---
> > > Thus, I think that we must assess whether the "genetic algorithms"
> > > functionality has a reasonable future within "Apache Commons" (i.e.
> > > potential users and contributors) while there exist other libraries
> > > that seem much more advanced for any serious usage.
> > > ---CUT---
> > >
> > > > I'm also a bit concerned as to whether there are sufficient
> > > > developers here with knowledge of the ML domain to be able to
> > > > support the code in the future.
> > >
> > > An interesting point; by all means not a new one (see e.g. [2]).
> > >
> > > Isn't it the same point I've been making about "Commons Math" (CM)?
> > > There have been no releases because nobody here is able (or willing)
> > > to support it.
> > >
> > > Concerning the support of the purported "machinelearning" component:
> > > 1. Package
> > >         org.apache.commons.math4.ml.neuralnet
> > >     * I've written it entirely and I have applications that depend on it
> > >       (and I cannot assume that I could easily switch to, or port it to,
> > >       Spark), so I can reasonably ensure that it would be supported.
> > > 2. Package
> > >         org.apache.commons.math4.ml.clustering
> > >     * Functionality is mentioned in Spark's "mllib" user guide.
> > >     * When a new feature was last contributed[3], it was noticed[4][5][6]
> > >       that improvements were needed (but there was no follow-up).
> > >     * I've an application that depends on it (from CM v3.6.1) but I
> > >       wouldn't support it if shipped in CM v4.0.
> > > 3. Package
> > >         org.apache.commons.math4.genetics
> > >     * Part of my "end-of-study" project consisted in a GA implementation.
> > >       I've never used the CM implementation, and I don't deny that there
> > >       could be perfectly fine uses of it but, just looking at the code,
> > >       it seems obvious that it cannot compete feature-wise with other
> > >       libraries out there.
> > >     * I've suggested long ago that, without anyone supporting it actively
> > >       (and no known user community), it should be dropped from CM.
> > >     * Avijit expressed a willingness to improve the functionality:  Is
> > >       this enough for the PMC to create a new component?  From the
> > >       experience with the "clustering" package mentioned above, I'd tend
> > >       to think (unfortunately) that it isn't.  He should first explore
> > >       whether the Spark community is interested in having the GA
> > >       functionality moved over there.
> > >
> > > Gilles
> > >
> > > [1] https://issues.apache.org/jira/browse/MATH-1563
> > > [2] https://markmail.org/message/26yxj5vhysdsoety
> > > [3] https://issues.apache.org/jira/projects/MATH/issues/MATH-1509
> > > [4] https://issues.apache.org/jira/projects/MATH/issues/MATH-1524
> > > [5] https://issues.apache.org/jira/projects/MATH/issues/MATH-1528
> > > [6] https://issues.apache.org/jira/projects/MATH/issues/MATH-1526
> > >
> > > >
> > > > On Wed, 10 Feb 2021 at 08:27, Emmanuel Bourg <eb...@apache.org> wrote:
> > > > >
> > > > > -1 for commons-ml for the same reasons.
> > > > >
> > > > > What about commons-machine-learning or commons-math-learning? The
> > > > > latter is as long as commons-configuration.
> > > > >
> > > > > Emmanuel Bourg
> > > > >
> > > > >
> > > > > On 2021-02-10 03:27, Ralph Goers wrote:
> > > > > > -1 on commons-ml as the name. My first thought is such a repo
> > > > > > would hold stuff related to mailing lists. Then again maybe it
> > > > > > contains stuff relating to markup languages. Maybe it is Apache’s
> > > > > > version of the ML Programming Language [1].
> > > > > >
> > > > > > However, I wouldn’t be -1 on commons-math-ml, although at best I
> > > > > > would be +0 since it is still not obvious what it would contain.
> > > > > >
> > > > > > Ralph
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> > > For additional commands, e-mail: dev-help@commons.apache.org
> > >
> > >
> >
> > --
> > Avijit Basak
>

-- 
Avijit Basak

Re: [Vote] Create a "machine learning" component

Posted by Gilles Sadowski <gi...@gmail.com>.

On Tue, 20 Apr 2021 at 16:09, Avijit Basak <av...@gmail.com> wrote:
>
> Hi
>
>           > Did you ask "Spark" people about their opinion about it?
>             -- Not yet. I am not sure what would be the right option for
> this communication. It will be good if you can approach them.

You are the one who proposes a functionality that might be of interest
to the "Spark" project, perhaps on some condition on their part which
*you* are going to have to accept (or not).

In other words: It would be useless for *me* to go and tell them that there
exists some code in Commons Math which they could take and adapt for their
project (they can always do that).
What might be of value to them (as to the Commons project, too), is a
contributor willing to do the necessary work to create or improve a
community-supported feature.

>           > where it can be used in real-life (performance-wise)
> applications, then you should demonstrate it
>             -- Do we have any kind of performance benchmark or use case
> regarding this?

Please assume that *you* are the person with the most GA expertise
in this forum.
There certainly are unit tests for the GA functionality, but I don't think
there are benchmarks; certainly, one task would be to set up a module
for (JMH-based) experimentation.

> Once that is decided,

One mantra of ASF communities is that "those who do the work get
to decide".
[The PMC can decide (by vote) whether to accept a new component;
but it's up to you to show that it's worth it (with the risk that the PMC
won't accurately judge the contribution, unfortunately)...]

> then I can proceed with this.

There is already a long list of things that can be done.

You don't *have* to contact "Spark" if you don't feel that it's the
right project for your work.  You could just hope for the best, and
start somewhere else (modularization of Commons Math, a fork
on GitHub of CM ML-related codes, and so on).

The one thing which I won't be helping with is merging ad-hoc
GA-related changes into the current CM codebase.
This doesn't preclude that other committers might want to do that
for you; however judging by the last 5 years, I wouldn't count too
much on it. ;-)

Regards,
Gilles

>
>
> Thanks & Regards
> --Avijit Basak
>
> On Mon, 19 Apr 2021 at 18:51, Gilles Sadowski <gi...@gmail.com> wrote:
>
> > Hello.
> >
> > On Mon, 19 Apr 2021 at 08:35, Avijit Basak <av...@gmail.com> wrote:
> > >
> > > Hi
> > >
> > > >Isn't a GA inherently parallel?
> > > >If so, why not take advantage of the concurrency tools provided by the
> > > >JDK?
> > >   -- Are we planning to implement multi-threading for GA operations even
> > > as part of a single population
> >
> > This seems an obvious improvement to our current implementation
> > (in case a chromosome's evaluation is not population-dependent).
> >
> > > or only for multi-population parallel GA.
> > >   -- We can implement different types of co-evolution as part of parallel
> > > GA. Need to decide on the corresponding strategies we are going to
> > > incorporate.
> >
> > The discussion is still about the "administrative" question of whether
> > any of this should be implemented in the "Commons" project...
> >
> > Did you ask "Spark" people about their opinion about it?
> >
> > As I said, if you are confident that you can bring our implementation to
> > a state where it can be used in real-life (performance-wise) applications,
> > then you should demonstrate it (in order to convince other people from
> > the Commons PMC that it is worth engaging in long-term maintenance).
> > AFAICT, a way to do it would be to create a GitHub project (aimed at
> > becoming a new "machine learning" component, or a maven/JPMS
> > module within Commons Math).
> >
> > Best regards,
> > Gilles


Re: [Vote] Create a "machine learning" component

Posted by Avijit Basak <av...@gmail.com>.

Hi

          > Did you ask "Spark" people about their opinion about it?
            -- Not yet. I am not sure what would be the right option for
this communication. It will be good if you can approach them.
          > where it can be used in real-life (performance-wise)
applications, then you should demonstrate it
            -- Do we have any kind of performance benchmark or use case
regarding this? Once that is decided, then I can proceed with this.


Thanks & Regards
--Avijit Basak

On Mon, 19 Apr 2021 at 18:51, Gilles Sadowski <gi...@gmail.com> wrote:

> Hello.
>
> On Mon, 19 Apr 2021 at 08:35, Avijit Basak <av...@gmail.com> wrote:
> >
> > Hi
> >
> > >Isn't a GA inherently parallel?
> > >If so, why not take advantage of the concurrency tools provided by the
> > >JDK?
> >   -- Are we planning to implement multi-threading for GA operations even
> > as part of a single population
>
> This seems an obvious improvement to our current implementation
> (in case a chromosome's evaluation is not population-dependent).
>
> > or only for multi-population parallel GA.
> >   -- We can implement different types of co-evolution as part of parallel
> > GA. Need to decide on the corresponding strategies we are going to
> > incorporate.
>
> The discussion is still about the "administrative" question of whether
> any of this should be implemented in the "Commons" project...
>
> Did you ask "Spark" people about their opinion about it?
>
> As I said, if you are confident that you can bring our implementation to
> a state where it can be used in real-life (performance-wise) applications,
> then you should demonstrate it (in order to convince other people from
> the Commons PMC that it is worth engaging in long-term maintenance).
> AFAICT, a way to do it would be to create a GitHub project (aimed at
> becoming a new "machine learning" component, or a maven/JPMS
> module within Commons Math).
>
> Best regards,
> Gilles
>

-- 
Avijit Basak

Re: [Vote] Create a "machine learning" component

Posted by Gilles Sadowski <gi...@gmail.com>.

Hello.

On Mon, 19 Apr 2021 at 08:35, Avijit Basak <av...@gmail.com> wrote:
>
> Hi
>
> >Isn't a GA inherently parallel?
> >If so, why not take advantage of the concurrency tools provided by the JDK?
>   -- Are we planning to implement multi-threading for GA operations even as
> part of a single population

This seems an obvious improvement to our current implementation
(in case a chromosome's evaluation is not population-dependent).
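
To make the single-population case concrete, here is a minimal sketch of
evaluating one generation with the JDK's own concurrency tools. It assumes
that each chromosome's fitness depends only on that chromosome (i.e. it is
not population-dependent). The class name "ParallelFitness", the method
"evaluate" and the toy "one-max" fitness are hypothetical names chosen for
illustration; none of them is part of the Commons Math API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.ToDoubleFunction;

/** Hypothetical helper (not part of Commons Math). */
final class ParallelFitness {
    private ParallelFitness() {}

    /**
     * Evaluates every chromosome on a fixed-size thread pool.
     * Scores are returned in the same order as the input list.
     * Assumption: the fitness function is thread-safe and only
     * looks at the chromosome it is given.
     */
    static <C> double[] evaluate(List<C> population,
                                 ToDoubleFunction<C> fitness,
                                 int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one task per chromosome.
            List<Future<Double>> pending = new ArrayList<>(population.size());
            for (C chromosome : population) {
                Callable<Double> task = () -> fitness.applyAsDouble(chromosome);
                pending.add(pool.submit(task));
            }
            // Collect the results in submission order.
            double[] scores = new double[pending.size()];
            for (int i = 0; i < scores.length; i++) {
                scores[i] = pending.get(i).get();
            }
            return scores;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Toy "one-max" problem: fitness is the number of set bits.
        List<boolean[]> population = Arrays.asList(
            new boolean[] {true, false, true},
            new boolean[] {true, true, true},
            new boolean[] {false, false, false});
        double[] scores = evaluate(population, bits -> {
            int ones = 0;
            for (boolean b : bits) {
                if (b) {
                    ones++;
                }
            }
            return ones;
        }, 2);
        System.out.println(Arrays.toString(scores)); // prints [2.0, 3.0, 0.0]
    }
}
```

Selection, crossover and mutation are left out on purpose: only the
evaluation step is shown, since it is the part that parallelizes without
any coordination between chromosomes.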

> or only for multi-population parallel GA.
>   -- We can implement different types of co-evolution as part of parallel
> GA. Need to decide on the corresponding strategies we are going to
> incorporate.

The discussion is still about the "administrative" question of whether
any of this should be implemented in the "Commons" project...

Did you ask "Spark" people about their opinion about it?

As I said, if you are confident that you can bring our implementation to
a state where it can be used in real-life (performance-wise) applications,
then you should demonstrate it (in order to convince other people from
the Commons PMC that it is worth engaging in long-term maintenance).
AFAICT, a way to do it would be to create a GitHub project (aimed at
becoming a new "machine learning" component, or a maven/JPMS
module within Commons Math).

Best regards,
Gilles


Re: [Vote] Create a "machine learning" component

Posted by Avijit Basak <av...@gmail.com>.

Hi

>Isn't a GA inherently parallel?
>If so, why not take advantage of the concurrency tools provided by the JDK?
  -- Are we planning to implement multi-threading for GA operations even as
part of a single population, or only for multi-population parallel GA?
  -- We can implement different types of co-evolution as part of parallel
GA. We need to decide on the corresponding strategies we are going to
incorporate.

Thanks & Regards
--Avijit Basak

On Wed, 14 Apr 2021 at 05:53, Gilles Sadowski <gi...@gmail.com> wrote:

> On Tue, 13 Apr 2021 at 18:21, Avijit Basak <av...@gmail.com> wrote:
> >
> > Hi
> >
> >           Please find my comments below.
> >
> > >> I don't follow the distinction "prod" vs "non-prod".
> >      -- Actually in Prod we really need a very high performing system. So
> > use of implicit parallelism in spark would help us to achieve it. But for
> > other types of work like POC or R&D we may not need such performance.
>
> Isn't a GA inherently parallel?
> If so, why not take advantage of the concurrency tools provided by the JDK?
>
> > >> the question was actually whether you are willing to modularize CM
> >      -- I am not much aware of other ml components in commons. I would
> > look into it.
>
> I've mentioned them in earlier messages:
>  * Self-organizing feature map (artificial neural net)
>  * Clustering
>
> The former is multi-threaded; the latter should be refactored to
> take advantage of multi-threading.
>
> > >>You did not expand about the usability/performance (e.g. the issue of
> > multi-threading)
> >      -- Are we planning to incorporate parallel GA.
>
> Aren't you?
>
> > Then multi-threading
> > would be a more appropriate option.
>
> IMHO, a necessary one.
>
> > >> So, as a way forward, I would suggest that you create a project on
> > GitHub (copying all the settings from a *Commons modular* component,
> > such as "Commons Numbers")
> >      -- Could you kindly share the GitHub repository URL for any Commons
> > modular component.
>
> https://github.com/apache/commons-rng
> https://github.com/apache/commons-numbers
> https://github.com/apache/commons-geometry
> https://github.com/apache/commons-statistics
>
> >
> > Thanks & Regards
> > --Avijit Basak
> >
> >
> > On Tue, 13 Apr 2021 at 18:29, Gilles Sadowski <gi...@gmail.com> wrote:
> >
> > > Hello.
> > >
> > > On Mon, 12 Apr 2021 at 17:21, Avijit Basak <av...@gmail.com> wrote:
> > > >
> > > > Hi
> > > >
> > > >          Sorry for the delayed response. Thanks for your patience.
> > > > Please find my comments below:
> > > >
> > > >  (1) Why not Spark?  [At least post over there (?).]
> > > >       --We can move to Spark. But it will be very much useful if the
> > > > things can also run without Spark. The use of Spark would make more
> > > > sense in a production environment. But the portability of the library
> > > > will be more useful for the non-prod environment.
> > >
> > > I don't follow the distinction "prod" vs "non-prod".
> > >
> > > > Definitely, we can reach the Spark
> > > > team and query.
> > >
> > > That would be a good idea...
> > >
> > > >  (2) Further develop a monolithic CM?  [Who will do it?]
> > > >        --I can help with the upgrade of the existing library related
> > > > to GA functionality.
> > >
> > > Sure, but nobody is currently working on (2).
> > >
> > > >  (3) Modularize CM? [Who will do it?]
> > > >        --I can help with the upgrade of the existing library related
> > > > to GA functionality.
> > >
> > > I don't doubt it; but the question was actually whether you are willing
> > > to modularize CM (that is: in addition to, and before, contributing to
> > > the GA functionality).
> > >
> > > >  (4) New component (with another name) with the proposed contents?
> > > >        --This is the best option if permitted.
> > >
> > > Currently, only the two of us are in favour of this alternative.
> > >
> > > Nobody, by their action, is really in favour of any of the other
> > > alternatives.
> > > So, as a way forward, I would suggest that you create a project on
> > > GitHub (copying all the settings from a Commons modular component,
> > > such as "Commons Numbers"), to be eventually integrated here, once its
> > > potential has been demonstrated.
> > >
> > > >       The code which I have written can be reused with minor
> > > > modifications. So it won't take too much effort for this activity.
> > >
> > > You did not expand about the usability/performance (e.g. the issue of
> > > multi-threading)...
> > >
> > > Regards,
> > > Gilles
> > >
> > > >> [...]
> > >
>

-- 
Avijit Basak

Re: [Vote] Create a "machine learning" component

Posted by Gilles Sadowski <gi...@gmail.com>.

On Tue, 13 Apr 2021 at 18:21, Avijit Basak <av...@gmail.com> wrote:
>
> Hi
>
>           Please find my comments below.
>
> >> I don't follow the distinction "prod" vs "non-prod".
>      -- Actually in Prod we really need a very high performing system. So
> use of implicit parallelism in spark would help us to achieve it. But for
> other types of work like POC or R&D we may not need such performance.

Isn't a GA inherently parallel?
If so, why not take advantage of the concurrency tools provided by the JDK?

> >> the question was actually whether you are willing to modularize CM
>      -- I am not much aware of other ml components in commons. I would look
> into it.

I've mentioned them in earlier messages:
 * Self-organizing feature map (artificial neural net)
 * Clustering

The former is multi-threaded; the latter should be refactored to
take advantage of multi-threading.

> >>You did not expand about the usability/performance (e.g. the issue of
> multi-threading)
>      -- Are we planning to incorporate parallel GA.

Aren't you?

> Then multi-threading
> would be a more appropriate option.

IMHO, a necessary one.

> >> So, as a way forward, I would suggest that you create a project on
> GitHub (copying all the settings from a *Commons modular* component, such as
> "Commons Numbers")
>      -- Could you kindly share the GitHub repository URL for any Commons
> modular component.

https://github.com/apache/commons-rng
https://github.com/apache/commons-numbers
https://github.com/apache/commons-geometry
https://github.com/apache/commons-statistics

>
> Thanks & Regards
> --Avijit Basak
>
>
> On Tue, 13 Apr 2021 at 18:29, Gilles Sadowski <gi...@gmail.com> wrote:
>
> > Hello.
> >
> > On Mon, 12 Apr 2021 at 17:21, Avijit Basak <av...@gmail.com> wrote:
> > >
> > > Hi
> > >
> > >          Sorry for the delayed response. Thanks for your patience. Please
> > > find my comments below:
> > >
> > >  (1) Why not Spark?  [At least post over there (?).]
> > >       --We can move to Spark. But it will be very much useful if the
> > > things can also run without Spark. The use of Spark would make more
> > > sense in a production environment. But the portability of the library
> > > will be more useful for the non-prod environment.
> >
> > I don't follow the distinction "prod" vs "non-prod".
> >
> > > Definitely, we can reach the Spark
> > > team and query.
> >
> > That would be a good idea...
> >
> > >  (2) Further develop a monolithic CM?  [Who will do it?]
> > >        --I can help with the upgrade of the existing library related to
> > > GA functionality.
> >
> > Sure, but nobody is currently working on (2).
> >
> > >  (3) Modularize CM? [Who will do it?]
> > >        --I can help with the upgrade of the existing library related to
> > > GA functionality.
> >
> > I don't doubt it; but the question was actually whether you are willing
> > to modularize CM (that is: in addition to, and before, contributing to
> > the GA functionality).
> >
> > >  (4) New component (with another name) with the proposed contents?
> > >        --This is the best option if permitted.
> >
> > Currently, only the two of us are in favour of this alternative.
> >
> > Nobody, by their action, is really in favour of any of the other
> > alternatives.
> > So, as a way forward, I would suggest that you create a project on GitHub
> > (copying all the settings from a Commons modular component, such as
> > "Commons Numbers"), to be eventually integrated here, once its potential
> > has been demonstrated.
> >
> > >       The code which I have written can be reused with minor
> > > modifications. So it won't take too much effort for this activity.
> >
> > You did not expand about the usability/performance (e.g. the issue of
> > multi-threading)...
> >
> > Regards,
> > Gilles
> >
> > >> [...]
> >


Re: [Vote] Create a "machine learning" component

Posted by Avijit Basak <av...@gmail.com>.

Hi

          Please find my comments below.

>> I don't follow the distinction "prod" vs "non-prod".
     -- Actually, in production we really need a very high-performing system,
so the implicit parallelism in Spark would help us achieve it. But for
other types of work, like POC or R&D, we may not need such performance.
>> the question was actually whether you are willing to modularize CM
     -- I am not very familiar with the other ml components in Commons. I
will look into it.
>>You did not expand about the usability/performance (e.g. the issue of
multi-threading)
     -- Are we planning to incorporate parallel GA? Then multi-threading
would be a more appropriate option.
>> So, as a way forward, I would suggest that you create a project on
GitHub (copying all the settings from a *Commons modular* component, such as
"Commons Numbers")
     -- Could you kindly share the GitHub repository URL for any Commons
modular component?

Thanks & Regards
--Avijit Basak


On Tue, 13 Apr 2021 at 18:29, Gilles Sadowski <gi...@gmail.com> wrote:

> Hello.
>
> On Mon, 12 Apr 2021 at 17:21, Avijit Basak <av...@gmail.com> wrote:
> >
> > Hi
> >
> >          Sorry for the delayed response. Thanks for your patience. Please
> > find my comments below:
> >
> >  (1) Why not Spark?  [At least post over there (?).]
> >       --We can move to Spark. But it will be very much useful if the
> > things can also run without Spark. The use of Spark would make more
> > sense in a production environment. But the portability of the library
> > will be more useful for the non-prod environment.
>
> I don't follow the distinction "prod" vs "non-prod".
>
> > Definitely, we can reach the Spark
> > team and query.
>
> That would be a good idea...
>
> >  (2) Further develop a monolithic CM?  [Who will do it?]
> >        --I can help with the upgrade of the existing library related to
> > GA functionality.
>
> Sure, but nobody is currently working on (2).
>
> >  (3) Modularize CM? [Who will do it?]
> >        --I can help with the upgrade of the existing library related to
> > GA functionality.
>
> I don't doubt it; but the question was actually whether you are willing
> to modularize CM (that is: in addition to, and before, contributing to
> the GA functionality).
>
> >  (4) New component (with another name) with the proposed contents?
> >        --This is the best option if permitted.
>
> Currently, only the two of us are in favour of this alternative.
>
> Nobody, by their action, is really in favour of any of the other
> alternatives.
> So, as a way forward, I would suggest that you create a project on GitHub
> (copying all the settings from a Commons modular component, such as
> "Commons Numbers"), to be eventually integrated here, once its potential
> has been demonstrated.
>
> >       The code which I have written can be reused with minor
> > modifications. So it won't take too much effort for this activity.
>
> You did not expand about the usability/performance (e.g. the issue of
> multi-threading)...
>
> Regards,
> Gilles
>
> >> [...]
>

-- 
Avijit Basak

Re: [Vote] Create a "machine learning" component

Posted by Gilles Sadowski <gi...@gmail.com>.

Hello.

On Mon, 12 Apr 2021 at 17:21, Avijit Basak <av...@gmail.com> wrote:
>
> Hi
>
>          Sorry for the delayed response. Thanks for your patience. Please
> find my comments below:
>
>  (1) Why not Spark?  [At least post over there (?).]
>       --We can move to Spark. But it will be very much useful if the things
> can also run without Spark. The use of Spark would make more sense in a
> production environment. But the portability of the library will be more
> useful for the non-prod environment.

I don't follow the distinction "prod" vs "non-prod".

> Definitely, we can reach the Spark
> team and query.

That would be a good idea...

>  (2) Further develop a monolithic CM?  [Who will do it?]
>        --I can help with the upgrade of the existing library related to GA
> functionality.

Sure, but nobody is currently working on (2).

>  (3) Modularize CM? [Who will do it?]
>        --I can help with the upgrade of the existing library related to GA
> functionality.

I don't doubt it; but the question was actually whether you are willing
to modularize CM (that is: in addition to, and before, contributing to
the GA functionality).

>  (4) New component (with another name) with the proposed contents?
>        --This is the best option if permitted.

Currently, only the two of us are in favour of this alternative.

Nobody, by their action, is really in favour of any of the other alternatives.
So, as a way forward, I would suggest that you create a project on GitHub
(copying all the settings from a Commons modular component, such as
"Commons Numbers"), to be eventually integrated here, once its potential
has been demonstrated.

>       The code which I have written can be reused with minor modifications.
> So it won't take too much effort for this activity.

You did not expand about the usability/performance (e.g. the issue of
multi-threading)...

Regards,
Gilles

>> [...]
