Posted to dev@mahout.apache.org by Sreenivas Raghavan <sr...@gmail.com> on 2015/06/13 10:17:15 UTC

Contribution

Hello everyone,
I am interested in contributing to the Mahout project. I am
interested in algorithms, machine learning and linear algebra. Please give
me some idea of where to start and how to start. I know Python and some
parts of Java, so please tell me whether this knowledge of languages is
enough for writing and optimizing code.
-- 

*With Regards,*
*K.S.Sreenivasa Raghavan*

Re: Contribution

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
Great, file an issue please.
On Jun 17, 2015 7:07 PM, "Sreenivas Raghavan" <sr...@gmail.com> wrote:

> I am interested in trying the Cholesky issue.
>
> On Wed, Jun 17, 2015 at 11:18 PM, Dmitriy Lyubimov <dl...@gmail.com> wrote:
>
> > Thanks for doing this. This is greatly appreciated.
> >
> > What about that Cholesky issue? Any takers?
> >
> > On Wed, Jun 17, 2015 at 12:34 AM, Rohit Shinde <rohit.shinde12194@gmail.com> wrote:
> >
> > > Okay, so I'll get started with fixing the Mahout Spark shell. I'll raise
> > > issues on the mailing list as and when I encounter them. I'll go slowly
> > > though. I have GSoC going on and I will not be able to dedicate much
> > > time for the next two months.
> > >
> > > On Wed, Jun 17, 2015 at 2:21 AM, Dmitriy Lyubimov <dl...@gmail.com> wrote:
> > >
> > > > Guys, please file a Jira issue for Cholesky. This needs a bit of
> > > > investigation. I don't really know who wants to pick it up.
> > > >
> > > > Mathematical problems -- I assume basic ones -- we need MVN and Wishart
> > > > multivariate distribution implementations which do not depend on
> > > > apache-math or any other 3rd party, as well as Gaussian process. I am
> > > > willing to outsource those to a first taker :-)
> > > >
> > > > For non-basic ones, as I mentioned, please scan the world :-) Topical
> > > > stuff would be nice to port back, like LDA CVB0 (although I think I
> > > > read a paper that basically goes back to the Gibbs sampling technique,
> > > > and now it is somehow a more fashionable way than variational Bayes
> > > > again, for some reason :)
> > > >
> > > >
> > > > On Tue, Jun 16, 2015 at 1:34 PM, Nikolis Galerakis <nikolisgal@gmail.com> wrote:
> > > >
> > > > > Hello
> > > > >
> > > > > I am really interested in the Cholesky decomposition. Is there any
> > > > > process I should follow to get assigned this task, or should I just
> > > > > dive into it?
> > > > >
> > > > > Nikos
> > > > >
> > > > >
> > > > > 2015-06-16 20:48 GMT+02:00 Sreenivas Raghavan <sreenivas.raghavan7@gmail.com>:
> > > > >
> > > > > > Sir,
> > > > > > I am interested in such kinds of mathematical problems. Can you
> > > > > > state a few more?
> > > > > >
> > > > > > On Tue, Jun 16, 2015 at 10:29 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
> > > > > >
> > > > > > > (1) Yes, making the Spark shell work with Spark 1.3+ on 0.11-snapshot
> > > > > > > would be an awesome help.
> > > > > > > (2) I was thinking, if you are still into math problems, we have, in
> > > > > > > my view, a problem in CholeskyDecomposition.
> > > > > > >
> > > > > > > This needs a little research. It involves the methods solveRight and
> > > > > > > solveLeft.
> > > > > > > (2a) solveLeft claims to do forward substitution (which it does), and
> > > > > > > solveRight claims to do back substitution, which it probably does
> > > > > > > too. But in reality it solves a different problem than it is
> > > > > > > supposed to. In the classic scheme of things, if A in AX=B is
> > > > > > > positive (semi)definite and A=LL' is its Cholesky decomposition,
> > > > > > > then forward substitution is supposed to solve LY=B for Y, and back
> > > > > > > substitution is supposed to solve L'X=Y, i.e. back substitution is
> > > > > > > supposed to compute L'^-1 Y. But the current implementation does
> > > > > > > something that can be shown to be essentially equivalent to
> > > > > > > solveLeft() rather than a solution of L'X=Y. This needs to be
> > > > > > > looked at more carefully.
> > > > > > >
> > > > > > > (2b) I also believe the names solveLeft and solveRight are
> > > > > > > misleading. In all other cases, solve() methods traditionally
> > > > > > > denote the solution of AX=B or XA=B for X. In Cholesky, neither of
> > > > > > > these methods actually provides a solution of AX=B; each provides
> > > > > > > only a part of the solution. Therefore, I think, these methods
> > > > > > > should be renamed to something like forwardSubs() and backSubs(),
> > > > > > > or better yet, named for exactly what they are doing, e.g.
> > > > > > > computeLtInvZ(mxZ:Matrix). Moreover, it is probably beneficial to
> > > > > > > have solve methods that actually compute the full solution of Ax=b
> > > > > > > or xA = b' by combining forward and back substitutions properly.
> > > > > > >
> > > > > > > I hope some of this fits; it takes time to write this.
> > > > > > >
> > > > > > > -Dmitriy
> > > > > > >
> > > > > > > On Tue, Jun 16, 2015 at 4:17 AM, Rohit Shinde <rohit.shinde12194@gmail.com> wrote:
> > > > > > >
> > > > > > > > Okay, it seems that methodology is a bit too advanced for me. I
> > > > > > > > would go with framework/engineering tasks. So should I start with
> > > > > > > > fixing the Mahout Spark shell?
> > > > > > > >
> > > > > > > > On Tue, Jun 16, 2015 at 11:20 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > As I said, in methodology you can pick _anything_ that you think
> > > > > > > > > has merit and is not yet in the roadmap or done.
> > > > > > > > >
> > > > > > > > > For example, do you feel like you might research PSVM or
> > > > > > > > > interior-point SVM? Actually, any flavor of non-linear SVM that is
> > > > > > > > > different from a simple hinge loss?
> > > > > > > > > Do you think you can fit it in our algebraic engine?
> > > > > > > > >
> > > > > > > > > I think we also need a fair amount of porting of MR methods --
> > > > > > > > > like seq2sparse and CVB0 LDA.
> > > > > > > > >
> > > > > > > > > I would still look at framework performance tasks; they are badly
> > > > > > > > > needed. Just today I listened to a talk about a flyby matrix
> > > > > > > > > multiplication approach for Spark for medium-sized matrices, which
> > > > > > > > > probably beats ours, since even though we do not use cartesian (god
> > > > > > > > > forbid), our implementation is somewhat closer to what the speaker
> > > > > > > > > described as "massively mapside join" -- which eventually,
> > > > > > > > > according to him, is supposed to gain over flyby multiply, but
> > > > > > > > > there are a fair number of tasks where it does not.
> > > > > > > > >
> > > > > > > > > Similarly, bolting on hardware libraries for in-core operations is
> > > > > > > > > still a big undecided issue.
> > > > > > > > >
> > > > > > > > > Unfortunately, a lot of known outstanding issues are still about
> > > > > > > > > engineering.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Mon, Jun 15, 2015 at 10:27 PM, Rohit Shinde <rohit.shinde12194@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > I would prefer some methodology work if it falls within my
> > > > > > > > > > capabilities. If it doesn't, then your suggestion is a good one
> > > > > > > > > > and I'll take it up. Substantial, to me, means a task through
> > > > > > > > > > which I can get familiar with as much of the code base as
> > > > > > > > > > possible.
> > > > > > > > > >
> > > > > > > > > > On Tue, Jun 16, 2015 at 10:49 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > I gave you 3 types of problems. Define substantial.
> > > > > > > > > > >
> > > > > > > > > > > Say, does fixing the Mahout Spark shell sound substantial enough?
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Jun 15, 2015 at 10:11 PM, Rohit Shinde <rohit.shinde12194@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > So do you have any suggestions for getting started? I would
> > > > > > > > > > > > like to contribute to something substantial that is going on,
> > > > > > > > > > > > after getting familiar with the required part of the codebase.
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Jun 15, 2015 at 11:39 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > I don't think there's a formal list published anywhere.
> > > > > > > > > > > > >
> > > > > > > > > > > > > There is an informal roadmap.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The way I see it, contributions can mainly be in 3 areas:
> > > > > > > > > > > > > (1) project support issues, for example fixing shell
> > > > > > > > > > > > > compatibility with Spark 1.3; (2) framework support problems,
> > > > > > > > > > > > > for example performance and integrating 3rd-party
> > > > > > > > > > > > > hardware-accelerated linalg libraries; (3) methodology work.
> > > > > > > > > > > > >
> > > > > > > > > > > > > We have some pending items for (1) and (2), I think, but for
> > > > > > > > > > > > > methodology items (3) we simply can't compile a list of
> > > > > > > > > > > > > everything that could possibly be done and contributed. We just
> > > > > > > > > > > > > don't have that much expertise, combined. No one has [1]. The
> > > > > > > > > > > > > way it usually works is that people come up with pieces that
> > > > > > > > > > > > > they were missing on their own for some reason; they need to
> > > > > > > > > > > > > propose a methodology, a parallelization strategy, maybe even a
> > > > > > > > > > > > > code sketch -- that will all be fine.
> > > > > > > > > > > > >
> > > > > > > > > > > > > [1] http://matt.might.net/articles/phd-school-in-pictures/
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Sun, Jun 14, 2015 at 11:49 PM, Rohit Shinde <rohit.shinde12194@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > But is there a list of projects that new people could take
> > > > > > > > > > > > > > up? I am also a student interested in contributing to the
> > > > > > > > > > > > > > machine learning and data mining parts of Apache Mahout.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I am familiar with Scala, Java, Python and C++.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > What can I contribute to?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Mon, Jun 15, 2015 at 10:24 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Well, we are predominantly a Scala shop now. Being fluent in
> > > > > > > > > > > > > > > Scala seems like one prerequisite.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Sat, Jun 13, 2015 at 1:17 AM, Sreenivas Raghavan <sreenivas.raghavan7@gmail.com> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hello everyone,
> > > > > > > > > > > > > > > > I am interested in contributing to the Mahout project. I am
> > > > > > > > > > > > > > > > interested in algorithms, machine learning and linear
> > > > > > > > > > > > > > > > algebra. Please give me some idea of where to start and how
> > > > > > > > > > > > > > > > to start. I know Python and some parts of Java, so please
> > > > > > > > > > > > > > > > tell me whether this knowledge of languages is enough for
> > > > > > > > > > > > > > > > writing and optimizing code.
> > > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > *With Regards,*
> > > > > > > > > > > > > > > > *K.S.Sreenivasa Raghavan*
>
> --
>
> *With Regards,*
> *K.S.Sreenivasa Raghavan*
>

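For readers following points (2a) and (2b) in the thread above, here is a minimal pure-Python sketch of the classic scheme: factor A = LL', solve LY = B by forward substitution, then L'X = Y by back substitution, and combine the two into a full solve of AX = B (and of XA = B via transposition, which works because A is symmetric). This is not Mahout's CholeskyDecomposition code. The function names (forward_subs, back_subs, solve, solve_right) are hypothetical and only echo the renaming proposed in the thread, and the sketch assumes A is symmetric positive definite (no pivoting for the semidefinite case).

```python
import math


def cholesky(a):
    """Return lower-triangular L with A = L L' (A symmetric positive definite)."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                l[i][i] = math.sqrt(d)
            else:
                l[i][j] = (a[i][j] - s) / l[j][j]
    return l


def forward_subs(l, b):
    """Forward substitution: solve L Y = B for Y, top row first."""
    n, m = len(l), len(b[0])
    y = [[0.0] * m for _ in range(n)]
    for c in range(m):
        for i in range(n):
            s = sum(l[i][k] * y[k][c] for k in range(i))
            y[i][c] = (b[i][c] - s) / l[i][i]
    return y


def back_subs(l, y):
    """Back substitution: solve L' X = Y for X, bottom row first.

    L'[i][k] is read as l[k][i], so the transpose is never materialized.
    """
    n, m = len(l), len(y[0])
    x = [[0.0] * m for _ in range(n)]
    for c in range(m):
        for i in reversed(range(n)):
            s = sum(l[k][i] * x[k][c] for k in range(i + 1, n))
            x[i][c] = (y[i][c] - s) / l[i][i]
    return x


def transpose(m):
    return [list(row) for row in zip(*m)]


def solve(a, b):
    """Full solution of A X = B: X = L'^-1 (L^-1 B)."""
    l = cholesky(a)
    return back_subs(l, forward_subs(l, b))


def solve_right(a, b):
    """Full solution of X A = B; since A is symmetric this is A X' = B'."""
    return transpose(solve(a, transpose(b)))


if __name__ == "__main__":
    a = [[4.0, 2.0], [2.0, 3.0]]
    print(solve(a, [[2.0], [1.0]]))      # [[0.5], [0.0]]
    print(solve_right(a, [[2.0, 1.0]]))  # [[0.5, 0.0]]
```

A cheap regression test for the behaviour described in (2a) is to check that A times solve(A, B) reproduces B for a non-diagonal A: a routine that solves LX = Y where L'X = Y was intended returns L^-1 L^-1 B instead, which generally fails that check.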
Re: Contribution

Posted by Sreenivas Raghavan <sr...@gmail.com>.
I am interested in trying the Cholesky issue.

-- 

*With Regards,*
*K.S.Sreenivasa Raghavan*

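The thread also asks for MVN (multivariate normal) and Wishart implementations that do not depend on a third-party library. Both can be built on the same Cholesky factor. Below is a minimal sketch using only Python's standard library, assuming a symmetric positive definite covariance: samples are drawn as mu + Lz with z ~ N(0, I), and the log-density uses forward substitution on x - mu together with log det(cov) = 2 * sum(log L_ii). The names mvn_sampler and mvn_logpdf are hypothetical and not Mahout API.

```python
import math
import random


def cholesky(a):
    """Return lower-triangular L with A = L L' (A symmetric positive definite)."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                l[i][i] = math.sqrt(d)
            else:
                l[i][j] = (a[i][j] - s) / l[j][j]
    return l


def mvn_sampler(mu, cov):
    """Factor cov once and return draw(rng), which yields one sample of N(mu, cov).

    Each sample is mu + L z with cov = L L' and z ~ N(0, I), because
    Cov(L z) = L I L' = cov.
    """
    l = cholesky(cov)
    n = len(mu)

    def draw(rng):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # L is lower triangular, so row i only needs z[0..i].
        return [mu[i] + sum(l[i][k] * z[k] for k in range(i + 1)) for i in range(n)]

    return draw


def mvn_logpdf(x, mu, cov):
    """Log-density of N(mu, cov) at x, computed from the Cholesky factor only."""
    l = cholesky(cov)
    n = len(mu)
    # Forward substitution: solve L y = x - mu, so y'y = (x - mu)' cov^-1 (x - mu).
    y = [0.0] * n
    for i in range(n):
        s = sum(l[i][k] * y[k] for k in range(i))
        y[i] = (x[i] - mu[i] - s) / l[i][i]
    # log det(cov) = 2 * sum(log L_ii)
    log_det = 2.0 * sum(math.log(l[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + sum(v * v for v in y))
```

Passing the random.Random instance into draw keeps sampling reproducible and avoids hidden global state; the factorization is done once per covariance rather than once per sample.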
Re: Contribution

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
Thanks for doing this. This is greatly appreciated.

What about that Cholesky issue? Any takers?

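For the Wishart request made earlier in the thread, one standard dependency-free route is the Bartlett decomposition: with scale V = LL' and a lower-triangular A whose diagonal entries are sqrt(c_i), c_i ~ chi-square with df - i degrees of freedom (0-based i = 0, ..., p-1), and whose below-diagonal entries are N(0, 1), the matrix L A A' L' is Wishart(V, df) distributed. A minimal sketch follows, assuming df > p - 1 and a positive definite scale matrix; sample_wishart is a hypothetical name, and the chi-square draws use random.gammavariate(k/2, 2.0), since chi-square(k) is Gamma with shape k/2 and scale 2.

```python
import math
import random


def cholesky(a):
    """Return lower-triangular L with A = L L' (A symmetric positive definite)."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                l[i][i] = math.sqrt(d)
            else:
                l[i][j] = (a[i][j] - s) / l[j][j]
    return l


def sample_wishart(scale, df, rng):
    """One draw from Wishart(scale, df) via the Bartlett decomposition."""
    p = len(scale)
    if df <= p - 1:
        raise ValueError("df must exceed p - 1")
    l = cholesky(scale)
    # Bartlett factor A: lower triangular, A[i][i]^2 ~ chi-square(df - i),
    # A[i][j] ~ N(0, 1) below the diagonal (0-based i).
    a = [[0.0] * p for _ in range(p)]
    for i in range(p):
        # chi-square(k) is Gamma(shape=k/2, scale=2)
        a[i][i] = math.sqrt(rng.gammavariate((df - i) / 2.0, 2.0))
        for j in range(i):
            a[i][j] = rng.gauss(0.0, 1.0)
    # M = L A is lower triangular; W = M M' = L A A' L'
    m = [[sum(l[i][k] * a[k][j] for k in range(p)) for j in range(p)] for i in range(p)]
    return [[sum(m[i][k] * m[j][k] for k in range(p)) for j in range(p)] for i in range(p)]
```

Since E[W] = df * V, averaging many draws and comparing against df * scale is a cheap sanity check for any such implementation.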
On Wed, Jun 17, 2015 at 12:34 AM, Rohit Shinde <ro...@gmail.com>
wrote:

> Okay, so I'll get started with fixing the mahout spark shell. I'll ask
> issues on the mailing list as and when I encounter them. I'll go slowly
> though. I have GSoC going on and I will not be able to dedicate much time
> for the next two months.
>
> On Wed, Jun 17, 2015 at 2:21 AM, Dmitriy Lyubimov <dl...@gmail.com>
> wrote:
>
> > Guys, please file a Jira issue for Cholesky. this needs a bit of
> > investigation. I don't really know who wants to pick it.
> >
> > Mathematical problems -- i assume basic ones -- we need MVN and Wishart
> > multivariate distribution implementations which do not depend on
> > apache-math or any other 3rd party, as well as Gaussian process. I am
> > willing to outsource those to a first taker :-)
> >
> > for non-basic ones, as i mentioned, please scan the world :-) Topical
> stuff
> > would be nice to port back, like LDA CVB0 (although i think i read a
> paper
> > that basically goes back to gibbs sampling technique and now it is
> somehow
> > more fashionable way than variational bayes again for some reason:)
> >
> >
> > On Tue, Jun 16, 2015 at 1:34 PM, Nikolis Galerakis <nikolisgal@gmail.com
> >
> > wrote:
> >
> > > Hello
> > >
> > > I am really interested on Cholesky Decomposition is there any process
> > that
> > > I should follow to get assigned
> > > this task or I should  just dive into it ?
> > >
> > > Nikos
> > >
> > >
> > > 2015-06-16 20:48 GMT+02:00 Sreenivas Raghavan <
> > > sreenivas.raghavan7@gmail.com
> > > >:
> > >
> > > > Sir,
> > > >     I am interested in such kind of mathematical problems. Can you
> stat
> > > few
> > > > more?
> > > >
> > > > On Tue, Jun 16, 2015 at 10:29 PM, Dmitriy Lyubimov <
> dlieu.7@gmail.com>
> > > > wrote:
> > > >
> > > > > (1) Yes, making spark shell work with spark 1.3+ on 0.11-snapshot
> > would
> > > > be
> > > > > an awesome help.
> > > > > (2) I was thinking, if you are still into math problem, we have, in
> > my
> > > > > view, a problem in CholeskyDecomposition.
> > > > >
> > > > > This needs a little research. This involves methods solveRight,
> > > > solveLeft.
> > > > > (2a) solveLeft claims to do forward substitution (which it does),
> and
> > > > > solveRight claims to do back substitution, which it probably does
> > too.
> > > > But
> > > > > in reality it solves a different problem it is supposed to. In
> > classic
> > > > > scheme of things, if AX=B is positive (semi)definite, and A=LL'
> > > Cholesky
> > > > > decomposition, then forward substitution is supposed to solve LY=B
> > for
> > > Y
> > > > > and back substitution is supposed to solve L'X=Y, i.e. back
> > > substitution
> > > > is
> > > > > supposed to compute result of L'^-1Y. But current implementation
> does
> > > > > something that can be shown to be essentially equivalent to
> > solveLeft()
> > > > > rather than solution for L'X=Y. This needs to be looked at more
> > > carefully
> > > > >
> > > > > (2b) I also believe the whole names ofr solveLeft, solveRight are
> > > > > misleading. In all other cases, solve() methods traditionally
> denote
> > > > > solution of AX=B or XA=B for X. In Cholesky, neither of these
> methods
> > > > > actually provides a solution for AX=B, but rather provides a part
> of
> > > the
> > > > > solution. Therefore, i think, these methods should be renamed to
> > > > something
> > > > > like forwardSubs(), backSubs(), or better yet, name exactly what
> they
> > > are
> > > > > doing, e.g. computeLtInvZ(mxZ:Matrix). more over, it is probably
> > > > beneficial
> > > > > to have solve methods that actually do compute full solution of
> Ax=b
> > or
> > > > xA
> > > > > = b' by combining forward and back substitutions properly.
> > > > >
> > > > > I hope some of this fits, it takes time to write this.
> > > > >
> > > > > -Dmitriy
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > >
> > > > *With Regards,*
> > > > *K.S.Sreenivasa Raghavan*
> > > >
> > >
> >
>
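The (2a)/(2b) points quoted in this message describe the textbook two-stage Cholesky solve: with A = LL', forward substitution solves LY = B for Y, back substitution solves L'X = Y for X, and chaining the two gives the full solution of AX = B. A minimal Scala sketch of that scheme follows, under stated assumptions: it works on plain Array[Array[Double]] with a single right-hand side vector (a matrix B would be handled column by column), and the object and method names (CholeskySolveSketch, forwardSubs, backSubs, solve) are hypothetical, borrowed from the renaming suggested in (2b). It is not Mahout's CholeskyDecomposition and makes no claim about how that class behaves.

```scala
// Hypothetical sketch (not Mahout's CholeskyDecomposition): the textbook
// two-stage Cholesky solve on plain dense arrays, for one right-hand side.
object CholeskySolveSketch {
  type Mat = Array[Array[Double]]

  // Factor a symmetric positive definite A as A = L * L^T; returns lower-triangular L.
  def cholesky(a: Mat): Mat = {
    val n = a.length
    val l = Array.ofDim[Double](n, n)
    for (i <- 0 until n; j <- 0 to i) {
      var s = a(i)(j)
      for (k <- 0 until j) s -= l(i)(k) * l(j)(k)
      if (i == j) {
        require(s > 0.0, "matrix is not positive definite")
        l(i)(i) = math.sqrt(s)
      } else {
        l(i)(j) = s / l(j)(j)
      }
    }
    l
  }

  // Forward substitution: solve L y = b for y, top row first.
  def forwardSubs(l: Mat, b: Array[Double]): Array[Double] = {
    val n = b.length
    val y = new Array[Double](n)
    for (i <- 0 until n) {
      var s = b(i)
      for (k <- 0 until i) s -= l(i)(k) * y(k)
      y(i) = s / l(i)(i)
    }
    y
  }

  // Back substitution: solve L^T x = y for x, bottom row first.
  // L^T is never materialized: entry (i, k) of L^T is read as l(k)(i).
  def backSubs(l: Mat, y: Array[Double]): Array[Double] = {
    val n = y.length
    val x = new Array[Double](n)
    for (i <- (n - 1) to 0 by -1) {
      var s = y(i)
      for (k <- i + 1 until n) s -= l(k)(i) * x(k)
      x(i) = s / l(i)(i)
    }
    x
  }

  // Full solve of A x = b: y = L^-1 b (forward), then x = L^-T y (back).
  def solve(a: Mat, b: Array[Double]): Array[Double] = {
    val l = cholesky(a)
    backSubs(l, forwardSubs(l, b))
  }
}
```

Because A is symmetric, the row-vector problem xA = b' from (2b) is the same system transposed (A x' = b), so the same solve covers it. For example, CholeskySolveSketch.solve(Array(Array(4.0, 2.0), Array(2.0, 3.0)), Array(1.0, 2.0)) returns approximately Array(-0.125, 0.75).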

Re: Contribution

Posted by Rohit Shinde <ro...@gmail.com>.
Okay, so I'll get started with fixing the Mahout Spark shell. I'll raise
issues on the mailing list as and when I encounter them. I'll go slowly,
though: I have GSoC going on and will not be able to dedicate much time
for the next two months.

On Wed, Jun 17, 2015 at 2:21 AM, Dmitriy Lyubimov <dl...@gmail.com> wrote:

> Guys, please file a Jira issue for Cholesky. This needs a bit of
> investigation. I don't really know who wants to pick it up.
>
> As for mathematical problems (I assume basic ones): we need MVN and Wishart
> multivariate distribution implementations which do not depend on
> apache-math or any other 3rd party, as well as Gaussian processes. I am
> willing to outsource those to the first taker :-)
>
> For non-basic ones, as I mentioned, please scan the world :-) Topical stuff
> would be nice to port back, like LDA CVB0 (although I think I read a paper
> that basically goes back to the Gibbs sampling technique, and now it is
> somehow a more fashionable approach than variational Bayes again, for some
> reason :)
>

Re: Contribution

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
Guys, please file a Jira issue for Cholesky. This needs a bit of
investigation. I don't really know who wants to pick it up.

As for mathematical problems (I assume basic ones): we need MVN and Wishart
multivariate distribution implementations which do not depend on
apache-math or any other 3rd party, as well as Gaussian processes. I am
willing to outsource those to the first taker :-)

For non-basic ones, as I mentioned, please scan the world :-) Topical stuff
would be nice to port back, like LDA CVB0 (although I think I read a paper
that basically goes back to the Gibbs sampling technique, and now it is
somehow a more fashionable approach than variational Bayes again, for some
reason :)
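Both distribution requests above can lean on a Cholesky factor and a standard normal generator, with no third-party dependency. If Σ = LL' and z has independent N(0, 1) entries, then μ + Lz is distributed N(μ, Σ). For the Wishart case, the Bartlett decomposition builds a lower-triangular A whose squared diagonal entries are chi-square with n - i degrees of freedom (0-based i) and whose entries below the diagonal are N(0, 1), so that (LA)(LA)' ~ Wishart_p(V, n) when V = LL'. The Scala sketch below assumes a dense positive definite covariance, integer degrees of freedom n >= p (so each chi-square draw can be a sum of squared normals), and scala.util.Random as the only randomness source; all names are hypothetical and this is not Mahout code.

```scala
import scala.util.Random

// Hypothetical sketch: MVN and Wishart sampling built only on a Cholesky factor
// and scala.util.Random (no commons-math). Not Mahout's API.
object MvnWishartSketch {
  type Mat = Array[Array[Double]]

  // Lower-triangular L with sigma = L * L^T (sigma must be positive definite).
  def cholesky(sigma: Mat): Mat = {
    val n = sigma.length
    val l = Array.ofDim[Double](n, n)
    for (i <- 0 until n; j <- 0 to i) {
      var s = sigma(i)(j)
      for (k <- 0 until j) s -= l(i)(k) * l(j)(k)
      if (i == j) {
        require(s > 0.0, "covariance is not positive definite")
        l(i)(i) = math.sqrt(s)
      } else {
        l(i)(j) = s / l(j)(j)
      }
    }
    l
  }

  // Plain dense matrix product a * b.
  def mul(a: Mat, b: Mat): Mat =
    Array.tabulate(a.length, b(0).length) { (i, j) =>
      b.indices.map(k => a(i)(k) * b(k)(j)).sum
    }

  // One draw from N(mu, L L^T): x = mu + L z with z ~ N(0, I).
  // Only the lower triangle of L is used (k <= i).
  def sampleMvn(mu: Array[Double], l: Mat, rnd: Random): Array[Double] = {
    val z = Array.fill(mu.length)(rnd.nextGaussian())
    Array.tabulate(mu.length)(i => mu(i) + (0 to i).map(k => l(i)(k) * z(k)).sum)
  }

  // One draw from Wishart_p(V, n), V = L L^T, via the Bartlett decomposition.
  // Requires integer n >= p so every chi-square has at least 1 degree of freedom.
  def sampleWishart(l: Mat, n: Int, rnd: Random): Mat = {
    val p = l.length
    require(n >= p, "degrees of freedom must be at least the dimension")
    val a = Array.ofDim[Double](p, p)
    for (i <- 0 until p) {
      // chi-square(n - i) drawn as a sum of n - i squared standard normals
      val chi2 = (0 until (n - i)).map { _ => val g = rnd.nextGaussian(); g * g }.sum
      a(i)(i) = math.sqrt(chi2)
      // strictly-lower entries are standard normal
      for (j <- 0 until i) a(i)(j) = rnd.nextGaussian()
    }
    val la = mul(l, a)
    mul(la, la.transpose) // W = (L A)(L A)^T
  }
}
```

Averaging many draws is a quick sanity check: sample means should approach μ, and the mean of the Wishart draws should approach nV.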


On Tue, Jun 16, 2015 at 1:34 PM, Nikolis Galerakis <ni...@gmail.com>
wrote:

> Hello
>
> I am really interested in the Cholesky decomposition task. Is there any process
> I should follow to get this task assigned, or should I just dive into it?
>
> Nikos
>

Re: Contribution

Posted by Nikolis Galerakis <ni...@gmail.com>.
Hello

I am really interested in the Cholesky decomposition task. Is there any process
I should follow to get this task assigned, or should I just dive into it?

Nikos


2015-06-16 20:48 GMT+02:00 Sreenivas Raghavan <sreenivas.raghavan7@gmail.com>:

> Sir,
>     I am interested in this kind of mathematical problem. Can you state a
> few more?
>
> On Tue, Jun 16, 2015 at 10:29 PM, Dmitriy Lyubimov <dl...@gmail.com>
> wrote:
>
> > (1) Yes, making the spark shell work with Spark 1.3+ on 0.11-snapshot would
> > be an awesome help.
> > (2) I was thinking, if you are still into math problems, we have, in my
> > view, a problem in CholeskyDecomposition.
> >
> > This needs a little research. It involves the methods solveRight and
> > solveLeft.
> > (2a) solveLeft claims to do forward substitution (which it does), and
> > solveRight claims to do back substitution, which it probably does too. But
> > in reality it solves a different problem than it is supposed to. In the
> > classic scheme of things, if A in AX=B is positive (semi)definite and A=LL'
> > is its Cholesky decomposition, then forward substitution is supposed to
> > solve LY=B for Y and back substitution is supposed to solve L'X=Y, i.e.
> > back substitution is supposed to compute the result of L'^-1Y. But the
> > current implementation does something that can be shown to be essentially
> > equivalent to solveLeft() rather than a solution for L'X=Y. This needs to
> > be looked at more carefully.
> >
> > (2b) I also believe the names solveLeft and solveRight are misleading. In
> > all other cases, solve() methods traditionally denote the solution of AX=B
> > or XA=B for X. In Cholesky, neither of these methods actually provides a
> > solution for AX=B; each provides only a part of the solution. Therefore, I
> > think these methods should be renamed to something like forwardSubs() and
> > backSubs(), or better yet, named for exactly what they do, e.g.
> > computeLtInvZ(mxZ:Matrix). Moreover, it is probably beneficial to have
> > solve methods that actually compute the full solution of Ax=b or xA = b'
> > by combining forward and back substitutions properly.
> >
> > I hope some of this fits, it takes time to write this.
> >
> > -Dmitriy
> >
> > On Tue, Jun 16, 2015 at 4:17 AM, Rohit Shinde <rohit.shinde12194@gmail.com>
> > wrote:
> >
> > > Okay, it seems that methodology is a bit too advanced for me. I would go
> > > with framework/engineering tasks. So should I start with fixing the
> > > Mahout Spark shell?
> > >
> > > On Tue, Jun 16, 2015 at 11:20 AM, Dmitriy Lyubimov <dl...@gmail.com>
> > > wrote:
> > >
> > > > As I said, in methodology you can pick _anything_ that you think has
> > > > merit and is not yet in the roadmap or done.
> > > >
> > > > For example, do you feel like you might research PSVM or interior-point
> > > > SVM? Actually, any flavor of non-linear SVM that is different from a
> > > > simple hinge loss? Do you think you can fit it into our algebraic engine?
> > > >
> > > > I think we also need a fair amount of porting of MR methods, like
> > > > seq2sparse and CVB0 LDA.
> > > >
> > > > I would still look at framework performance tasks; they are badly
> > > > needed. Just today I listened to a talk about a flyby matrix
> > > > multiplication approach for Spark for medium-sized matrices, which
> > > > probably beats ours: even though we do not use cartesian (god forbid),
> > > > our implementation is somewhat closer to what the speaker described as
> > > > "massively mapside join", which, according to him, is eventually
> > > > supposed to gain over flyby multiply, but there is a fair number of
> > > > tasks where it does not.
> > > >
> > > > Similarly, bolting on hardware libraries for in-core operations is
> > > > still a big undecided issue.
> > > >
> > > > Unfortunately, a lot of known outstanding issues are still about
> > > > engineering.
> > > >
> > > >
> > > > On Mon, Jun 15, 2015 at 10:27 PM, Rohit Shinde <rohit.shinde12194@gmail.com>
> > > > wrote:
> > > >
> > > > > I would prefer some methodology work if it falls within my
> > > > > capabilities. If it doesn't, then your suggestion is a good one and
> > > > > I'll take it up. By substantial, I mean a task where I can get quite
> > > > > familiar with as much of the code base as possible.
> > > > >
> > > > > On Tue, Jun 16, 2015 at 10:49 AM, Dmitriy Lyubimov <
> > dlieu.7@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > I gave you 3 types of problems. Define substantial.
> > > > > >
> > > > > > Say, does fixing mahout spark shell sound substantial enough?
> > > > > >
> > > > > > On Mon, Jun 15, 2015 at 10:11 PM, Rohit Shinde <
> > > > > > rohit.shinde12194@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > So do you have any suggestions for getting started? I would
> like
> > to
> > > > > > > contribute to something substantial that is going on, after
> > getting
> > > > > > > familiar with the required part of the codebase.
> > > > > > >
> > > > > > > On Mon, Jun 15, 2015 at 11:39 PM, Dmitriy Lyubimov <
> > > > dlieu.7@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > i don't think there's a formal list published anywhere.
> > > > > > > >
> > > > > > > > There is an informal roadmap.
> > > > > > > >
> > > > > > > > The contributions are, the way i see it, mainly can be in 3
> > > areas:
> > > > > (1)
> > > > > > > > project support issues like for example fixing shell
> > > compatibility
> > > > > with
> > > > > > > > spark 1.3; (2) framework support problems like for example
> > > > > performance
> > > > > > > and
> > > > > > > > integrating 3rd party hardware accelerated linalg libraries;
> > (3)
> > > > > > > > methodology work.
> > > > > > > >
> > > > > > > > We have some pending items for (1) and (2) i think but for
> > > > > methodology
> > > > > > > > items (3) we simply can't compile the list of everything that
> > can
> > > > > > > possibly
> > > > > > > > be done and contriubted. We just don't have that much
> > expertise,
> > > > > > > combined.
> > > > > > > > No one has [1]. The way it works is usually people would come
> > up
> > > > with
> > > > > > > > pieces that they were missing on their own for some reason;
> and
> > > > they
> > > > > > need
> > > > > > > > to propose methodology, parallelization strategy, maybe even
> a
> > > code
> > > > > > > sketch
> > > > > > > > -- that all will be fine.
> > > > > > > >
> > > > > > > > [1] http://matt.might.net/articles/phd-school-in-pictures/
> > > > > > > >
> > > > > > > > On Sun, Jun 14, 2015 at 11:49 PM, Rohit Shinde <
> > > > > > > > rohit.shinde12194@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > But is there a list of projects that new people could take
> > up?
> > > > > Even I
> > > > > > > am
> > > > > > > > a
> > > > > > > > > student interested in contributing to the machine learning
> > and
> > > > data
> > > > > > > > mining
> > > > > > > > > parts of Apache Mahout.
> > > > > > > > >
> > > > > > > > > I am familiar with Scala and Java, Python and C++.
> > > > > > > > >
> > > > > > > > > What can I contribute to?
> > > > > > > > >
> > > > > > > > > On Mon, Jun 15, 2015 at 10:24 AM, Dmitriy Lyubimov <
> > > > > > dlieu.7@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Well we are predominantly Scala shop now. Being fluent in
> > > Scala
> > > > > > seems
> > > > > > > > > like
> > > > > > > > > > one prerequisite.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Sat, Jun 13, 2015 at 1:17 AM, Sreenivas Raghavan <
> > > > > > > > > > sreenivas.raghavan7@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hello everyone,
> > > > > > > > > > >                   I am interested in contributing to
> > mahout
> > > > > > > project.
> > > > > > > > I
> > > > > > > > > am
> > > > > > > > > > > interested in algorithms, machine learning and linear
> > > > algebra.
> > > > > > > Please
> > > > > > > > > > give
> > > > > > > > > > > me some idea as where to start and how to start. I know
> > > > python
> > > > > > and
> > > > > > > > some
> > > > > > > > > > > parts of Java, so please tell me is this knowledge of
> > > > languages
> > > > > > > > enough
> > > > > > > > > > for
> > > > > > > > > > > writing and optimizing codes
> > > > > > > > > > > --
> > > > > > > > > > >
> > > > > > > > > > > *With Regards,*
> > > > > > > > > > > *K.S.Sreenivasa Raghavan*
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
>
>
> --
>
> *With Regards,*
> *K.S.Sreenivasa Raghavan*
>

Re: Contribution

Posted by Sreenivas Raghavan <sr...@gmail.com>.
Sir,
    I am interested in this kind of mathematical problem. Could you state a few
more?

On Tue, Jun 16, 2015 at 10:29 PM, Dmitriy Lyubimov <dl...@gmail.com>
wrote:

> (1) Yes, making spark shell work with spark 1.3+ on 0.11-snapshot would be
> an awesome help.
> (2) I was thinking, if you are still into math problems, we have, in my
> view, a problem in CholeskyDecomposition.
>
> This needs a little research. It involves the methods solveRight and solveLeft.
> (2a) solveLeft claims to do forward substitution (which it does), and
> solveRight claims to do back substitution, which it probably does too. But
> in reality solveRight solves a different problem than it is supposed to. In
> the classic scheme of things, if A in AX=B is positive (semi)definite and
> A=LL' is its Cholesky decomposition, then forward substitution is supposed to
> solve LY=B for Y and back substitution is supposed to solve L'X=Y, i.e. back
> substitution is supposed to compute (L')^-1 Y. But the current implementation
> does something that can be shown to be essentially equivalent to solveLeft()
> rather than a solution of L'X=Y. This needs to be looked at more carefully.
>
> (2b) I also believe the names solveLeft and solveRight are
> misleading. In all other cases, solve() methods traditionally denote the
> solution of AX=B or XA=B for X. In Cholesky, neither of these methods
> actually provides a solution of AX=B; rather, each provides a part of the
> solution. Therefore, I think, these methods should be renamed to something
> like forwardSubs(), backSubs(), or better yet, named for exactly what they
> are doing, e.g. computeLtInvZ(mxZ:Matrix). Moreover, it is probably
> beneficial to have solve methods that actually compute the full solution of
> Ax=b or xA = b' by combining forward and back substitutions properly.
>
> I hope some of this fits; it takes time to write this.
>
> -Dmitriy



-- 

*With Regards,*
*K.S.Sreenivasa Raghavan*

Re: Contribution

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
(1) Yes, making spark shell work with spark 1.3+ on 0.11-snapshot would be
an awesome help.
(2) I was thinking, if you are still into math problems, we have, in my
view, a problem in CholeskyDecomposition.

This needs a little research. It involves the methods solveRight and solveLeft.
(2a) solveLeft claims to do forward substitution (which it does), and
solveRight claims to do back substitution, which it probably does too. But
in reality solveRight solves a different problem than it is supposed to. In
the classic scheme of things, if A in AX=B is positive (semi)definite and
A=LL' is its Cholesky decomposition, then forward substitution is supposed to
solve LY=B for Y and back substitution is supposed to solve L'X=Y, i.e. back
substitution is supposed to compute (L')^-1 Y. But the current implementation
does something that can be shown to be essentially equivalent to solveLeft()
rather than a solution of L'X=Y. This needs to be looked at more carefully.
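
A minimal sketch of the classic two-step scheme described in (2a), in plain Python for illustration only. This is not Mahout's CholeskyDecomposition code; the names cholesky, forward_subs and back_subs are hypothetical. The point it shows is that back substitution works with L' (row i of L' is column i of L) and runs from the last row upward, whereas running forward substitution on L a second time gives a different answer.

```python
# Illustrative sketch (not Mahout's implementation): Cholesky factor plus
# forward and back substitution for one right-hand side.
# Matrices are lists of rows.

def cholesky(a):
    """Return lower-triangular L with a == L * L^T (a symmetric positive definite)."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                l[i][i] = d ** 0.5
            else:
                l[i][j] = (a[i][j] - s) / l[j][j]
    return l


def forward_subs(l, b):
    """Solve L y = b for y (L lower triangular), top row first."""
    n = len(l)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(l[i][k] * y[k] for k in range(i))) / l[i][i]
    return y


def back_subs(l, y):
    """Solve L^T x = y for x, bottom row first.

    L^T is never formed explicitly: entry (i, k) of L^T is l[k][i].
    """
    n = len(l)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(l[k][i] * x[k] for k in range(i + 1, n))) / l[i][i]
    return x


if __name__ == "__main__":
    a = [[4.0, 2.0, 8.0], [2.0, 10.0, 19.0], [8.0, 19.0, 77.0]]
    b = [32.0, 79.0, 277.0]      # b = A * [1, 2, 3]
    l = cholesky(a)              # [[2, 0, 0], [1, 3, 0], [4, 5, 6]]
    y = forward_subs(l, b)       # solves L y = b
    x = back_subs(l, y)          # solves L^T x = y, hence A x = b
    print(y, x)
    print(forward_subs(l, y))    # a second forward pass is NOT back substitution
```

With this A (whose factor is L = [[2, 0, 0], [1, 3, 0], [4, 5, 6]]) and b = [32, 79, 277], forward substitution gives y = [16, 21, 18] and back substitution recovers x = [1, 2, 3], while a second forward pass over y does not.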

(2b) I also believe the names solveLeft and solveRight are
misleading. In all other cases, solve() methods traditionally denote the
solution of AX=B or XA=B for X. In Cholesky, neither of these methods
actually provides a solution of AX=B; rather, each provides a part of the
solution. Therefore, I think, these methods should be renamed to something
like forwardSubs(), backSubs(), or better yet, named for exactly what they
are doing, e.g. computeLtInvZ(mxZ:Matrix). Moreover, it is probably
beneficial to have solve methods that actually compute the full solution of
Ax=b or xA = b' by combining forward and back substitutions properly.
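
Following the renaming idea in (2b), here is a hypothetical API sketch in plain Python (matrices as lists of rows). The class and method names are illustrative only and are not Mahout's API. The substitution steps are named for what they compute, and solve() returns the full solution. For the right-hand form XA = B it assumes A is symmetric, so XA = B is equivalent to A X' = B'.

```python
# Hypothetical API sketch: names follow the renaming idea, not Mahout's code.
# Matrices are lists of rows; A must be symmetric positive definite.

class CholeskySolver:
    def __init__(self, a):
        n = len(a)
        l = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(i + 1):
                s = sum(l[i][k] * l[j][k] for k in range(j))
                if i == j:
                    d = a[i][i] - s
                    if d <= 0.0:
                        raise ValueError("matrix is not positive definite")
                    l[i][i] = d ** 0.5
                else:
                    l[i][j] = (a[i][j] - s) / l[j][j]
        self.l = l  # lower-triangular factor, A == L L^T
        self.n = n

    def forward_subs(self, z):
        """Return L^-1 Z: solve L Y = Z column by column, top row first."""
        l, n, m = self.l, self.n, len(z[0])
        y = [[0.0] * m for _ in range(n)]
        for c in range(m):
            for i in range(n):
                s = sum(l[i][k] * y[k][c] for k in range(i))
                y[i][c] = (z[i][c] - s) / l[i][i]
        return y

    def back_subs(self, z):
        """Return (L^T)^-1 Z: solve L^T X = Z column by column, bottom row first."""
        l, n, m = self.l, self.n, len(z[0])
        x = [[0.0] * m for _ in range(n)]
        for c in range(m):
            for i in reversed(range(n)):
                s = sum(l[k][i] * x[k][c] for k in range(i + 1, n))
                x[i][c] = (z[i][c] - s) / l[i][i]
        return x

    def solve(self, b):
        """Full solution of A X = B: X = (L^T)^-1 (L^-1 B)."""
        return self.back_subs(self.forward_subs(b))

    def solve_right(self, b):
        """Full solution of X A = B, computed via A X^T = B^T (A is symmetric)."""
        bt = [list(col) for col in zip(*b)]
        return [list(row) for row in zip(*self.solve(bt))]


if __name__ == "__main__":
    a = [[4.0, 2.0, 8.0], [2.0, 10.0, 19.0], [8.0, 19.0, 77.0]]
    solver = CholeskySolver(a)
    print(solver.solve([[32.0, -6.0], [79.0, -9.0], [277.0, -58.0]]))  # X with A X = B
    print(solver.solve_right([[32.0, 79.0, 277.0]]))                   # x with x A = b'
```

Keeping forward_subs and back_subs public lets callers that only need a half-solve, such as (L')^-1 Z in the computeLtInvZ example, still call it directly, while solve and solve_right leave no doubt about what a "solve" returns.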

I hope some of this fits; it takes time to write this.

-Dmitriy

On Tue, Jun 16, 2015 at 4:17 AM, Rohit Shinde <ro...@gmail.com>
wrote:

> Okay, it seems that methodology is a bit too advanced for me. I would go
> with framework/engineering tasks. So should I start with fixing the mahout
> spark shell?

Re: Contribution

Posted by Rohit Shinde <ro...@gmail.com>.
Okay, it seems that methodology is a bit too advanced for me. I would go
with framework/engineering tasks. So should I start with fixing the mahout
spark shell?

On Tue, Jun 16, 2015 at 11:20 AM, Dmitriy Lyubimov <dl...@gmail.com>
wrote:

> As I said, in methodology you can pick _anything_ that you think has merit
> and is not yet in the roadmap or done.
>
> For example, do you feel like you might research PSVM or interior point
> SVM? Actually, any flavor of non-linear SVM that is different from a simple
> hinge loss?
> Do you think you can fit it into our algebraic engine?
>
> I think we also need to port a fair number of MR methods -- like seq2sparse
> and cvb0 lda.
>
> I would still look at framework performance tasks; they are badly needed.
> Just today I listened to a talk about a flyby matrix multiplication approach
> for Spark for medium-sized matrices, which probably beats ours, since even
> though we do not use cartesian (god forbid), our implementation is somewhat
> closer to what the speaker described as a "massively mapside join" -- which
> eventually, according to him, is supposed to gain over flyby multiply, but
> there are a fair number of tasks where it does not.
>
> Similarly, bolting on hardware libraries for in-core operations is still a
> big undecided issue.
>
> Unfortunately, a lot of the known outstanding issues are still about
> engineering.

Re: Contribution

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
As I said, in methodology you can pick _anything_ that you think has merit
and is not yet in the roadmap or done.

For example, do you feel like you might research PSVM or interior point
SVM? Actually, any flavor of non-linear SVM that is different from a simple
hinge loss?
Do you think you can fit it into our algebraic engine?

I think we also need to port a fair number of MR methods -- like seq2sparse
and cvb0 lda.

I would still look at framework performance tasks; they are badly needed.
Just today I listened to a talk about a flyby matrix multiplication approach
for Spark for medium-sized matrices, which probably beats ours, since even
though we do not use cartesian (god forbid), our implementation is somewhat
closer to what the speaker described as a "massively mapside join" -- which
eventually, according to him, is supposed to gain over flyby multiply, but
there are a fair number of tasks where it does not.

Similarly, bolting on hardware libraries for in-core operations is still a
big undecided issue.

Unfortunately, a lot of the known outstanding issues are still about engineering.


On Mon, Jun 15, 2015 at 10:27 PM, Rohit Shinde <ro...@gmail.com>
wrote:

> I would prefer some methodology work if it falls within my capabilities. If
> it doesn't, then your suggestion is a good one and I'll take it up.
> By "substantial" I mean a task through which I can get quite familiar
> with as much of the code base as possible.

Re: Contribution

Posted by Rohit Shinde <ro...@gmail.com>.
I would prefer some methodology work if it falls within my capabilities. If
it doesn't, then your suggestion is a good one and I'll take it up.
By "substantial" I mean a task through which I can get quite familiar
with as much of the code base as possible.

On Tue, Jun 16, 2015 at 10:49 AM, Dmitriy Lyubimov <dl...@gmail.com>
wrote:

> I gave you 3 types of problems. Define substantial.
>
> Say, does fixing the Mahout spark shell sound substantial enough?
>
> On Mon, Jun 15, 2015 at 10:11 PM, Rohit Shinde <
> rohit.shinde12194@gmail.com>
> wrote:
>
> > So do you have any suggestions for getting started? I would like to
> > contribute to something substantial that is going on, after getting
> > familiar with the required part of the codebase.
> >
> > On Mon, Jun 15, 2015 at 11:39 PM, Dmitriy Lyubimov <dl...@gmail.com>
> > wrote:
> >
> > > i don't think there's a formal list published anywhere.
> > >
> > > There is an informal roadmap.
> > >
> > > The contributions are, the way i see it, mainly can be in 3 areas: (1)
> > > project support issues like for example fixing shell compatibility with
> > > spark 1.3; (2) framework support problems like for example performance
> > and
> > > integrating 3rd party hardware accelerated linalg libraries; (3)
> > > methodology work.
> > >
> > > We have some pending items for (1) and (2) i think but for methodology
> > > items (3) we simply can't compile the list of everything that can
> > possibly
> > > be done and contriubted. We just don't have that much expertise,
> > combined.
> > > No one has [1]. The way it works is usually people would come up with
> > > pieces that they were missing on their own for some reason; and they
> need
> > > to propose methodology, parallelization strategy, maybe even a code
> > sketch
> > > -- that all will be fine.
> > >
> > > [1] http://matt.might.net/articles/phd-school-in-pictures/
> > >
> > > On Sun, Jun 14, 2015 at 11:49 PM, Rohit Shinde <
> > > rohit.shinde12194@gmail.com>
> > > wrote:
> > >
> > > > But is there a list of projects that new people could take up? I am also
> > > > a student interested in contributing to the machine learning and data
> > > > mining parts of Apache Mahout.
> > > >
> > > > I am familiar with Scala, Java, Python and C++.
> > > >
> > > > What can I contribute to?
> > > >
> > > > On Mon, Jun 15, 2015 at 10:24 AM, Dmitriy Lyubimov <dlieu.7@gmail.com>
> > > > wrote:
> > > >
> > > > > Well, we are predominantly a Scala shop now. Being fluent in Scala seems
> > > > > like one prerequisite.
> > > > >
> > > > >
> > > > > On Sat, Jun 13, 2015 at 1:17 AM, Sreenivas Raghavan <
> > > > > sreenivas.raghavan7@gmail.com> wrote:
> > > > >
> > > > > > Hello everyone,
> > > > > >                   I am interested in contributing to the Mahout project.
> > > > > > I am interested in algorithms, machine learning and linear algebra.
> > > > > > Please give me some idea of where to start and how. I know Python and
> > > > > > some parts of Java, so please tell me whether this knowledge of
> > > > > > languages is enough for writing and optimizing code.
> > > > > > --
> > > > > >
> > > > > > *With Regards,*
> > > > > > *K.S.Sreenivasa Raghavan*
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Contribution

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
I gave you 3 types of problems. Define substantial.

Say, does fixing the Mahout Spark shell sound substantial enough?

On Mon, Jun 15, 2015 at 10:11 PM, Rohit Shinde <ro...@gmail.com>
wrote:

> So do you have any suggestions for getting started? I would like to
> contribute to something substantial that is going on, after getting
> familiar with the required part of the codebase.
>
> On Mon, Jun 15, 2015 at 11:39 PM, Dmitriy Lyubimov <dl...@gmail.com>
> wrote:
>
> > I don't think there's a formal list published anywhere.
> >
> > There is an informal roadmap.
> >
> > The way I see it, contributions can mainly be in 3 areas: (1) project
> > support issues, for example fixing shell compatibility with Spark 1.3;
> > (2) framework support problems, for example performance and integrating
> > 3rd-party hardware-accelerated linalg libraries; (3) methodology work.
> >
> > We have some pending items for (1) and (2), I think, but for methodology
> > items (3) we simply can't compile a list of everything that could
> > possibly be done and contributed. We just don't have that much
> > expertise, combined. No one has [1]. The way it usually works is that
> > people come up with pieces they were missing for their own reasons, and
> > they need to propose a methodology, a parallelization strategy, maybe
> > even a code sketch. That will all be fine.
> >
> > [1] http://matt.might.net/articles/phd-school-in-pictures/
> >
> > On Sun, Jun 14, 2015 at 11:49 PM, Rohit Shinde <
> > rohit.shinde12194@gmail.com>
> > wrote:
> >
> > > But is there a list of projects that new people could take up? I am also
> > > a student interested in contributing to the machine learning and data
> > > mining parts of Apache Mahout.
> > >
> > > I am familiar with Scala, Java, Python and C++.
> > >
> > > What can I contribute to?
> > >
> > > On Mon, Jun 15, 2015 at 10:24 AM, Dmitriy Lyubimov <dl...@gmail.com>
> > > wrote:
> > >
> > > > Well, we are predominantly a Scala shop now. Being fluent in Scala seems
> > > > like one prerequisite.
> > > >
> > > >
> > > > On Sat, Jun 13, 2015 at 1:17 AM, Sreenivas Raghavan <
> > > > sreenivas.raghavan7@gmail.com> wrote:
> > > >
> > > > > Hello everyone,
> > > > >                   I am interested in contributing to the Mahout project.
> > > > > I am interested in algorithms, machine learning and linear algebra.
> > > > > Please give me some idea of where to start and how. I know Python and
> > > > > some parts of Java, so please tell me whether this knowledge of
> > > > > languages is enough for writing and optimizing code.
> > > > > --
> > > > >
> > > > > *With Regards,*
> > > > > *K.S.Sreenivasa Raghavan*
> > > > >
> > > >
> > >
> >
>

Re: Contribution

Posted by Rohit Shinde <ro...@gmail.com>.
So do you have any suggestions for getting started? I would like to
contribute to something substantial that is going on, after getting
familiar with the required part of the codebase.

On Mon, Jun 15, 2015 at 11:39 PM, Dmitriy Lyubimov <dl...@gmail.com>
wrote:

> I don't think there's a formal list published anywhere.
>
> There is an informal roadmap.
>
> The way I see it, contributions can mainly be in 3 areas: (1) project
> support issues, for example fixing shell compatibility with Spark 1.3;
> (2) framework support problems, for example performance and integrating
> 3rd-party hardware-accelerated linalg libraries; (3) methodology work.
>
> We have some pending items for (1) and (2), I think, but for methodology
> items (3) we simply can't compile a list of everything that could
> possibly be done and contributed. We just don't have that much
> expertise, combined. No one has [1]. The way it usually works is that
> people come up with pieces they were missing for their own reasons, and
> they need to propose a methodology, a parallelization strategy, maybe
> even a code sketch. That will all be fine.
>
> [1] http://matt.might.net/articles/phd-school-in-pictures/
>
> On Sun, Jun 14, 2015 at 11:49 PM, Rohit Shinde <
> rohit.shinde12194@gmail.com>
> wrote:
>
> > But is there a list of projects that new people could take up? I am also
> > a student interested in contributing to the machine learning and data
> > mining parts of Apache Mahout.
> >
> > I am familiar with Scala, Java, Python and C++.
> >
> > What can I contribute to?
> >
> > On Mon, Jun 15, 2015 at 10:24 AM, Dmitriy Lyubimov <dl...@gmail.com>
> > wrote:
> >
> > > Well, we are predominantly a Scala shop now. Being fluent in Scala seems
> > > like one prerequisite.
> > >
> > >
> > > On Sat, Jun 13, 2015 at 1:17 AM, Sreenivas Raghavan <
> > > sreenivas.raghavan7@gmail.com> wrote:
> > >
> > > > Hello everyone,
> > > >                   I am interested in contributing to the Mahout project.
> > > > I am interested in algorithms, machine learning and linear algebra.
> > > > Please give me some idea of where to start and how. I know Python and
> > > > some parts of Java, so please tell me whether this knowledge of
> > > > languages is enough for writing and optimizing code.
> > > > --
> > > >
> > > > *With Regards,*
> > > > *K.S.Sreenivasa Raghavan*
> > > >
> > >
> >
>

Re: Contribution

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
I don't think there's a formal list published anywhere.

There is an informal roadmap.

The way I see it, contributions can mainly be in 3 areas: (1) project
support issues, for example fixing shell compatibility with Spark 1.3;
(2) framework support problems, for example performance and integrating
3rd-party hardware-accelerated linalg libraries; (3) methodology work.

We have some pending items for (1) and (2), I think, but for methodology
items (3) we simply can't compile a list of everything that could
possibly be done and contributed. We just don't have that much
expertise, combined. No one has [1]. The way it usually works is that
people come up with pieces they were missing for their own reasons, and
they need to propose a methodology, a parallelization strategy, maybe
even a code sketch. That will all be fine.

[1] http://matt.might.net/articles/phd-school-in-pictures/

On Sun, Jun 14, 2015 at 11:49 PM, Rohit Shinde <ro...@gmail.com>
wrote:

> But is there a list of projects that new people could take up? I am also
> a student interested in contributing to the machine learning and data
> mining parts of Apache Mahout.
>
> I am familiar with Scala, Java, Python and C++.
>
> What can I contribute to?
>
> On Mon, Jun 15, 2015 at 10:24 AM, Dmitriy Lyubimov <dl...@gmail.com>
> wrote:
>
> > Well, we are predominantly a Scala shop now. Being fluent in Scala seems
> > like one prerequisite.
> >
> >
> > On Sat, Jun 13, 2015 at 1:17 AM, Sreenivas Raghavan <
> > sreenivas.raghavan7@gmail.com> wrote:
> >
> > > Hello everyone,
> > >                   I am interested in contributing to the Mahout project.
> > > I am interested in algorithms, machine learning and linear algebra.
> > > Please give me some idea of where to start and how. I know Python and
> > > some parts of Java, so please tell me whether this knowledge of
> > > languages is enough for writing and optimizing code.
> > > --
> > >
> > > *With Regards,*
> > > *K.S.Sreenivasa Raghavan*
> > >
> >
>

Re: Contribution

Posted by Rohit Shinde <ro...@gmail.com>.
But is there a list of projects that new people could take up? I am also
a student interested in contributing to the machine learning and data
mining parts of Apache Mahout.

I am familiar with Scala, Java, Python and C++.

What can I contribute to?

On Mon, Jun 15, 2015 at 10:24 AM, Dmitriy Lyubimov <dl...@gmail.com>
wrote:

> Well, we are predominantly a Scala shop now. Being fluent in Scala seems
> like one prerequisite.
>
>
> On Sat, Jun 13, 2015 at 1:17 AM, Sreenivas Raghavan <
> sreenivas.raghavan7@gmail.com> wrote:
>
> > Hello everyone,
> >                   I am interested in contributing to the Mahout project.
> > I am interested in algorithms, machine learning and linear algebra.
> > Please give me some idea of where to start and how. I know Python and
> > some parts of Java, so please tell me whether this knowledge of
> > languages is enough for writing and optimizing code.
> > --
> >
> > *With Regards,*
> > *K.S.Sreenivasa Raghavan*
> >
>

Re: Contribution

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
Well, we are predominantly a Scala shop now. Being fluent in Scala seems
like one prerequisite.


On Sat, Jun 13, 2015 at 1:17 AM, Sreenivas Raghavan <
sreenivas.raghavan7@gmail.com> wrote:

> Hello everyone,
>                   I am interested in contributing to the Mahout project.
> I am interested in algorithms, machine learning and linear algebra.
> Please give me some idea of where to start and how. I know Python and
> some parts of Java, so please tell me whether this knowledge of
> languages is enough for writing and optimizing code.
> --
>
> *With Regards,*
> *K.S.Sreenivasa Raghavan*
>