Posted to dev@mahout.apache.org by Олександр Ольгашко <al...@gmail.com> on 2013/11/26 18:19:16 UTC

A theme to work

Hello,

I am a student interested in data analysis, and I have chosen this topic
for my diploma work. As mentioned at
https://cwiki.apache.org/confluence/display/MAHOUT/Algorithms, there are
some open algorithms, for example in the Dimension reduction section.

So, how can I start developing them? I have some theoretical background, but I
think there may be some unknown problems. Maybe somebody is already working on
these algorithms. Can you give me some tips for getting started?

I searched JIRA for Independent Component Analysis and found nothing.

Thanks in advance.

Re: A theme to work

Posted by Oleksandr Olgashko <al...@gmail.com>.
I forgot to ask in my previous message: are there any open problems/tasks in
recommendation algorithms?



Re: A theme to work

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
On Wed, Nov 27, 2013 at 10:17 AM, Dmitriy Lyubimov <dl...@gmail.com> wrote:

>
>
>
> On Wed, Nov 27, 2013 at 9:09 AM, Oleksandr Olgashko <
> alexandrolgash@gmail.com> wrote:
>
>> Could you please formalize the requirements for ICA? I mean, what actually
>> should be done.
>> "Parallelization strategy" is a rather general concept.
>>
>
> Not really. It is not so general that you couldn't do it on
> your own.
>
> You can think of it as a fairly free-style TDD for how to do things on MR
> or Pregel, written so that the majority of reviewers here can understand it.
>

I guess I need to be a bit more specific: the Hadoop MR or Spark/Bagel APIs.
We don't really pull in any other frameworks at the moment.



Re: A theme to work

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
On Wed, Nov 27, 2013 at 9:09 AM, Oleksandr Olgashko <
alexandrolgash@gmail.com> wrote:

> Could you please formalize the requirements for ICA? I mean, what actually
> should be done.
> "Parallelization strategy" is a rather general concept.
>

Not really. It is not so general that you couldn't do it on your
own.

You can think of it as a fairly free-style TDD for how to do things on MR
or Pregel, written so that the majority of reviewers here can understand it.

Not an ideal example, but I hope it helps: look at the attachment for
https://issues.apache.org/jira/browse/MAHOUT-1365

-d
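
Editorial note: to make "how to do things on MR" a little more concrete, here is a minimal sketch in plain Python of the general map/reduce shape such a write-up usually describes. It is not Mahout code and not an ICA algorithm; it only shows a covariance matrix (a quantity that whitening steps in ICA typically need) computed as per-partition partial sums that are then merged. The names map_partition, combine and covariance are hypothetical and chosen for illustration.

```python
from functools import reduce


def map_partition(rows):
    """Map step: partial sums for one partition of the data.

    Returns (row count, column sums, sum of outer products r * r^T).
    """
    d = len(rows[0])
    sums = [0.0] * d
    outer = [[0.0] * d for _ in range(d)]
    for r in rows:
        for i in range(d):
            sums[i] += r[i]
            for j in range(d):
                outer[i][j] += r[i] * r[j]
    return len(rows), sums, outer


def combine(a, b):
    """Reduce step: partial results add element-wise, so merge order is irrelevant."""
    count_a, sums_a, outer_a = a
    count_b, sums_b, outer_b = b
    sums = [x + y for x, y in zip(sums_a, sums_b)]
    outer = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(outer_a, outer_b)]
    return count_a + count_b, sums, outer


def covariance(partitions):
    """Population covariance E[x x^T] - E[x] E[x]^T from the merged partial sums."""
    n, s, o = reduce(combine, map(map_partition, partitions))
    d = len(s)
    return [[o[i][j] / n - (s[i] / n) * (s[j] / n) for j in range(d)]
            for i in range(d)]


# Two "partitions" standing in for data blocks on different workers.
partitions = [
    [[1.0, 2.0], [2.0, 4.1]],
    [[3.0, 5.9], [4.0, 8.0]],
]
cov = covariance(partitions)
```

Because the partial results only need to be added together, the same decomposition carries over to a Hadoop mapper/combiner/reducer or a Spark aggregation; a strategy write-up would spell out which steps of the algorithm have this form and which require iteration or a broadcast of intermediate results.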



Re: A theme to work

Posted by Oleksandr Olgashko <al...@gmail.com>.
Could you please formalize the requirements for ICA? I mean, what actually
should be done.
"Parallelization strategy" is a rather general concept.



Re: A theme to work

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
On Tue, Nov 26, 2013 at 1:11 PM, Олександр Ольгашко <
alexandrolgash@gmail.com> wrote:

> I may need some time (I can't say how long) to get familiar with the Mahout
> project structure.
> I'd like to do some research on a parallelization strategy for ICA; it is
> quite interesting.
> I'm not sure whether I can help with MAHOUT-1346, as I have never worked
> with such things before.
>
> Should I use the mailing lists or something else for questions that arise
> and other communication?
>
Yes. There's probably no better place as far as Mahout is concerned.


Re: A theme to work

Posted by Олександр Ольгашко <al...@gmail.com>.
I may need some time (I can't say how long) to get familiar with the Mahout
project structure.
I'd like to do some research on a parallelization strategy for ICA; it is
quite interesting.
I'm not sure whether I can help with MAHOUT-1346, as I have never worked
with such things before.

Should I use the mailing lists or something else for questions that arise
and other communication?



Re: A theme to work

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
Dimension reduction is addressed by PCA, which is an option of the SSVD
method.
However, if you can research/offer a parallelization strategy for ICA, I'd be
all ears.

There's also an ongoing push to create a DSL environment for Mahout
distributed matrices on Spark, which I personally think is one of the most
promising recent developments. It is also a treasure chest (or a can of
worms, depending on how you view it) for new people to chime in on. The DSL
environment issue is MAHOUT-1346, with pretty much everything else
dependent on it.

-d
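
Editorial note: the remark that PCA is an option of the SSVD method rests on a standard relationship: after subtracting the column means from the data matrix X, the principal directions are the right singular vectors of X, that is, the eigenvectors of X^T X. The sketch below illustrates this in plain Python, using simple power iteration to find only the first principal direction. It is an assumption-laden illustration, not Mahout's SSVD implementation (which is a distributed stochastic method); the function names are hypothetical.

```python
import math


def mean_center(rows):
    """Subtract each column's mean so the data has zero mean per column."""
    n = len(rows)
    d = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def first_principal_direction(rows, iterations=200):
    """Power iteration for the top eigenvector of X^T X (X mean-centered)."""
    x = mean_center(rows)
    d = len(x[0])
    v = [1.0 / math.sqrt(d)] * d  # arbitrary unit-length start vector
    for _ in range(iterations):
        # w = X^T (X v), computed without forming X^T X explicitly
        xv = [sum(r[j] * v[j] for j in range(d)) for r in x]
        w = [sum(x[i][j] * xv[i] for i in range(len(x))) for j in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            break  # no variance left to explain (e.g. all rows identical)
        v = [c / norm for c in w]
    return v


# Points lying close to the line y = 2x, so the first principal
# direction should point roughly along (1, 2).
data = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0]]
direction = first_principal_direction(data)
```

Power iteration is used here only because it fits in a few lines; it can stall if the start vector happens to be orthogonal to the leading direction, which is one reason production methods rely on more robust decompositions.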



