
Posted to dev@hama.apache.org by "Edward J. Yoon" <ed...@apache.org> on 2015/06/03 09:48:25 UTC

[DISCUSS] Things I'd like to focus on next

Hey,

Here are a few things I'd like to focus on next.

1. Add a stream input format for listening to messages coming from
third-party applications, and incremental learning algorithms.
2. Improve the reliability of the system, e.g., fault tolerance, HA, etc.
3. More machine learning algorithms, such as ensemble classifiers, SVM,
DNN, etc.

Do you have any other suggestions?

Thanks!

-- 
Best Regards, Edward J. Yoon

Re: [DISCUSS] Things I'd like to focus on next

Posted by "Edward J. Yoon" <ed...@apache.org>.

+1

On Thursday, June 4, 2015, Júlio Pires <ju...@gmail.com> wrote:

> Hi,
>
> I would add two suggestions:
>
> 1) Load Balancing [a] and [b];
> 2) Improve the documentation.
>
> a) https://thegraphsblog.files.wordpress.com/2012/11/eurosys13_197_mizan.pdf
> b) http://arxiv.org/pdf/1309.1049.pdf
>
> Thanks!
>


-- 
Best Regards, Edward J. Yoon

Re: [DISCUSS] Things I'd like to focus on next

Posted by Júlio Pires <ju...@gmail.com>.

Hi,

I would add two suggestions:

1) Load Balancing [a] and [b];
2) Improve the documentation.

a) https://thegraphsblog.files.wordpress.com/2012/11/eurosys13_197_mizan.pdf
b) http://arxiv.org/pdf/1309.1049.pdf

Thanks!


Re: [DISCUSS] Things I'd like to focus on next

Posted by Chia-Hung Lin <cl...@googlemail.com>.

I suppose that decoupling key/value from the BSP interface may be needed,
because not all inputs are in key/value format.
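A minimal Java sketch of that idea (Java 9+), assuming a hypothetical `RecordSource<T>` interface that is not part of Hama: code that consumes input is written against a generic record type `T`, so key/value pairs become one record type among others, and a line-oriented stream of messages (like the stream input discussed in this thread) needs no artificial keys.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

final class InputDemo {
    // Code written against RecordSource<T> does not assume that T is a key/value pair.
    static <T> int countRecords(RecordSource<T> source) throws IOException {
        int n = 0;
        while (source.next() != null) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        RecordSource<Map.Entry<String, Integer>> pairs =
            new KeyValueSource<>(List.of(Map.entry("a", 1), Map.entry("b", 2)));
        RecordSource<String> lines =
            new LineSource(new BufferedReader(new StringReader("x\ny\nz")));
        System.out.println(countRecords(pairs) + " " + countRecords(lines));
    }
}

// Hypothetical input abstraction: yields records of any type T, or null when exhausted.
interface RecordSource<T> {
    T next() throws IOException;
}

// Key/value input becomes one particular record type, Map.Entry<K, V>.
final class KeyValueSource<K, V> implements RecordSource<Map.Entry<K, V>> {
    private final Iterator<Map.Entry<K, V>> it;

    KeyValueSource(List<Map.Entry<K, V>> pairs) {
        this.it = pairs.iterator();
    }

    @Override
    public Map.Entry<K, V> next() {
        return it.hasNext() ? it.next() : null;
    }
}

// Input that is not key/value at all: raw text lines, e.g. messages read from a stream.
final class LineSource implements RecordSource<String> {
    private final BufferedReader in;

    LineSource(BufferedReader in) {
        this.in = in;
    }

    @Override
    public String next() throws IOException {
        return in.readLine();
    }
}
```

The "null means exhausted" contract is a simplification for the sketch; a real design would likely use an explicit end-of-input signal so that null records are not ambiguous.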


Re: [DISCUSS] Things I'd like to focus on next

Posted by Tommaso Teofili <to...@gmail.com>.

2015-06-05 17:49 GMT+02:00 Behroz Sikander <be...@gmail.com>:

> Hi,
> *>>Please feel free to contribute documentation to the Apache Hama
> wiki[1]!*
> OK. I am new to the open source world, so the procedure is quite new to me.
> Whenever I find something missing, I will edit it.
>
> *>>We could also work on it together, but I haven't worked out the details
> yet. A custom “Modern” or “Classic” style? A Maven website again?*
> OK. I do not quite understand what you mean by Modern or Classic style.
> Does Apache provide some kind of CMS to manage the hosted project websites?
>
> *>>ADMM is quite interesting, and it looks like a better fit for BSP than
> for MapReduce (even if HBase(?) or memory-based shared storage is used).*
> Yes, ADMM seems to be a natural fit for the BSP model because ADMM
> algorithms are iterative. In each iteration, different machines process and
> exchange data, and the algorithm keeps running until a convergence
> criterion is met.
>
> Check out Chapter 10 (page 78) of the following ADMM paper:
> https://web.stanford.edu/~boyd/papers/pdf/admm_distr_stats.pdf
>
> It discusses the implementation details of ADMM on big data systems.
>
> *>>But I don't fully understand it*
> My understanding is also limited, but if the cost function of an ML
> algorithm is convex, then it can be converted to ADMM form. Once in ADMM
> form, we can run it on a distributed system like Hama.
>
> *>>so I don't know whether it can be used as an abstraction layer for many
> ML algorithms. We'll need more investigation.*
>
> Yes, more investigation is needed. Here are a few ML algorithms already in
> ADMM form (a,b,c).
>
> a) *L1 Linear Regression:*
> https://www.dtc.umn.edu/s/resources/tsp2010oct-dlasso.pdf
> b) *L2 Logistic Regression:*
> https://intentmedia.github.io/assets/2013-10-09-presenting-at-ieee-big-data/pld_js_ieee_bigdata_2013_admm.pdf
> c) *SVM:*
> http://www.jmlr.org/papers/volume11/forero10a/forero10a.pdf
>
>
I don't know ADMM myself, but what you describe sounds pretty similar to how
we implemented gradient descent and linear / logistic regression [1] on top
of Hama.
Any improvement there would of course be highly appreciated, so feel free
to open JIRA issues and attach patches accordingly.
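A plain-Java sketch of that pattern, assuming batch gradient descent for a one-variable linear regression with squared loss. It simulates BSP supersteps in a single process; it is not the code from the linked package and does not use Hama's API. In each superstep every worker computes a partial gradient over its own data partition, the partial gradients are summed after the barrier, and all workers apply the same update.

```java
// Batch gradient descent for y ~ w * x + b, with the data split across simulated BSP workers.
final class BspGradientDescent {

    // Local computation on one worker: gradient of 0.5 * sum((w * x + b - y)^2) over its partition.
    static double[] partialGradient(double[][] partition, double w, double b) {
        double gw = 0, gb = 0;
        for (double[] row : partition) {
            double err = w * row[0] + b - row[1];
            gw += err * row[0];
            gb += err;
        }
        return new double[] {gw, gb};
    }

    // Each loop iteration plays the role of one superstep; returns {w, b}.
    static double[] train(double[][][] partitions, double learningRate, int iterations) {
        int n = 0;
        for (double[][] p : partitions) {
            n += p.length;
        }
        double w = 0, b = 0;
        for (int it = 0; it < iterations; it++) {
            // Workers send their partial gradients; after the barrier the sums are identical everywhere.
            double gw = 0, gb = 0;
            for (double[][] p : partitions) {
                double[] g = partialGradient(p, w, b);
                gw += g[0];
                gb += g[1];
            }
            // Every worker applies the same averaged step, so all model copies stay in sync.
            w -= learningRate * gw / n;
            b -= learningRate * gb / n;
        }
        return new double[] {w, b};
    }

    public static void main(String[] args) {
        // Points on y = 2x + 1, split across three workers.
        double[][][] partitions = {
            {{0, 1}, {1, 3}},
            {{2, 5}, {3, 7}},
            {{4, 9}}
        };
        double[] model = train(partitions, 0.05, 5000);
        System.out.printf("w=%.3f b=%.3f%n", model[0], model[1]);
    }
}
```

Because every worker sees the same summed gradient, the model copies never diverge. ADMM differs in that each worker solves a local subproblem per iteration instead of taking a single gradient step.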

Regards,
Tommaso

[1]: https://github.com/apache/hama/tree/trunk/ml/src/main/java/org/apache/hama/ml/regression



Re: [DISCUSS] Things I'd like to focus on next

Posted by Behroz Sikander <be...@gmail.com>.

Hi,
*>>Please feel free to contribute documentation to the Apache Hama wiki[1]!*
OK. I am new to the open source world, so the procedure is quite new to me.
Whenever I find something missing, I will edit it.

*>>We could also work on it together, but I haven't worked out the details
yet. A custom “Modern” or “Classic” style? A Maven website again?*
OK. I do not quite understand what you mean by Modern or Classic style.
Does Apache provide some kind of CMS to manage the hosted project websites?

*>>ADMM is quite interesting, and it looks like a better fit for BSP than
for MapReduce (even if HBase(?) or memory-based shared storage is used).*
Yes, ADMM seems to be a natural fit for the BSP model because ADMM
algorithms are iterative. In each iteration, different machines process and
exchange data, and the algorithm keeps running until a convergence
criterion is met.
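To make the iteration concrete, here is a sketch in plain Java of global consensus ADMM in its scaled form, as described in the Boyd et al. paper linked in this thread, with the BSP supersteps simulated in a single process rather than written against Hama's API. For illustration it assumes a one-dimensional least-squares loss per worker, so the local x-update has a closed form; the penalty `rho`, the tolerance, and the toy data are arbitrary choices.

```java
// Global consensus ADMM (scaled form), simulated in one process.
// Worker i holds data (a_ij, b_ij) and the local loss f_i(x) = 0.5 * sum_j (a_ij * x - b_ij)^2.
final class ConsensusAdmm {

    // x-update on one worker: argmin_x f_i(x) + (rho / 2) * (x - z + u_i)^2, closed form for least squares.
    static double localUpdate(double[] a, double[] b, double z, double u, double rho) {
        double aa = 0, ab = 0;
        for (int j = 0; j < a.length; j++) {
            aa += a[j] * a[j];
            ab += a[j] * b[j];
        }
        return (ab + rho * (z - u)) / (aa + rho);
    }

    // Runs ADMM iterations (one BSP superstep each) until both residuals fall below tol.
    static double solve(double[][] a, double[][] b, double rho, double tol, int maxIter) {
        int n = a.length;              // number of workers
        double[] x = new double[n];    // local copies of the model
        double[] u = new double[n];    // scaled dual variables
        double z = 0;                  // consensus (global) model
        for (int k = 0; k < maxIter; k++) {
            // Local computation: each worker updates its own x_i in parallel.
            for (int i = 0; i < n; i++) {
                x[i] = localUpdate(a[i], b[i], z, u[i], rho);
            }
            // Communication + barrier: workers exchange x_i + u_i, and everyone averages them.
            double zOld = z;
            double sum = 0;
            for (int i = 0; i < n; i++) {
                sum += x[i] + u[i];
            }
            z = sum / n;
            // Dual update on each worker, plus primal/dual residuals for the stopping test.
            double primalSq = 0;
            for (int i = 0; i < n; i++) {
                u[i] += x[i] - z;
                primalSq += (x[i] - z) * (x[i] - z);
            }
            double dualSq = n * rho * rho * (z - zOld) * (z - zOld);
            if (Math.sqrt(primalSq) < tol && Math.sqrt(dualSq) < tol) {
                break;
            }
        }
        return z;
    }

    public static void main(String[] args) {
        // Worker 0 alone would fit x = 1, worker 1 alone would fit x = 3.
        double[][] a = {{1, 2}, {1, 1}};
        double[][] b = {{1, 2}, {3, 3}};
        double z = solve(a, b, 1.0, 1e-10, 10_000);
        // The pooled least-squares solution is sum(a*b) / sum(a*a) = 11 / 7.
        System.out.println(z + " vs " + (11.0 / 7.0));
    }
}
```

The consensus value converges to the pooled least-squares solution 11/7, not to the average (2) of the two workers' local solutions; the dual variables `u` are what pull the local models into agreement.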

Check out Chapter 10 (page 78) of the following ADMM paper:
https://web.stanford.edu/~boyd/papers/pdf/admm_distr_stats.pdf

It discusses the implementation details of ADMM on big data systems.

*>>But I don't fully understand it*
My understanding is also limited, but if the cost function of an ML
algorithm is convex, then it can be converted to ADMM form. Once in ADMM
form, we can run it on a distributed system like Hama.
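For reference, this is the global consensus formulation and its scaled-form ADMM updates as given in the Boyd et al. paper cited above, written here without an extra regularization term. It assumes the convex objective splits into a sum of per-partition losses f_i, one per worker, with local copies x_i, a global variable z, scaled dual variables u_i, and a penalty parameter rho > 0. The x_i-updates run in parallel on the workers, and the z-update needs one exchange of messages, which maps onto a BSP superstep.

```latex
\begin{aligned}
&\text{minimize } \sum_{i=1}^{N} f_i(x_i) \quad \text{subject to } x_i - z = 0,\; i = 1,\dots,N \\
&x_i^{k+1} = \operatorname*{arg\,min}_{x_i} \Big( f_i(x_i) + \tfrac{\rho}{2}\,\lVert x_i - z^k + u_i^k \rVert_2^2 \Big) \\
&z^{k+1} = \frac{1}{N} \sum_{i=1}^{N} \big( x_i^{k+1} + u_i^k \big) \\
&u_i^{k+1} = u_i^k + x_i^{k+1} - z^{k+1}
\end{aligned}
```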

*>>so I don't know whether it can be used as an abstraction layer for many
ML algorithms. We'll need more investigation.*

Yes, more investigation is needed. Here are a few ML algorithms already in
ADMM form (a,b,c).

a) *L1 Linear Regression:*
https://www.dtc.umn.edu/s/resources/tsp2010oct-dlasso.pdf
b) *L2 Logistic Regression:*
https://intentmedia.github.io/assets/2013-10-09-presenting-at-ieee-big-data/pld_js_ieee_bigdata_2013_admm.pdf
c) *SVM:*
http://www.jmlr.org/papers/volume11/forero10a/forero10a.pdf

Regards,
Behroz Sikander


RE: [DISCUSS] Things I'd like to focus on next

Posted by "Edward J. Yoon" <ed...@samsung.com>.

Please feel free to contribute documentation to the Apache Hama wiki[1]!
Ultimately, I'm considering improving our official website[2] on HAMA-960. We
could also work on it together, but I haven't worked out the details yet. A
custom “Modern” or “Classic” style? A Maven website again?

ADMM is quite interesting, and it looks like a better fit for BSP than for
MapReduce (even if HBase(?) or memory-based shared storage is used). But I
don't fully understand it, so I don't know whether it can be used as an
abstraction layer for many ML algorithms. We'll need more investigation.

1. https://wiki.apache.org/hama
2. https://hama.apache.org/

--
Best Regards, Edward J. Yoon

-----Original Message-----
From: Behroz Sikander [mailto:behroz89@gmail.com]
Sent: Thursday, June 04, 2015 10:24 PM
To: dev@hama.apache.org
Subject: Re: [DISCUSS] Things I'd like to focus on next

Hi,
+1.
Yes, the documentation needs improvement. I also saw that a book on Hama is
in progress. I can help with the documentation. I only found the
following open issue: https://issues.apache.org/jira/browse/HAMA-960.

Something like MLBase or Mahout on top of Hama would be really nice and
would boost the project. Regarding machine learning algorithms, could we use
ADMM (a) to implement them, similar to the following Spark issue?
https://issues.apache.org/jira/browse/SPARK-1543

a) https://web.stanford.edu/~boyd/papers/pdf/admm_distr_stats.pdf

Regards,
Behroz Sikander


Re: [DISCUSS] Things I'd like to focus on next

Posted by Behroz Sikander <be...@gmail.com>.

Hi,
+1.
Yes, the documentation needs improvement. I also saw that a book on Hama is
in progress. I can help with the documentation. I only found the
following open issue:
https://issues.apache.org/jira/browse/HAMA-960.

Something like MLBase or Mahout on top of Hama would be really nice and
would boost the project. Regarding machine learning algorithms, could we use
ADMM (a) to implement them, similar to the following Spark issue?
https://issues.apache.org/jira/browse/SPARK-1543

a) https://web.stanford.edu/~boyd/papers/pdf/admm_distr_stats.pdf

Regards,
Behroz Sikander
