Posted to dev@hama.apache.org by "Edward J. Yoon" <ed...@apache.org> on 2015/08/05 05:52:40 UTC

[DISCUSSION] Spinoff ANN package

Guys,

I plan to submit a 'DNN platform on top of Apache Hama' proposal, as
below. I know the Hama community is somewhat small, but the main reason
for the spin-off is that this domain-specific project is not a good fit
for the Apache Hama community. Recruiting volunteers is also a hard
problem. I expect this will become a very nice use case for Apache Hama.

If you have any suggestions or other opinions, please let me know.
Also, if you want to participate in this project, please feel free to
add your name here.

Thanks!

--
== Abstract ==

Horn (tentative name, pronounced [hɔ:n]; the Korean meaning of Horn is
"Spirit") is a neuron-centric programming API and execution framework
for large-scale deep learning, built on top of Apache Hama.

== Proposal ==

The goal of Horn is to provide a neuron-centric programming API that
lets users easily define the characteristics and structure of an
artificial neural network model, together with an execution framework
that leverages the heterogeneous resources of a Hama and Hadoop YARN
cluster.
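
To make "neuron-centric" a little more concrete, here is a minimal
single-process sketch in Python. It is not Horn's API: the names
Neuron, SigmoidNeuron and feed_forward are hypothetical and exist only
for illustration. The idea it tries to show is that the user writes
code for one neuron at a time, and the framework decides how values
move between layers.

```python
import math


class Neuron:
    """Hypothetical base class: user code sees one neuron at a time."""

    def __init__(self, weights, bias=0.0):
        self.weights = list(weights)
        self.bias = bias
        self.output = 0.0

    def forward(self, inputs):
        raise NotImplementedError


class SigmoidNeuron(Neuron):
    """User-defined neuron: weighted sum followed by a sigmoid."""

    def forward(self, inputs):
        z = sum(w * x for w, x in zip(self.weights, inputs)) + self.bias
        self.output = 1.0 / (1.0 + math.exp(-z))
        return self.output


def feed_forward(layers, inputs):
    """Framework side: feed each layer's outputs into the next layer."""
    for layer in layers:
        inputs = [neuron.forward(inputs) for neuron in layer]
    return inputs


# A 2-2-1 network described purely as lists of neurons.
network = [
    [SigmoidNeuron([0.5, -0.5]), SigmoidNeuron([0.25, 0.75])],
    [SigmoidNeuron([1.0, -1.0])],
]
prediction = feed_forward(network, [1.0, 2.0])
```

Only the forward pass is sketched here; a backward pass would follow
the same per-neuron pattern. In a distributed setting the loop in
feed_forward is the part the execution framework would replace.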

== Background ==

The initial ANN code was developed within the Apache Hama project by a
committer, Yexi Jiang (Facebook), in 2013. The motivation behind this
work is to build a framework that provides more intuitive programming
APIs, like Google's MapReduce or Pregel, and that supports applications
needing large models with huge memory consumption, in a distributed way.

== Rationale ==

While many open source deep learning packages are still data-parallel
only or model-parallel only, we aim to support both data and model
parallelism, along with a fault-tolerant system design. The basic idea
is to use a remote parameter server to parallelize model creation and
distribute training across machines, and to use the BSP framework of
Apache Hama to perform asynchronous mini-batches. Within a single BSP
job, each task group works asynchronously, using region barrier
synchronization instead of global barrier synchronization, and trains
a large-scale neural network model on its assigned data sets in the
BSP paradigm. This architecture is inspired by Google's DistBelief
(Jeff Dean et al., 2012).
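
The combination described above can be sketched on one machine with
Python threads. This is only a toy under stated assumptions, not Hama
or Horn code: ParameterServer, TaskGroup, task and train are
hypothetical names, the "model" is a single weight fitted to y = 3x,
and threading.Barrier stands in for a region barrier that covers one
task group only. Groups never wait for each other, so a group may
compute its gradient from a slightly stale weight.

```python
import threading


class ParameterServer:
    """Holds the shared model (one weight here) and applies pushed gradients."""

    def __init__(self, w=0.0, lr=0.1):
        self.w, self.lr = w, lr
        self.pushes = 0
        self._lock = threading.Lock()

    def pull(self):
        with self._lock:
            return self.w

    def push(self, grad):
        with self._lock:
            self.w -= self.lr * grad
            self.pushes += 1


class TaskGroup:
    """One model replica; its tasks synchronize only with each other."""

    def __init__(self, shards):
        self.shards = shards
        self.grads = [0.0] * len(shards)
        self.barrier = threading.Barrier(len(shards))  # region barrier


def task(server, group, idx, steps):
    xs, ys = group.shards[idx]
    for _ in range(steps):
        w = server.pull()  # may be stale with respect to other groups
        group.grads[idx] = sum((w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        if group.barrier.wait() == 0:  # exactly one task per group sees 0
            server.push(sum(group.grads) / len(group.grads))
        group.barrier.wait()  # keep grads intact until the push is done


def train(num_groups=2, tasks_per_group=2, steps=200):
    xs = [i / 20 for i in range(1, 41)]
    ys = [3.0 * x for x in xs]  # toy target: y = 3x
    n = num_groups * tasks_per_group
    shards = [(xs[k::n], ys[k::n]) for k in range(n)]  # data parallelism
    server = ParameterServer()
    threads = []
    for g in range(num_groups):
        group = TaskGroup(shards[g * tasks_per_group:(g + 1) * tasks_per_group])
        for idx in range(tasks_per_group):
            t = threading.Thread(target=task, args=(server, group, idx, steps))
            threads.append(t)
            t.start()
    for t in threads:
        t.join()
    return server


server = train()
```

With a global barrier, every task would have to wait for the slowest
task in the whole job at each step. Here each barrier covers only the
tasks of one group, which is the point of the region barrier design.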

== Initial Goals ==

Some current goals include:

 * build a new community
 * provide more intuitive programming APIs
 * support both data and model parallelism
 * run natively on both Hama and Hadoop2
 * support GPUs and InfiniBand

== Current Status ==

=== Meritocracy ===

The core developers understand what it means to have a process based
on meritocracy. We will make a continuous effort to build an
environment that supports this, encouraging community members to
contribute.

=== Community ===

A small community has formed within the Apache Hama project and at
some companies, such as an instant messenger service company and a
mobile manufacturing company. Many people are also interested in the
large-scale deep learning platform itself. By bringing Horn into
Apache, we believe that the community will grow even bigger.

=== Core Developers ===

Edward J. Yoon, Thomas Jungblut, and Dongjin Lee

== Known Risks ==

=== Orphaned Products ===

Apache Hama is already a core open source component at Samsung
Electronics, and Horn will also be used by Samsung Electronics, so
there is no direct risk of this project being orphaned.

=== Inexperience with Open Source ===

Some of the initial committers are very new to open source, while the
others have experience using and/or working on Apache open source
projects.

=== Homogeneous Developers ===

The initial committers are from different organizations, such as
Microsoft, Samsung Electronics, and LINE Plus.

=== Reliance on Salaried Developers ===

Other developers will also start working on the project in their spare time.

=== Relationships with Other Apache Products ===

 * Horn is based on Apache Hama
 * Apache ZooKeeper is used as the distributed locking service
 * Horn runs natively on Apache Hadoop and Mesos
 * Horn may overlap somewhat with the Singa podling

=== An Excessive Fascination with the Apache Brand ===

Horn itself will hopefully benefit from Apache, in terms of attracting
a community and establishing a solid group of developers, and also
from its relationship with Apache Hama, a general-purpose BSP
computing engine. These are the main reasons for us to send this
proposal.

== Documentation ==

The initial plan for Horn can be found at
http://blog.udanax.org/2015/06/googles-distbelief-clone-project-on.html

== Initial Source ==

The initial source code has been released as part of the Apache Hama
project, developed under the Apache Software Foundation. The source
code is currently hosted at
https://svn.apache.org/repos/asf/hama/trunk/ml/src/main/java/org/apache/hama/ml/ann/

== Cryptography ==

Not applicable.

== Required Resources ==

Mailing Lists

 * horn-private
 * horn-dev

Subversion Directory

 * Git is the preferred source control system: git://git.apache.org/horn

Issue Tracking

 * a JIRA issue tracker, HORN

== Initial Committers and Affiliations ==

 * Thomas Jungblut (tjungblut at apache dot org)
 * Edward J. Yoon (edwardyoon at apache dot org)
 * Dongjin Lee (dongjin.lee.kr at gmail dot com)
 * Minho Kim (minwise.kim at samsung dot com)
 * TODO

== Affiliations ==

 * Thomas Jungblut (Microsoft)
 * Edward J. Yoon (Samsung Electronics)
 * Dongjin Lee (LINE Plus)
 * Minho Kim (Samsung Electronics)
 * TODO

== Sponsors ==

Champion

 * Edward J. Yoon <edwardyoon at apache dot org>

Nominated Mentors

 * TODO

Sponsoring Entity

The Apache Incubator

-- 
Best Regards, Edward J. Yoon

Re: [DISCUSSION] Spinoff ANN package

Posted by "Edward J. Yoon" <ed...@apache.org>.

As you might already know, the proposal has been accepted by the Apache
Incubator PMC. :) I'll migrate the o.a.h.ml.ann and
o.a.h.ml.perceptron packages to the Apache Horn (Incubating) repository.


Thanks.

-- 
Best Regards, Edward J. Yoon

Re: [DISCUSSION] Spinoff ANN package

Posted by Minho Kim <mi...@apache.org>.

+1
I would like to participate too. :-)

Re: [DISCUSSION] Spinoff ANN package

Posted by Behroz Sikander <be...@gmail.com>.

+1
I would also like to participate :)

RE: [DISCUSSION] Spinoff ANN package

Posted by "Edward J. Yoon" <ed...@samsung.com>.

Thanks for volunteering! I'll add you. BTW, please don't forget: you're
currently the chair of Apache Hama, and it's an important role. :-)

--
Best Regards, Edward J. Yoon


-----Original Message-----
From: Chia-Hung Lin [mailto:clin4j@googlemail.com]
Sent: Wednesday, August 05, 2015 1:08 PM
To: dev@hama.apache.org
Subject: Re: [DISCUSSION] Spinoff ANN package

+1 That looks interesting. I would like to participate in this project.

Re: [DISCUSSION] Spinoff ANN package

Posted by Chia-Hung Lin <cl...@googlemail.com>.

+1 That looks interesting. I would like to participate in this project.

>
> == Current Status ==
>
> === Meritocracy ===
>
> The core developers understand what it means to have a process based
> on meritocracy. We will provide continuous efforts to build an
> environment that supports this, encouraging community members to
> contribute.
>
> === Community ===
>
> A small community has formed within the Apache Hama project and some
> companies such as instant messenger service company and mobile
> manufacturing company. And many people are interested in the
> large-scale deep learning platform itself. By bringing Horn into
> Apache, we believe that the community will grow even bigger.
>
> === Core Developers ===
>
> Edward J. Yoon, Thomas Jungblut, and Dongjin Lee
>
> == Known Risks ==
>
> === Orphaned Products ===
>
> Apache Hama is already a core open source component at Samsung
> Electronics, and Horn also will be used by Samsung Electronics, and so
> there is no direct risk for this project to be orphaned.
>
> === Inexperience with Open Source ===
>
> Some are very new and the others have experience using and/or working
> on Apache open source projects.
>
> === Homogeneous Developers ===
>
> The initial committers are from different organizations such as,
> Microsoft, Samsung Electronics, and Line Plus.
>
> === Reliance on Salaried Developers ===
>
> Other developers will also start working on the project in their spare time.
>
> === Relationships with Other Apache Products ===
>
>  * Horn is based on Apache Hama
>  * Apache Zookeeper is used for distributed locking service
>  * Natively run on Apache Hadoop and Mesos
>  * Horn can be somewhat overlapped with Singa podling.
>
> === An Excessive Fascination with the Apache Brand ===
>
> Horn itself will hopefully have benefits from Apache, in terms of
> attracting a community and establishing a solid group of developers,
> but also the relation with Apache Hama, a general-purpose BSP
> computing engine. These are the main reasons for us to send this
> proposal.
>
> == Documentation ==
>
> Initial plan about Horn can be found at
> http://blog.udanax.org/2015/06/googles-distbelief-clone-project-on.html
>
> == Initial Source ==
>
> The initial source code has been release as part of Apache Hama
> project developed under Apache Software Foundation. The source code is
> currently hosted at
> https://svn.apache.org/repos/asf/hama/trunk/ml/src/main/java/org/apache/hama/ml/ann/
>
> == Cryptography ==
>
> Not applicable.
>
> == Required Resources ==
>
> Mailing Lists
>
>  * horn-private
>  * horn-dev
>
> Subversion Directory
>
>  * Git is the preferred source control system: git://git.apache.org/horn
>
> Issue Tracking
>
>  * a JIRA issue tracker, HORN
>
> == Initial Committers and Affiliations ==
>
>  * Thomas Jungblut (tjungblut at apache dot org)
>  * Edward J. Yoon (edwardyoon at apache dot org)
>  * Dongjin Lee (dongjin.lee.kr at gmail dot com)
>  * Minho Kim (minwise.kim at samsung dot com)
>  * TODO
>
> == Affiliations ==
>
>  * Thomas Jungblut (Microsoft)
>  * Edward J. Yoon (Samsung Electronics)
>  * Donjin Lee (LINE Plus)
>  * Minho Kim (Samsung Electronics)
>  * TODO
>
> == Sponsors ==
>
> Champion
>
>  * Edward J. Yoon <edwardyoon at apache dot org>
>
> Nominated Mentors
>
>  * TODO
>
> Sponsoring Entity
>
> The Apache Incubator
>
> --
> Best Regards, Edward J. Yoon