Posted to general@incubator.apache.org by Thejas Nair <th...@gmail.com> on 2015/03/10 15:33:06 UTC

[VOTE] Accept Apache Singa as incubator project

The Singa Incubator Proposal document has been updated based on
feedback in the proposal thread.

This vote is proposing the inclusion of Apache Singa as an incubator project.
The vote will run for at least 72 hours.

[ ] +1 Accept Apache Singa into the Incubator
[ ] +0 Don’t care.
[ ] -1 Don’t accept Apache Singa into the Incubator because..

Please vote!

Here is my +1.

Link to the version of the proposal being voted on:
https://wiki.apache.org/incubator/SingaProposal?action=recall&rev=10

The text is below:
----------------------------------------------

= Singa Incubator Proposal =
== Abstract ==
SINGA is a distributed deep learning platform.

== Proposal ==
SINGA is an efficient, scalable and easy-to-use distributed platform for
training deep learning models, e.g., Deep Convolutional Neural Networks and
Deep Belief Networks. It parallelizes the computation (i.e., training) over a
cluster of nodes by automatically distributing the training data and the model,
which speeds up training. Built-in training algorithms such as Back-Propagation
and Contrastive Divergence are implemented on top of common abstractions of deep
learning models. Users can train their own deep learning models simply by
customizing these abstractions, much as they would implement a Mapper and a
Reducer in Hadoop.
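To make the Mapper/Reducer analogy concrete, here is a minimal Python sketch under stated assumptions: the `Layer` base class, the `Scale` layer and the `train_bp` loop are hypothetical names invented for illustration, not SINGA's actual API (SINGA's core code base is C++). What it shows is the division of labour: the user writes only `forward`, `backward` and `update` for a layer, and a generic Back-Propagation loop drives any stack of such layers without knowing what is inside them.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence, Tuple


class Layer(ABC):
    """Hypothetical layer abstraction: users implement forward/backward only."""

    @abstractmethod
    def forward(self, x: float) -> float:
        """Compute this layer's output from its input."""

    @abstractmethod
    def backward(self, grad_out: float) -> float:
        """Accumulate parameter gradients; return d(loss)/d(input)."""

    def update(self, lr: float) -> None:
        """Apply accumulated gradients; parameter-free layers keep this no-op."""


class Scale(Layer):
    """Example user-defined layer: y = w * x + b with scalar parameters."""

    def __init__(self, w: float = 0.0, b: float = 0.0) -> None:
        self.w, self.b = w, b
        self.x = 0.0             # input cached by forward() for backward()
        self.gw = self.gb = 0.0  # accumulated parameter gradients

    def forward(self, x: float) -> float:
        self.x = x
        return self.w * x + self.b

    def backward(self, grad_out: float) -> float:
        self.gw += grad_out * self.x
        self.gb += grad_out
        return grad_out * self.w  # uses w before this step's update

    def update(self, lr: float) -> None:
        self.w -= lr * self.gw
        self.b -= lr * self.gb
        self.gw = self.gb = 0.0


def train_bp(layers: List[Layer], data: Sequence[Tuple[float, float]],
             lr: float = 0.05, epochs: int = 200) -> None:
    """Generic Back-Propagation with per-example SGD and squared-error loss.

    The loop never looks inside a layer, much as Hadoop drives a user's
    Mapper and Reducer without knowing what they compute.
    """
    for _ in range(epochs):
        for x, target in data:
            out = x
            for layer in layers:            # forward pass
                out = layer.forward(out)
            grad = out - target             # d/d(out) of 0.5 * (out - target)**2
            for layer in reversed(layers):  # backward pass
                grad = layer.backward(grad)
            for layer in layers:            # parameter update
                layer.update(lr)
```

For example, fitting `y = 2x + 1` with a single `Scale` layer via `train_bp([layer], [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)], epochs=500)` should leave `layer.w` close to 2 and `layer.b` close to 1. Adding a new model type means adding a new `Layer` subclass; the training loop stays untouched.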

== Background ==
Deep learning refers to a set of feature (or representation) learning models
that consist of multiple (non-linear) layers, where different layers learn
different levels of abstraction (representation) of the raw input data. Larger
(in terms of model parameters) and deeper (in terms of number of layers) models
have shown better performance, e.g., lower image classification error in the
ImageNet Large Scale Visual Recognition Challenge. However, a larger model
requires more memory and more training data to reduce over-fitting, and its
complex numeric operations make training computationally intensive. In
practice, training a large deep learning model takes weeks or months on a
single node (even with a GPU).

== Rationale ==
Deep learning has attracted a lot of attention in both academia and industry
due to its success in a wide range of areas such as computer vision and speech
recognition. However, training such models is computationally expensive,
especially for large and deep models (e.g., with billions of parameters and
more than 10 layers). Both Google and Microsoft have developed distributed deep
learning systems that make training more efficient by distributing the
computation across a cluster of nodes. However, these systems are closed-source
software. Our goal is to leverage the community of open source developers to
make SINGA efficient, scalable and easy to use. SINGA is a full-fledged
distributed platform that could benefit the community and, in turn, benefit
from the community's contributions to further work in this area. We believe
the nature of SINGA and our vision for the system fit naturally with Apache's
philosophy and development framework.

== Initial Goals ==
We have developed a system for SINGA that runs on a commodity computer cluster.
The initial goals include:
 * improving the system in terms of scalability and efficiency, e.g., using
InfiniBand for network communication and multi-threading for single-node
computation. We would consider extending SINGA to GPU clusters later.
 * benchmarking with larger datasets (hundreds of millions of training
instances) and models (billions of parameters).
 * adding more built-in deep learning models, so that users can train the
built-in models on their datasets directly.


== Current Status ==
=== Meritocracy ===
We would like to follow ASF meritocratic principles to encourage more developers
to contribute to this project. We know that only active and excellent developers
can make SINGA a successful project. The committer list and PMC will be updated
based on developers' performance and commitment. We are also improving the
documentation and code to help new developers get started quickly.

=== Community ===
SINGA is currently being developed in the Database System Research Lab at the
National University of Singapore (NUS) in collaboration with Zhejiang
University in China. Our lab has extensive experience in building
database-related systems, including distributed systems. Six PhD students and
research assistants (Jinyang Gao, Kaiping Zheng, Sheng Wang, Wei Wang, Zhaojing
Luo and Zhongle Xie), a research fellow (Anh Dinh) and three professors (Beng
Chin Ooi, Gang Chen and Kian Lee Tan) have been working on this project for a
year. We are open to recruiting more developers from diverse backgrounds.

=== Core Developers ===
Beng Chin Ooi, Gang Chen and Kian Lee Tan are professors who have worked on
distributed systems for more than 20 years. They have collaborated with
industry and have built various large-scale systems. Anh Dinh's research is
also on distributed systems, albeit with more focus on security. Wei Wang's
research is on deep learning problems, including deep learning applications and
large-scale training. Sheng Wang and Jinyang are working on efficient indexing
and querying of large-scale data, and on machine learning. Kaiping, Zhaojing and
Zhongle are new PhD students who joined SINGA recently; they will work on this
project for a longer time (the next 4-5 years). While we share common research
interests, each member also brings diverse expertise to the team.

=== Alignment ===
ASF is already the home of many distributed platforms, e.g., Hadoop, Spark and
Mahout, each of which targets a different application domain. SINGA, being a
distributed platform for large-scale deep learning, focuses on another important
domain that still lacks a robust and scalable open-source platform. The recent
success of deep learning models, especially for vision and speech recognition
tasks, has generated interest both in applying existing deep learning models and
in developing new ones. Thus, an open-source platform for deep learning will be
able to attract a large community of users and developers. SINGA is a complex
system needing many iterations of design, implementation and testing. Apache's
collaboration framework, which encourages active contribution from developers,
will inevitably help improve the quality of the system, as shown by the success
of Hadoop, Spark, etc. Equally important is the community of users, which helps
identify real-life applications of deep learning and helps evaluate the
system's performance and ease of use. We hope to leverage ASF to coordinate and
promote both communities, and in return benefit the communities with another
useful tool.

== Known Risks ==
=== Orphaned products ===
Four core developers (Anh, Wei Wang, Jinyang and Sheng Wang) may leave the lab
in two to four years' time, and some of them may not have enough time to focus
on this project after that. However, SINGA is part of our other, bigger research
projects on building an infrastructure for data-intensive applications, which
include health-care analytics and brain-inspired computing. Beng Chin and Kian
Lee will continue working on it and getting more people involved; for example,
three new developers (Kaiping, Zhaojing and Zhongle) joined us recently. We
welcome individual developers to help make SINGA a diverse community that is
robust and independent of any single developer.

=== Inexperience with Open Source ===
All the developers are active users and followers of open source projects. Our
research lab has a strong commitment to open source and has released the source
code of several systems under open source licenses as a way of contributing back
to the open source community. However, we do not have much real experience in
open source projects with large and well-organized communities like those at
Apache. This is one reason we chose Apache, which is experienced in incubating
open source projects. We hope to get help from Apache (e.g., the champion and
mentors) to establish a healthy path for SINGA.

=== Homogenous Developers ===
Although the current developers are all university researchers, they have
different research interests and project experiences, as described in the
section that introduces the core developers. We know that a diverse community
is helpful; hence, we are open to recruiting developers from other regions and
organizations.

=== Reliance on Salaried Developers ===
As a university research project, SINGA's current development community consists
of professors, PhD students, research assistants and postdoctoral fellows. They
are driven by their own interest to work on this project and have contributed
actively since the start of the project. The research assistants and fellows are
expected to leave when their contracts expire; however, they are keen to continue
working on the project voluntarily. Moreover, as this is a long-term research
project, new research assistants and fellows are likely to join.

=== An Excessive Fascination with the Apache Brand ===
We did not choose Apache for publicity. We have two purposes. First, we want to
leverage Apache's reputation to recruit more developers and build a diverse
community. Second, we hope that Apache can help us establish a healthy path in
developing SINGA. Beng Chin and Kian Lee are established database and
distributed systems researchers, and together with the other contributors, they
sincerely believe that there is a need for a widely accepted open source
distributed deep learning platform. The field of deep learning is still in its
infancy, and an open source platform will fuel research in the area. Moreover,
such a platform will enable researchers to develop new models and algorithms
rather than spending time implementing a deep learning system from scratch.
Furthermore, the need for scalability in such a platform is obvious.

=== Relationship with Other Apache Products ===
Apache Mahout and Apache Spark's MLlib are general machine learning systems, so
deep learning algorithms can be implemented on these two platforms as well.
However, there are differences in training efficiency, scalability and
usability. Mahout and Spark MLlib follow a model in which nodes run
synchronously. This is the fundamental difference from SINGA, which follows the
parameter server framework (like Google Brain and Microsoft Adam). SINGA can run
synchronously or asynchronously, and the asynchronous mode is superior to the
synchronous mode in terms of scalability. In addition, SINGA has some
optimizations for deep learning models (e.g., model parallelism, data
parallelism and hybrid parallelism) that make it more efficient. We also provide
an easy-to-use programming model for deep learning algorithms.
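The synchronous/asynchronous distinction can be illustrated with a toy parameter server. The following Python sketch is an illustration under stated assumptions, not SINGA's implementation: `ParameterServer`, `pull`, `push`, `shard_gradient` and `train` are hypothetical names, the model is a single scalar weight `w` fitted to `y = w * x`, and threads stand in for cluster nodes. In `"sync"` mode, barriers make every worker compute its gradient from the same version of `w` and wait for all the others before the next step; in `"async"` mode there is no barrier, so a worker never waits for a slow peer but may push a gradient computed from a stale `w`.

```python
import threading
from typing import List, Tuple

Shard = List[Tuple[float, float]]  # one worker's (x, y) training pairs


class ParameterServer:
    """Holds the shared parameter: a single scalar weight w for y = w * x."""

    def __init__(self, lr: float) -> None:
        self.w = 0.0
        self.lr = lr
        self._lock = threading.Lock()

    def pull(self) -> float:
        with self._lock:
            return self.w

    def push(self, grad: float) -> None:
        # Apply an SGD step as soon as a gradient arrives.
        with self._lock:
            self.w -= self.lr * grad


def shard_gradient(w: float, shard: Shard) -> float:
    """Mean gradient of 0.5 * (w * x - y)**2 over one worker's shard."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)


def train(shards: List[Shard], mode: str, steps: int = 200,
          lr: float = 0.1) -> float:
    if mode not in ("sync", "async"):
        raise ValueError("mode must be 'sync' or 'async'")
    server = ParameterServer(lr)
    n = len(shards)
    barrier = threading.Barrier(n)  # used only in sync mode

    def worker(shard: Shard) -> None:
        for _ in range(steps):
            w = server.pull()
            if mode == "sync":
                barrier.wait()  # all workers now hold the same w
                server.push(shard_gradient(w, shard) / n)  # n pushes add up to the mean
                barrier.wait()  # nobody pulls until every push has landed
            else:
                # No barrier: other workers may update w before this push,
                # so the gradient can be stale.
                server.push(shard_gradient(w, shard))

    threads = [threading.Thread(target=worker, args=(s,)) for s in shards]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return server.w
```

Removing the barriers is where the scalability argument comes from: in the synchronous mode every step is as slow as the slowest worker, while the asynchronous mode trades that waiting for gradient staleness, which typically requires a small enough learning rate to stay stable.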

There are also plans to integrate with Apache Hadoop's HDFS as storage to handle
large training data. Specifically, we would store the training data (e.g.,
images or raw features of images) in HDFS and then (pre-)fetch it online. We
will also explore integration with Hadoop YARN and Apache Mesos for resource
management.
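One common way to (pre-)fetch training data is a bounded producer/consumer queue: a background thread reads the next mini-batches while the trainer consumes the current one. The sketch below assumes an in-memory list stands in for data stored in HDFS; it does not use any HDFS client API, and `prefetch_batches` is a hypothetical helper, not part of SINGA.

```python
import queue
import threading
from typing import Iterator, List, Sequence

_END = object()  # sentinel: the producer has no more batches


def prefetch_batches(records: Sequence, batch_size: int,
                     depth: int = 2) -> Iterator[List]:
    """Yield mini-batches of `records` while a background thread reads ahead."""
    buf: queue.Queue = queue.Queue(maxsize=depth)  # at most `depth` batches buffered

    def producer() -> None:
        # Stand-in for reading from HDFS: slice the in-memory sequence.
        for start in range(0, len(records), batch_size):
            buf.put(list(records[start:start + batch_size]))  # blocks while full
        buf.put(_END)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        batch = buf.get()
        if batch is _END:
            return
        yield batch
```

The bounded queue (`depth`) caps how much data sits in memory, since the producer blocks on `put` when the trainer falls behind. Because the thread is a daemon, a consumer that stops early does not keep the process alive.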


== Documentation ==
The project is hosted at
http://www.comp.nus.edu.sg/~dbsystem/project/singa.html.
Documentation can be found on the GitHub wiki:
https://github.com/nusinga/singa/wiki.
We continue to refine and improve the documentation.

== Initial Source ==
We use GitHub to maintain our source code: https://github.com/nusinga/singa

== Source and Intellectual Property Submission Plan ==
We plan to license our code base under the Apache License, Version 2.0.

== External Dependencies ==
 * required by the core code base: glog, gflags, Google protobuf, OpenBLAS,
MPICH, ARMCI-MPI.
 * required by data preparation and preprocessing: OpenCV, HDFS, Python.

== Cryptography ==
Not Applicable

== Required Resources ==
=== Mailing Lists ===
Currently, we use a Google group for internal discussion. The mailing address is
nusinga@googlegroup.com. We will migrate the content to the Apache mailing lists
in the future.

 * singa-dev
 * singa-user
 * singa-commits
 * singa-private (for private discussion within the PMC)

=== Git Repository ===
We want to continue using git for version control; hence, a git repository is
required.

=== Issue Tracking ===
JIRA Singa (SINGA)

== Initial Committers ==
 * Beng Chin Ooi (ooibc @comp.nus.edu.sg)
 * Kian Lee Tan (tankl @comp.nus.edu.sg)
 * Gang Chen (cg @zju.edu.cn)
 * Wei Wang (wangwei @comp.nus.edu.sg)
 * Dinh Tien Tuan Anh (dinhtta @comp.nus.edu.sg)
 * Jinyang Gao (jinyang.gao @comp.nus.edu.sg)
 * Sheng Wang (wangsh @comp.nus.edu.sg)
 * Kaiping Zheng (kaiping @comp.nus.edu.sg)
 * Zhaojing Luo (zhaojing @comp.nus.edu.sg)
 * Zhongle Xie (zhongle @comp.nus.edu.sg)

== Affiliations ==
 * Beng Chin Ooi, National University of Singapore
 * Kian Lee Tan, National University of Singapore
 * Gang Chen, Zhejiang University
 * Wei Wang, National University of Singapore
 * Dinh Tien Tuan Anh, National University of Singapore
 * Jinyang Gao, National University of Singapore
 * Sheng Wang, National University of Singapore
 * Kaiping Zheng, National University of Singapore
 * Zhaojing Luo, National University of Singapore
 * Zhongle Xie, National University of Singapore

== Sponsors ==
=== Champion ===
Thejas Nair (thejas at apache.org)

=== Nominated Mentors ===
 * Thejas Nair (thejas at apache.org)
 * Alan Gates (gates at apache dot org)
 * Daniel Dai (daijy at apache dot org)
 * Ted Dunning (tdunning at apache dot org)

=== Sponsoring Entity ===
We are requesting the Incubator to sponsor this project.

---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
For additional commands, e-mail: general-help@incubator.apache.org

Re: [VOTE] Accept Apache Singa as incubator project

Posted by Thejas Nair <th...@gmail.com>.

Thanks for the input on the name. Naming is always tricky!
We can look into that as the project goes through incubation and
before it graduates.
Apache guideline on naming - http://www.apache.org/foundation/marks/naming.html

On Tue, Mar 10, 2015 at 3:52 PM, Olemis Lang <ol...@gmail.com> wrote:
> On 3/10/15, Thejas Nair <th...@gmail.com> wrote:
>> The Singa Incubator Proposal document has been updated based on
>> feedback in the proposal thread.
>>
>
> I do not know if this matters at all but JFYI , <<singa>> is considered
> as an obscene word by native Spanish speakers in quite a few regions .
>
> [...]

Re: [VOTE] Accept Apache Singa as incubator project

Posted by Thejas Nair <th...@gmail.com>.

Thanks for the feedback, Bertrand!
Yes, I agree it makes sense to start with a single mailing list for both
users and developers. I got this feedback during the proposal phase, but I
forgot to update the proposal.


On Wed, Mar 11, 2015 at 2:42 AM, Bertrand Delacretaz
<bd...@apache.org> wrote:
> On Tue, Mar 10, 2015 at 11:52 PM, Olemis Lang <ol...@gmail.com> wrote:
>> ...I do not know if this matters at all but JFYI , <<singa>> is considered
>> as an obscene word by native Spanish speakers in quite a few regions ....
>
> It does matter in terms of "marketing" IMO.
>
> Also, dunno if that's been discussed already and it's just a detail
> but in general I recommend starting without a user mailing list, and
> creating only if dev list traffic becomes a problem.
>
> Apart from that +1 to incubation.
>
> -Bertrand

Re: [VOTE] Accept Apache Singa as incubator project

Posted by Bertrand Delacretaz <bd...@apache.org>.

On Tue, Mar 10, 2015 at 11:52 PM, Olemis Lang <ol...@gmail.com> wrote:
> ...I do not know if this matters at all but JFYI , <<singa>> is considered
> as an obscene word by native Spanish speakers in quite a few regions ....

It does matter in terms of "marketing" IMO.

Also, dunno if that's been discussed already and it's just a detail
but in general I recommend starting without a user mailing list, and
creating only if dev list traffic becomes a problem.

Apart from that +1 to incubation.

-Bertrand

Re: [VOTE] Accept Apache Singa as incubator project

Posted by Olemis Lang <ol...@gmail.com>.

On 3/10/15, Thejas Nair <th...@gmail.com> wrote:
> The Singa Incubator Proposal document has been updated based on
> feedback in the proposal thread.
>

I do not know if this matters at all but JFYI , <<singa>> is considered
as an obscene word by native Spanish speakers in quite a few regions .

[...]

-- 
Regards,

Olemis - @olemislc

Apache(tm) Bloodhound contributor
http://issues.apache.org/bloodhound
http://blood-hound.net

Blog ES: http://simelo-es.blogspot.com/
Blog EN: http://simelo-en.blogspot.com/

Featured article:

Re: [VOTE] Accept Apache Singa as incubator project

Posted by Daniel Dai <da...@gmail.com>.

+1

On Tue, Mar 10, 2015 at 7:33 AM, Thejas Nair <th...@gmail.com> wrote:

> The Singa Incubator Proposal document has been updated based on
> feedback in the proposal thread.
>
> This vote is proposing the inclusion of Apache Singa as incubator project.
> The vote will run for at least 72 hours.
>
> [ ] +1 Accept Apache Singa into the Incubator
> [ ] +0 Don’t care.
> [ ] -1 Don’t accept Apache Singa into the Incubator because..
>
> Please vote !
>
> Here is my +1 .
>
> Link to version of proposal being voted on :
> https://wiki.apache.org/incubator/SingaProposal?action=recall&rev=10
>
> The text is below
> ----------------------------------------------
>
> = Singa Incubator Proposal =
> == Abstract ==
> SINGA is a distributed deep learning platform.
>
> == Proposal ==
> SINGA is an efficient, scalable and easy-to-use distributed platform
> for training deep learning models, e.g., Deep Convolutional Neural Network
> and
> Deep Belief Network. It parallelizes the computation (i.e., training) onto
> a
> cluster of nodes by distributing the training data and model automatically
> to
> speed up the training. Built-in training algorithms like Back-Propagation
> and
> Contrastive Divergence are implemented based on common abstractions of deep
> learning models. Users can train their own deep learning models by simply
> customizing these abstractions like implementing the Mapper and
> Reducer in Hadoop.
>
> == Background ==
> Deep learning refers to a set of feature (or representation) learning
> models
> that consist of multiple (non-linear) layers, where different layers learn
> different levels of abstractions (representations) of the raw input data.
> Larger (in terms of model parameters) and deeper (in terms of number of
> layers)
> models have shown better performance, e.g., lower image classification
> error in
> Large Scale Visual Recognition Challenge. However, a larger model requires
> more
> memory and larger training data to reduce over-fitting. Complex
> numeric operations
> make the training computation intensive. In practice, training large
> deep learning
> models takes weeks or months on a single node (even with GPU).
>
> == Rational ==
> Deep learning has gained a lot of attraction in both academia and
> industry due to
> its success in a wide range of areas such as computer vision and
> speech recognition.
> However, training of such models is computationally expensive,
> especially for large
> and deep models (e.g., with billions of parameters and more than 10
> layers). Both
> Google and Microsoft have developed distributed deep learning systems
> to make the
> training more efficient by distributing the computations within a
> cluster of nodes.
> However, these systems are closed source softwares. Our goal is to
> leverage the
> community of open source developers to make SINGA efficient, scalable
> and easy to
> use. SINGA is a full fledged distributed platform, that could benefit the
> community and also benefit from the community in their involvement in
> contributing
> to the further work in this area. We believe the nature of SINGA and our
> visions
> for the system fit naturally to Apache's philosophy and development
> framework.
>
> == Initial Goals ==
> We have developed a system for SINGA running on a commodity computer
> cluster. The initial goals include,
>  * improving the system in terms of scalability and efficiency, e.g.,
> using Infiniband for network communication and multi-threading for one
> node computation. We would consider extending SINGA to GPU clusters
> later.
>  * benchmarking with larger datasets (hundreds of millions of training
> instances) and models (billions of parameters).
>  * adding more built-in deep learning models. Users can train the
> built-in models on their datasets directly.
>
>
> == Current Status ==
> === Meritocracy ===
> We would like to follow ASF meritocratic principles to encourage more
> developers
> to contribute in this project. We know that only active and excellent
> developers
> can make SINGA a successful project. The committer list and PMC will be
> updated
> based on developers' performance and commitment. We are also improving the
> documentation and code to help new developers get started quickly.
>
> === Community ===
> SINGA is currently being developed in the Database System Research Lab at
> the
> National University of Singapore (NUS) in collaboration with Zhejiang
> University in China.
> Our lab has extensive experience in building database related systems,
> including
> distributed systems. Six PhD students and research assistants (Jinyang Gao,
> Kaiping Zheng, Sheng Wang, Wei Wang, Zhaojing Luo and Zhongle Xie) , a
> research
> fellow (Anh Dinh) and three professors (Beng Chin Ooi, Gang Chen, Kian Lee
> Tan)
> have been working for a year on this project. We are open to recruiting
> more
> developers from diverse backgrounds.
>
> === Core Developers ===
> Beng Chin Ooi, Gang Chen and Kian Lee Tan are professors who have worked on
> distributed systems for more than 20 years. They have collaborated with the
> industry and have built various large scale systems. Anh Dinh's research
> is also
> on distributed systems, albeit with more focus on security aspects. Wei
> Wang's
> research is on deep learning problems including deep learning applications
> and
> large scale training. Sheng Wang and Jinyang are working on efficient
> indexing,
> querying of large scale data and machine learning. Kaiping, Zhaojing and
> Zhongle
> are new PhD students who jointed SINGA recently. They will work on this
> project
> for a longer time (next 4-5 years). While we share common research
> interests,
> each member also brings diverse expertise to the team.
>
> === Alignment ===
> ASF is already the home of many distributed platforms, e.g., Hadoop, Spark
> and
> Mahout, each of which targets a different application domain. SINGA, being
> a
> distributed platform for large-scale deep learning, focuses on another
> important
> domain for which there still lacks a robust and scalable open-source
> platform.
> The recent success of deep learning models especially for vision and speech
> recognition tasks has generated interests in both applying existing
> deep learning
> models and in developing new ones. Thus, an open-source platform for deep
> learning will be able to attract a large community of users and developers.
> SINGA is a complex system needing many iterations of design,
> implementation and
> testing. Apache's collaboration framework which encourages active
> contribution
> from developers will inevitably help improve the quality of the system, as
> shown
> in the success of Hadoop, Spark, etc.. Equally important is the community
> of
> users which helps identify real-life applications of deep learning, and
> helps
> to evaluate the system's performance and ease-of-use. We hope to
> leverage ASF for
> coordinating and promoting both communities, and in return benefit the
> communities
> with another useful tool.
>
> == Known Risks ==
> === Orphaned products ===
> Four core developers (Anh, Wei Wang, Jinyang and Sheng Wang) may leave the
> lab in two to four years time. It is possible that some of them may
> not have enough
> time to focus on this project after that. But, SINGA is part of our other
> bigger
> research projects on building an infrastructure for data intensive
> applications,
> which include health-care analytics and brain-inspired computing. Beng
> Chin and
> Kian Lee would continue working on it and getting more people
> involved. For example,
> three new developers (Kaiping, Zhaojing and Zhongle) joined us recently.
> Individual developers are welcome to make SINGA a diverse community
> that is robust and independent from any single developer.
>
> === Inexperience with Open Source ===
> All the developers are active users and followers of open source projects.
> Our
> research lab has a strong commitment to open source, and has released the
> source
> code of several systems under open source license as a way of contributing
> back
> to the open source community. But we do not have much real experience
> in open source
> projects with large and well organized communities like those in Apache.
> This is
> one reason we choose Apache which is experienced in open source
> project incubation.
> We hope to get the help from Apache (e.g., champion and mentors) to
> establish a
> healthy path for SINGA.
>
> === Homogenous Developers ===
> Although the current developers are researchers in the universities, they
> have
> different research interests and project experiences, as mentioned in
> the section
> that introduces the core developers. We know that a diverse community
> is helpful.
> Hence we are open to the idea of recruiting developers from other
> regions and organizations.
>
> === Reliance on Salaried Developers ===
> As a research project in the university, SINGA's current developing
> community
> consists of professors, PhD students, research assistants and
> postdoctoral fellows.
> They are driven by their interests to work on this project and have
> contributed
> actively since the start of the project. The research assistants and
> fellows are
> expected to leave when their contracts expire. However, they are keen
> to continue
> to work on the project voluntarily. Moreover, as a long term research
> project, new
> research assistants and fellows are likely to join the project.
>
> === A Excessive Fascination with the Apache Brand ===
> We choose Apache not for publicity. We have two purposes. First, we want to
> leverage Apache's reputation to recruit more developers to make a diverse
> community. Second, we hope that Apache can help us to establish a healthy
> path
> in developing SINGA. Beng Chin and Kian-Lee are established database and
> distributed system researchers, and together with the other contributors,
> they
> sincerely believe that there is a need for a widely accepted open source
> distributed deep learning platform. The field of deep learning is still at
> its
> infancy, and an open source platform will fuel the research in the
> area. Moreover,
> such a platform will enable researchers to develop new models  and
> algorithms,
> rather than spending time implementing a deep learning system from scratch.
> Furthermore, the need for scalability for such a platform is obvious.
>
> === Relationship with Other Apache Products ===
> Apache Mahout and Apache Spark's ML-LIB are general machine learning
> systems. Deep
> learning algorithm can thus be implemented on these two platforms as
> well. However, the there are differences in training efficiency,
> scalability and
> usability. Mahout and Spark ML-LIB follow models where their
> nodes run synchronously. This is the fundamental difference to Singa who
> follows the parameter server framework (like Google Brain and Microsoft
> Adam). Singa can run synchronously or asynchronously. The asynchronous mode
> is superior than the synchronous mode in terms of scalability. In
> addition, Singa has some optimizations towards deep learning models
> (e.g., model
> parallelism, data parallelism and hybrid-parallelism) which make Singa
> more efficient. We also provide ease of use programming model for deep
> learning algorithms.
>
> There are also plans for integration with Apache Hadoop's HDFS as
> storage, to  handle large training data.
> Specifically, we store the training data (e.g., images or raw features of
> images) in HDFS, then (pre-)fetch them online.
> We will also explore integration with Hadoop's Yarn and Apache Mesos
> to do resource management.
>
>
> == Documentation ==

Re: [VOTE] Accept Apache Singa as incubator project

Posted by "Alan D. Cabrera" <li...@toolazydogs.com>.

+1


Regards,
Alan

Re: [VOTE] Accept Apache Singa as incubator project

Posted by Konstantin Boudnik <co...@apache.org>.

+1

The diversity should be closely examined if by the graduation time the
situation hasn't improved.

Cos

On Tue, Mar 10, 2015 at 12:17PM, Ted Dunning wrote:
> +1
> 
> I am not nearly as worried about the committer diversity, certainly not
> relative to entry into incubator.  This is a great project that has already
> shown some very strong willingness to work with others in the short time I
> have interacted with them.
> 
> 
> On Tue, Mar 10, 2015 at 11:49 AM, Thejas Nair <th...@gmail.com> wrote:
> 
> > Thanks for raising this issue. I agree that committer diversity is
> > important for long term success of a project. I think that should be a
> > criterion for graduation from incubator.
> > I think it is going to be easier to find new contributors as an Apache
> > incubator project.
> >
> >
> > On Tue, Mar 10, 2015 at 9:09 AM, jan i <ja...@apache.org> wrote:
> >
> > >
> > > +0 I am really concerned about the diversity of the initial committers:
> > > what happens if the university pulls the plug? I know we all say it will
> > > never happen, but it could happen.
> > >
> > > rgds
> > > jan i.
> > >
> > >
> > > On 10 March 2015 at 16:20, Alan Gates <al...@gmail.com> wrote:
> > >
> > >> +1
> > >>
> > >> Alan.
> > >>
> > >>   Thejas Nair <th...@gmail.com>
> > >>  March 10, 2015 at 7:33
> > >> The Singa Incubator Proposal document has been updated based on
> > >> feedback in the proposal thread.
> > >>
> > >> This vote is proposing the inclusion of Apache Singa as incubator
> > project.
> > >> The vote will run for at least 72 hours.
> > >>
> > >> [ ] +1 Accept Apache Singa into the Incubator
> > >> [ ] +0 Don’t care.
> > >> [ ] -1 Don’t accept Apache Singa into the Incubator because..
> > >>
> > >> Please vote !
> > >>
> > >> Here is my +1 .
> > >>
> > >> Link to version of proposal being voted on :
> > >> https://wiki.apache.org/incubator/SingaProposal?action=recall&rev=10
> > >>
> > >> The text is below
> > >> ----------------------------------------------
> > >>
> > >> = Singa Incubator Proposal =
> > >> == Abstract ==
> > >> SINGA is a distributed deep learning platform.
> > >>
> > >> == Proposal ==
> > >> SINGA is an efficient, scalable and easy-to-use distributed platform
> > >> for training deep learning models, e.g., Deep Convolutional Neural
> > >> Network and
> > >> Deep Belief Network. It parallelizes the computation (i.e., training)
> > >> onto a
> > >> cluster of nodes by distributing the training data and model
> > >> automatically to
> > >> speed up the training. Built-in training algorithms like
> > Back-Propagation
> > >> and
> > >> Contrastive Divergence are implemented based on common abstractions of
> > >> deep
> > >> learning models. Users can train their own deep learning models by
> > simply
> > >> customizing these abstractions like implementing the Mapper and
> > >> Reducer in Hadoop.
> > >>
> > >> == Background ==
> > >> Deep learning refers to a set of feature (or representation) learning
> > >> models
> > >> that consist of multiple (non-linear) layers, where different layers
> > learn
> > >> different levels of abstractions (representations) of the raw input
> > data.
> > >> Larger (in terms of model parameters) and deeper (in terms of number of
> > >> layers)
> > >> models have shown better performance, e.g., lower image classification
> > >> error in
> > >> Large Scale Visual Recognition Challenge. However, a larger model
> > >> requires more
> > >> memory and larger training data to reduce over-fitting. Complex
> > >> numeric operations
> > >> make the training computation intensive. In practice, training large
> > >> deep learning
> > >> models takes weeks or months on a single node (even with GPU).
> > >>
> > >> == Rational ==
> > >> Deep learning has gained a lot of attraction in both academia and
> > >> industry due to
> > >> its success in a wide range of areas such as computer vision and
> > >> speech recognition.
> > >> However, training of such models is computationally expensive,
> > >> especially for large
> > >> and deep models (e.g., with billions of parameters and more than 10
> > >> layers). Both
> > >> Google and Microsoft have developed distributed deep learning systems
> > >> to make the
> > >> training more efficient by distributing the computations within a
> > >> cluster of nodes.
> > >> However, these systems are closed source softwares. Our goal is to
> > >> leverage the
> > >> community of open source developers to make SINGA efficient, scalable
> > >> and easy to
> > >> use. SINGA is a full fledged distributed platform, that could benefit
> > the
> > >> community and also benefit from the community in their involvement in
> > >> contributing
> > >> to the further work in this area. We believe the nature of SINGA and our
> > >> visions
> > >> for the system fit naturally to Apache's philosophy and development
> > >> framework.
> > >>
> > >> == Initial Goals ==
> > >> We have developed a system for SINGA running on a commodity computer
> > >> cluster. The initial goals include,
> > >> * improving the system in terms of scalability and efficiency, e.g.,
> > >> using Infiniband for network communication and multi-threading for one
> > >> node computation. We would consider extending SINGA to GPU clusters
> > >> later.
> > >> * benchmarking with larger datasets (hundreds of millions of training
> > >> instances) and models (billions of parameters).
> > >> * adding more built-in deep learning models. Users can train the
> > >> built-in models on their datasets directly.
> > >>
> > >>
> > >> == Current Status ==
> > >> === Meritocracy ===
> > >> We would like to follow ASF meritocratic principles to encourage more
> > >> developers
> > >> to contribute in this project. We know that only active and excellent
> > >> developers
> > >> can make SINGA a successful project. The committer list and PMC will be
> > >> updated
> > >> based on developers' performance and commitment. We are also improving
> > the
> > >> documentation and code to help new developers get started quickly.
> > >>
> > >> === Community ===
> > >> SINGA is currently being developed in the Database System Research Lab
> > at
> > >> the
> > >> National University of Singapore (NUS) in collaboration with Zhejiang
> > >> University in China.
> > >> Our lab has extensive experience in building database related systems,
> > >> including
> > >> distributed systems. Six PhD students and research assistants (Jinyang
> > >> Gao,
> > >> Kaiping Zheng, Sheng Wang, Wei Wang, Zhaojing Luo and Zhongle Xie) , a
> > >> research
> > >> fellow (Anh Dinh) and three professors (Beng Chin Ooi, Gang Chen, Kian
> > >> Lee Tan)
> > >> have been working for a year on this project. We are open to recruiting
> > >> more
> > >> developers from diverse backgrounds.
> > >>
> > >> === Core Developers ===
> > >> Beng Chin Ooi, Gang Chen and Kian Lee Tan are professors who have worked
> > >> on
> > >> distributed systems for more than 20 years. They have collaborated with
> > >> the
> > >> industry and have built various large scale systems. Anh Dinh's research
> > >> is also
> > >> on distributed systems, albeit with more focus on security aspects. Wei
> > >> Wang's
> > >> research is on deep learning problems including deep learning
> > >> applications and
> > >> large scale training. Sheng Wang and Jinyang are working on efficient
> > >> indexing,
> > >> querying of large scale data and machine learning. Kaiping, Zhaojing and
> > >> Zhongle
> > >> are new PhD students who jointed SINGA recently. They will work on this
> > >> project
> > >> for a longer time (next 4-5 years). While we share common research
> > >> interests,
> > >> each member also brings diverse expertise to the team.
> > >>
> > >> === Alignment ===
> > >> ASF is already the home of many distributed platforms, e.g., Hadoop,
> > >> Spark and
> > >> Mahout, each of which targets a different application domain. SINGA,
> > >> being a
> > >> distributed platform for large-scale deep learning, focuses on another
> > >> important
> > >> domain for which there still lacks a robust and scalable open-source
> > >> platform.
> > >> The recent success of deep learning models especially for vision and
> > >> speech
> > >> recognition tasks has generated interests in both applying existing
> > >> deep learning
> > >> models and in developing new ones. Thus, an open-source platform for
> > deep
> > >> learning will be able to attract a large community of users and
> > >> developers.
> > >> SINGA is a complex system needing many iterations of design,
> > >> implementation and
> > >> testing. Apache's collaboration framework which encourages active
> > >> contribution
> > >> from developers will inevitably help improve the quality of the system,
> > >> as shown
> > >> in the success of Hadoop, Spark, etc.. Equally important is the
> > community
> > >> of
> > >> users which helps identify real-life applications of deep learning, and
> > >> helps
> > >> to evaluate the system's performance and ease-of-use. We hope to
> > >> leverage ASF for
> > >> coordinating and promoting both communities, and in return benefit the
> > >> communities
> > >> with another useful tool.
> > >>
> > >> == Known Risks ==
> > >> === Orphaned products ===
> > >> Four core developers (Anh, Wei Wang, Jinyang and Sheng Wang) may leave
> > the
> > >> lab in two to four years time. It is possible that some of them may
> > >> not have enough
> > >> time to focus on this project after that. But, SINGA is part of our
> > other
> > >> bigger
> > >> research projects on building an infrastructure for data intensive
> > >> applications,
> > >> which include health-care analytics and brain-inspired computing. Beng
> > >> Chin and
> > >> Kian Lee would continue working on it and getting more people
> > >> involved. For example,
> > >> three new developers (Kaiping, Zhaojing and Zhongle) joined us recently.
> > >> Individual developers are welcome to make SINGA a diverse community
> > >> that is robust and independent from any single developer.
> > >>
> > >> === Inexperience with Open Source ===
> > >> All the developers are active users and followers of open source
> > >> projects. Our
> > >> research lab has a strong commitment to open source, and has released
> > the
> > >> source
> > >> code of several systems under open source license as a way of
> > >> contributing back
> > >> to the open source community. But we do not have much real experience
> > >> in open source
> > >> projects with large and well organized communities like those in Apache.
> > >> This is
> > >> one reason we choose Apache which is experienced in open source
> > >> project incubation.
> > >> We hope to get the help from Apache (e.g., champion and mentors) to
> > >> establish a
> > >> healthy path for SINGA.
> > >>
> > >> === Homogenous Developers ===
> > >> Although the current developers are researchers in the universities,
> > they
> > >> have
> > >> different research interests and project experiences, as mentioned in
> > >> the section
> > >> that introduces the core developers. We know that a diverse community
> > >> is helpful.
> > >> Hence we are open to the idea of recruiting developers from other
> > >> regions and organizations.
> > >>
> > >> === Reliance on Salaried Developers ===
> > >> As a research project in the university, SINGA's current developing
> > >> community
> > >> consists of professors, PhD students, research assistants and
> > >> postdoctoral fellows.
> > >> They are driven by their interests to work on this project and have
> > >> contributed
> > >> actively since the start of the project. The research assistants and
> > >> fellows are
> > >> expected to leave when their contracts expire. However, they are keen
> > >> to continue
> > >> to work on the project voluntarily. Moreover, as a long term research
> > >> project, new
> > >> research assistants and fellows are likely to join the project.
> > >>
> > >> === A Excessive Fascination with the Apache Brand ===
> > >> We choose Apache not for publicity. We have two purposes. First, we want
> > >> to
> > >> leverage Apache's reputation to recruit more developers to make a
> > diverse
> > >> community. Second, we hope that Apache can help us to establish a
> > healthy
> > >> path
> > >> in developing SINGA. Beng Chin and Kian-Lee are established database and
> > >> distributed system researchers, and together with the other
> > contributors,
> > >> they
> > >> sincerely believe that there is a need for a widely accepted open source
> > >> distributed deep learning platform. The field of deep learning is still
> > >> at its
> > >> infancy, and an open source platform will fuel the research in the
> > >> area. Moreover,
> > >> such a platform will enable researchers to develop new models and
> > >> algorithms,
> > >> rather than spending time implementing a deep learning system from
> > >> scratch.
> > >> Furthermore, the need for scalability for such a platform is obvious.
> > >>
> > >> === Relationship with Other Apache Products ===
> > >> Apache Mahout and Apache Spark's ML-LIB are general machine learning
> > >> systems. Deep
> > >> learning algorithm can thus be implemented on these two platforms as
> > >> well. However, the there are differences in training efficiency,
> > >> scalability and
> > >> usability. Mahout and Spark ML-LIB follow models where their
> > >> nodes run synchronously. This is the fundamental difference to Singa who
> > >> follows the parameter server framework (like Google Brain and Microsoft
> > >> Adam). Singa can run synchronously or asynchronously. The asynchronous
> > >> mode
> > >> is superior than the synchronous mode in terms of scalability. In
> > >> addition, Singa has some optimizations towards deep learning models
> > >> (e.g., model
> > >> parallelism, data parallelism and hybrid-parallelism) which make Singa
> > >> more efficient. We also provide ease of use programming model for deep
> > >> learning algorithms.
> > >>
> > >> There are also plans for integration with Apache Hadoop's HDFS as
> > >> storage, to handle large training data.
> > >> Specifically, we store the training data (e.g., images or raw features
> > of
> > >> images) in HDFS, then (pre-)fetch them online.
> > >> We will also explore integration with Hadoop's Yarn and Apache Mesos
> > >> to do resource management.
> > >>
> > >>
> > >> == Documentation ==
> > >> The project is hosted at
> > >> http://www.comp.nus.edu.sg/~dbsystem/project/singa.html.
> > >> Documentations can be found at the Github Wiki Page:
> > >> https://github.com/nusinga/singa/wiki.
> > >> We continue to refine and improve the documentation.
> > >>
> > >> == Initial Source ==
> > >> We use Github to maintain our source code,
> > >> https://github.com/nusinga/singa
> > >>
> > >> == Source and Intellectual Property Submission Plan ==
> > >> We plan to make our code base be under Apache License, Version 2.0.
> > >>
> > >> == External Dependencies ==
> > >> * required by the core code base: glog, gflags, google protobuf,
> > >> open-blas, mpich, armci-mpi.
> > >> * required by data preparation and preprocessing: opencv, hdfs, python.
> > >>
> > >> == Cryptography ==
> > >> Not Applicable
> > >>
> > >> == Required Resources ==
> > >> === Mailing Lists ===
> > >> Currently, we use google group for internal discussion. The mailing
> > >> address is
> > >> nusinga@googlegroup.com. We will migrate the content to the apache
> > >> mailing
> > >> lists in the future.
> > >>
> > >> * singa-dev
> > >> * singa-user
> > >> * singa-commits
> > >> * singa-private (for private discussion within PCM)
> > >>
> > >> === Git Repository ===
> > >> We want to continue using git for version control. Hence, a git repo
> > >> is required.
> > >>
> > >> === Issue Tracking ===
> > >> JIRA Singa (SINGA)
> > >>
> > >> == Initial Committers ==
> > >> * Beng Chin Ooi (ooibc @comp.nus.edu.sg)
> > >> * Kian Lee Tan (tankl @comp.nus.edu.sg)
> > >> * Gang Chen (cg @zju.edu.cn)
> > >> * Wei Wang (wangwei @comp.nus.edu.sg)
> > >> * Dinh Tien Tuan Anh (dinhtta @comp.nus.edu.sg)
> > >> * Jinyang Gao (jinyang.gao @comp.nus.edu.sg)
> > >> * Sheng Wang (wangsh @comp.nus.edu.sg)
> > >> * Kaiping Zheng (kaiping @comp.nus.edu.sg)
> > >> * Zhaojing Luo (zhaojing @comp.nus.edu.sg)
> > >> * Zhongle Xie (zhongle @comp.nus.edu.sg)
> > >>
> > >> == Affiliations ==
> > >> * Beng Chin Ooi, National University of Singapore
> > >> * Kian Lee Tan, National University of Singapore
> > >> * Gang Chen, Zhejiang University
> > >> * Wei Wang, National University of Singapore
> > >> * Dinh Tien Tuan Anh, National University of Singapore
> > >> * Jinyang Gao, National University of Singapore
> > >> * Sheng Wang, National University of Singapore
> > >> * Kaiping Zheng, National University of Singapore
> > >> * Zhaojing Luo, National University of Singapore
> > >> * Zhongle Xie, National University of Singapore
> > >>
> > >> == Sponsors ==
> > >> === Champion ===
> > >> Thejas Nair (thejas at apache.org)
> > >>
> > >> === Nominated Mentors ===
> > >> * Thejas Nair (thejas at apache.org)
> > >> * Alan Gates (gates at apache dot org)
> > >> * Daniel Dai (daijy at apache dot org)
> > >> * Ted Dunning (tdunning at apache dot org)
> > >>
> > >> === Sponsoring Entity ===
> > >> We are requesting the Incubator to sponsor this project.
> > >>

Re: [VOTE] Accept Apache Singa as incubator project

Posted by Ted Dunning <te...@gmail.com>.

+1

I am not nearly as worried about committer diversity, certainly not with
respect to entry into the Incubator. This is a great project that has already
shown some very strong willingness to work with others in the short time I
have interacted with them.


On Tue, Mar 10, 2015 at 11:49 AM, Thejas Nair <th...@gmail.com> wrote:

> Thanks for raising this issue. I agree that committer diversity is
> important for the long-term success of a project. I think that should be a
> criterion for graduation from the Incubator.
> I think it is going to be easier to find new contributors as an Apache
> Incubator project.
>
>
> On Tue, Mar 10, 2015 at 9:09 AM, jan i <ja...@apache.org> wrote:
>
> >
> > +0. I am really concerned about the diversity of the initial committers:
> > what happens if the university pulls the plug? I know we all say it will
> > never happen, but it could happen.
> >
> > rgds
> > jan i.
> >
> >
> > On 10 March 2015 at 16:20, Alan Gates <al...@gmail.com> wrote:
> >
> >> +1
> >>
> >> Alan.
> >
>

Re: [VOTE] Accept Apache Singa as incubator project

Posted by Thejas Nair <th...@gmail.com>.

Thanks for raising this issue. I agree that committer diversity is
important for the long-term success of a project. I think that should be a
criterion for graduation from the Incubator.
I think it is going to be easier to find new contributors as an Apache
Incubator project.


On Tue, Mar 10, 2015 at 9:09 AM, jan i <ja...@apache.org> wrote:

>
> +0. I am really concerned about the diversity of the initial committers:
> what happens if the university pulls the plug? I know we all say it will
> never happen, but it could happen.
>
> rgds
> jan i.
>
>
> On 10 March 2015 at 16:20, Alan Gates <al...@gmail.com> wrote:
>
>> +1
>>
>> Alan.
>

Re: [VOTE] Accept Apache Singa as incubator project

Posted by jan i <ja...@apache.org>.

+0 I am really concerned about the diversity of the initial committers:
what happens if the university pulls the plug? I know we all say it will
never happen, but it could happen.

rgds
jan i.


On 10 March 2015 at 16:20, Alan Gates <al...@gmail.com> wrote:

> +1
>
> Alan.

Re: [VOTE] Accept Apache Singa as incubator project

Posted by Alan Gates <al...@gmail.com>.

+1

Alan.