Posted to general@incubator.apache.org by Thejas Nair <th...@gmail.com> on 2015/01/28 02:29:51 UTC

[DISCUSS] [PROPOSAL] Singa for Apache Incubator

Hello everyone,

I would like to propose the inclusion of Singa as an Apache Incubator project.

Here is the proposal - https://wiki.apache.org/incubator/SingaProposal

Please review the proposal and give feedback. I am planning to start a
vote after 7 days if the proposal looks good.
We are also seeking additional Apache mentors for the project.

Thanks,
Thejas
==========================================================
Singa Incubator Proposal

Abstract

SINGA is a distributed deep learning platform.

Proposal

SINGA is an efficient, scalable and easy-to-use distributed platform
for training deep learning models, e.g., Deep Convolutional Neural
Network and Deep Belief Network. It parallelizes the computation
(i.e., training) onto a cluster of nodes by distributing the training
data and model automatically to speed up the training. Built-in
training algorithms like Back-Propagation and Contrastive Divergence
are implemented based on common abstractions of deep learning models.
Users can train their own deep learning models by simply customizing
these abstractions, much as Hadoop users implement the Mapper and
Reducer.
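
As a rough illustration of what customizing such abstractions could
look like, here is a minimal Python sketch. The names used (Layer,
forward, backward, ScaleLayer, ReLULayer, backprop) are assumptions
made for this example and are not SINGA's actual API. The idea is that
the user only overrides the per-layer methods, while a generic
back-propagation routine drives the training step.

```python
# Hypothetical sketch: class and method names are illustrative
# assumptions, not SINGA's actual API.


class Layer:
    """Base abstraction: computes outputs and back-propagates gradients."""

    def forward(self, inputs):
        raise NotImplementedError

    def backward(self, inputs, grad_outputs):
        raise NotImplementedError


class ScaleLayer(Layer):
    """User-defined layer: multiplies every input by one learnable weight."""

    def __init__(self, weight):
        self.weight = weight
        self.grad_weight = 0.0

    def forward(self, inputs):
        return [self.weight * x for x in inputs]

    def backward(self, inputs, grad_outputs):
        # d(out_i)/d(weight) = x_i and d(out_i)/d(x_i) = weight
        self.grad_weight = sum(x * g for x, g in zip(inputs, grad_outputs))
        return [self.weight * g for g in grad_outputs]


class ReLULayer(Layer):
    """User-defined layer: element-wise max(0, x), no parameters."""

    def forward(self, inputs):
        return [max(0.0, x) for x in inputs]

    def backward(self, inputs, grad_outputs):
        return [g if x > 0 else 0.0 for x, g in zip(inputs, grad_outputs)]


def backprop(layers, inputs, grad_loss):
    """Generic step: forward through all layers, then backward in reverse."""
    activations = [inputs]
    for layer in layers:
        activations.append(layer.forward(activations[-1]))
    grad = grad_loss
    for layer, layer_in in zip(reversed(layers), reversed(activations[:-1])):
        grad = layer.backward(layer_in, grad)
    return activations[-1], grad
```

In this sketch the analogy to Hadoop is that backprop plays the role
of the framework, and the Layer subclasses play the role of the
user-supplied Mapper and Reducer.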

Background

Deep learning refers to a set of feature (or representation) learning
models that consist of multiple (non-linear) layers, where different
layers learn different levels of abstractions (representations) of the
raw input data. Larger (in terms of model parameters) and deeper (in
terms of number of layers) models have shown better performance, e.g.,
lower image classification error in Large Scale Visual Recognition
Challenge. However, a larger model requires more memory and larger
training data to reduce over-fitting. Complex numeric operations make
the training computationally intensive. In practice, training large deep
learning models takes weeks or months on a single node (even with
GPU).

Rationale

Deep learning has attracted a lot of attention in both academia and
industry due to its success in a wide range of areas such as computer
vision and speech recognition. However, training such models is
computationally expensive, especially for large and deep models (e.g.,
with billions of parameters and more than 10 layers). Both Google and
Microsoft have developed distributed deep learning systems to make the
training more efficient by distributing the computation within a
cluster of nodes. However, these systems are closed-source software.
Our goal is to leverage the community of open source developers to
make SINGA efficient, scalable and easy to use. SINGA is a
full-fledged distributed platform that could benefit the community
and, in turn, benefit from the community's contributions to further
work in this area. We believe the nature of SINGA and our vision for
the system fit naturally with Apache's philosophy and development
framework.

Initial Goals

We have developed a system for SINGA running on a commodity computer
cluster. The initial goals include:

* improving the system in terms of scalability and efficiency, e.g.,
  using Infiniband for network communication and multi-threading for
  one node computation. We would consider extending SINGA to GPU
  clusters later.
* benchmarking with larger datasets (hundreds of millions of training
  instances) and models (billions of parameters).
* adding more built-in deep learning models. Users can train the
  built-in models on their datasets directly.

Current Status

Meritocracy

We would like to follow ASF meritocratic principles to encourage more
developers to contribute to this project. We know that only active and
excellent developers can make SINGA a successful project. The
committer list and PMC will be updated based on developers'
performance and commitment. We are also improving the documentation
and code to help new developers get started quickly.

Community

SINGA is currently being developed in the Database System Research Lab
at the National University of Singapore (NUS) in collaboration with
Zhejiang University in China. Our lab has extensive experience in
building database related systems, including distributed systems. Six
PhD students and research assistants (Jinyang Gao, Kaiping Zheng,
Sheng Wang, Wei Wang, Zhaojing Luo and Zhongle Xie), a research
fellow (Anh Dinh) and three professors (Beng Chin Ooi, Gang Chen, Kian
Lee Tan) have been working for a year on this project. We are open to
recruiting more developers from diverse backgrounds.

Core Developers

Beng Chin Ooi, Gang Chen and Kian Lee Tan are professors who have
worked on distributed systems for more than 20 years. They have
collaborated with the industry and have built various large scale
systems. Anh Dinh's research is also on distributed systems, albeit
with more focus on security aspects. Wei Wang's research is on deep
learning problems including deep learning applications and large scale
training. Sheng Wang and Jinyang are working on efficient indexing,
querying of large scale data and machine learning. Kaiping, Zhaojing
and Zhongle are new PhD students who joined SINGA recently. They will
work on this project for a longer time (next 4-5 years). While we
share common research interests, each member also brings diverse
expertise to the team.

Alignment

ASF is already the home of many distributed platforms, e.g., Hadoop,
Spark and Mahout, each of which targets a different application
domain. SINGA, being a distributed platform for large-scale deep
learning, focuses on another important domain that still lacks a
robust and scalable open-source platform. The recent success of deep
learning models, especially for vision and speech recognition tasks,
has generated interest both in applying existing deep learning models
and in developing new ones. Thus, an open-source platform for deep
learning will be able to attract a large community of users and
developers. SINGA is a complex system needing many iterations of
design, implementation and testing. Apache's collaboration framework,
which encourages active contribution from developers, will inevitably
help improve the quality of the system, as shown by the success of
Hadoop, Spark, etc. Equally important is the community of users, which
helps identify real-life applications of deep learning and helps to
evaluate the system's performance and ease of use. We hope to leverage
ASF for coordinating and promoting both communities, and in return
benefit the communities with another useful tool.

Known Risks

Orphaned products

Four core developers (Anh, Wei Wang, Jinyang and Sheng Wang) may leave
the lab in two to four years' time. It is possible that some of them
may not have enough time to focus on this project after that. However,
SINGA is part of our larger research projects on building an
infrastructure for data intensive applications, which include
health-care analytics and brain-inspired computing. Beng Chin and Kian
Lee would continue working on it and getting more people involved. For
example, three new developers (Kaiping, Zhaojing and Zhongle) joined
us recently. Individual developers are welcome to join and help make
SINGA a diverse community that is robust and independent of any single
developer.

Inexperience with Open Source

All the developers are active users and followers of open source
projects. Our research lab has a strong commitment to open source, and
has released the source code of several systems under open source
license as a way of contributing back to the open source community.
But we do not have much real experience in open source projects with
large and well organized communities like those in Apache. This is one
reason we chose Apache, which is experienced in open source project
incubation. We hope to get help from Apache (e.g., champion and
mentors) to establish a healthy path for SINGA.

Homogenous Developers

Although the current developers are all university researchers,
they have different research interests and project experiences, as
mentioned in the section that introduces the core developers. We know
that a diverse community is helpful. Hence we are open to the idea of
recruiting developers from other regions and organizations.

Reliance on Salaried Developers

As a university research project, SINGA's current development
community consists of professors, PhD students, research assistants
and postdoctoral fellows. They are driven by their interests to work
on this project and have contributed actively since the start of the
project. The research assistants and fellows are expected to leave
when their contracts expire. However, they are keen to continue to
work on the project voluntarily. Moreover, as a long term research
project, new research assistants and fellows are likely to join the
project.

An Excessive Fascination with the Apache Brand

We did not choose Apache for publicity. We have two purposes. First, we
want to leverage Apache's reputation to recruit more developers to
make a diverse community. Second, we hope that Apache can help us to
establish a healthy path in developing SINGA. Beng Chin and Kian Lee
are established database and distributed system researchers, and
together with the other contributors, they sincerely believe that
there is a need for a widely accepted open source distributed deep
learning platform. The field of deep learning is still in its infancy,
and an open source platform will fuel the research in the area.
Moreover, such a platform will enable researchers to develop new
models and algorithms, rather than spending time implementing a deep
learning system from scratch. Furthermore, the need for scalability
for such a platform is obvious.

Relationship with Other Apache Products

Apache H2O implemented two simple deep learning models, namely the
Multi-Layer Perceptron and Deep Auto-encoders. There are two
significant differences between H2O and SINGA. First, H2O adopts the
Map-Reduce framework, which runs a set of computing nodes in parallel
against subsets of the training set. Model parameters trained by all
computing nodes are averaged as the final model parameters. This
training algorithm is different from the distributed training
algorithm used by DistBelief, Adam and SINGA, which frequently
synchronizes the parameters trained on different nodes. SINGA adopts
the parameter server framework to support a wide range of distributed
training algorithms and parallelization methods (e.g., data
parallelism, model parallelism and hybrid parallelism), whereas H2O
supports only data parallelism. Second, in H2O, users are restricted
to the two built-in models. In SINGA, we provide a simple programming
model that lets users implement their own deep learning models. A new
deep learning model can be implemented by customizing the base Layer
class for each layer involved in the model. It is similar to writing
Hadoop programs, where users only need to override the base Mapper and
Reducer. We also provide built-in models for users to use directly.
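
To make the contrast between the two training schemes concrete, here
is a toy Python sketch. It is only an illustration under simplifying
assumptions: a single scalar parameter, a squared-error objective, and
workers simulated sequentially in one process. The names
(train_by_averaging, ParameterServer, pull, push) are made up for this
example and do not come from H2O or SINGA.

```python
# Toy sketch with made-up names; not H2O's or SINGA's real code.


def gradient(w, shard):
    """Gradient of 0.5 * (w - x)^2 averaged over one data shard."""
    return sum(w - x for x in shard) / len(shard)


def train_by_averaging(shards, steps, lr):
    """Each worker trains alone on its shard; models are averaged once."""
    finals = []
    for shard in shards:
        w = 0.0
        for _ in range(steps):
            w -= lr * gradient(w, shard)
        finals.append(w)
    return sum(finals) / len(finals)


class ParameterServer:
    """Holds the shared parameter; workers pull it and push gradients."""

    def __init__(self, w, lr):
        self.w = w
        self.lr = lr

    def pull(self):
        return self.w

    def push(self, grad):
        self.w -= self.lr * grad


def train_with_parameter_server(shards, steps, lr):
    """Workers synchronize through the server on every step."""
    server = ParameterServer(0.0, lr)
    for _ in range(steps):
        for shard in shards:  # each shard stands in for one worker
            w = server.pull()
            server.push(gradient(w, shard))
    return server.pull()
```

The point of the sketch is the communication pattern: in the first
function workers never see each other's parameters until the final
average, while in the second every update goes through the shared
server. It is not meant to show any difference in accuracy or speed.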

Documentation

The project is hosted at
http://www.comp.nus.edu.sg/~dbsystem/project/singa.html.
Documentation can be found on the GitHub wiki page:
https://github.com/nusinga/singa/wiki. We continue to refine and
improve the documentation.

Initial Source

We use GitHub to maintain our source code: https://github.com/nusinga/singa

Source and Intellectual Property Submission Plan

We plan to release our code base under the Apache License, Version 2.0.

External Dependencies

Required by the core code base: glog, gflags, google protobuf,
open-blas, mpich, armci-mpi.
Required by data preparation and preprocessing: opencv, hdfs, python.

Cryptography

Not Applicable

Required Resources

Mailing Lists

Currently, we use a Google group for internal discussion. The mailing
address is nusinga@googlegroup.com. We will migrate the content to the
Apache mailing lists in the future.

singa-dev
singa-user
singa-commits
singa-private (for private discussion within PMC)

Git Repository

We want to continue using git for version control. Hence, a git repo
is required.

Issue Tracking

JIRA Singa (SINGA)

Initial Committers

Beng Chin Ooi (ooibc @comp.nus.edu.sg)
Kian Lee Tan (tankl @comp.nus.edu.sg)
Gang Chen (cg @zju.edu.cn)
Wei Wang (wangwei @comp.nus.edu.sg)
Dinh Tien Tuan Anh (dinhtta @comp.nus.edu.sg)
Jinyang Gao (jinyang.gao @comp.nus.edu.sg)
Sheng Wang (wangsh @comp.nus.edu.sg)
Kaiping Zheng (kaiping @comp.nus.edu.sg)
Zhaojing Luo (zhaojing @comp.nus.edu.sg)
Zhongle Xie (zhongle @comp.nus.edu.sg)

Affiliations

Beng Chin Ooi, National University of Singapore
Kian Lee Tan, National University of Singapore
Gang Chen, Zhejiang University
Wei Wang, National University of Singapore
Dinh Tien Tuan Anh, National University of Singapore
Jinyang Gao, National University of Singapore
Sheng Wang, National University of Singapore
Kaiping Zheng, National University of Singapore
Zhaojing Luo, National University of Singapore
Zhongle Xie, National University of Singapore

Sponsors

Champion

Thejas Nair (thejas at apache.org) - Hortonworks

Nominated Mentors

Thejas Nair (thejas at apache.org) - Hortonworks
Alan Gates (gates at apache dot org) - Hortonworks
(Seeking more volunteers!)

Sponsoring Entity

We are requesting the Incubator to sponsor this project.

---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
For additional commands, e-mail: general-help@incubator.apache.org


Re: [DISCUSS] [PROPOSAL] Singa for Apache Incubator

Posted by Henry Saputra <he...@gmail.com>.
Several comments:
-) How many users are already using this project? I would recommend
dropping the request for the singa-user list at the beginning.
-) All the initial committers come from the university, and it seems
some of them are already ready to leave the university. I am not too
sure this project will survive if all of the initial committers are
from the university as students.
-) Need to solicit more mentors if this project ever gets to the Apache Incubator.

- Henry

On Tue, Feb 3, 2015 at 3:58 PM, Thejas Nair <th...@gmail.com> wrote:
> The "Relationship with Other Apache Products" section has been
> updated. The reference to H2O in that section has been removed, and
> other projects have been added.
>  Thanks for the feedback!
>
>
> On Wed, Jan 28, 2015 at 10:27 AM, Thejas Nair <th...@gmail.com> wrote:
>> Thanks for pointing that out Henry! Yes, looks like H2O is not an
>> apache project, I should have verified that.
>> I will edit that, and revisit that section along with the folks in
>> Singa community.
>>
>>
>> On Tue, Jan 27, 2015 at 6:55 PM, Henry Saputra <he...@gmail.com> wrote:
>>> Quick immediate comment that "Apache H2O" is not really an Apache project.
>>>
>>> I assume you are referring to https://github.com/h2oai/h2o (or
>>> https://github.com/h2oai/h2o-dev) ?
>>>
>>> - Henry


Re: [DISCUSS] [PROPOSAL] Singa for Apache Incubator

Posted by Thejas Nair <th...@gmail.com>.
The "Relationship with Other Apache Products" section has been
updated. The reference to H2O in that section has been removed, and
other projects have been added.
 Thanks for the feedback!


On Wed, Jan 28, 2015 at 10:27 AM, Thejas Nair <th...@gmail.com> wrote:
> Thanks for pointing that out Henry! Yes, looks like H2O is not an
> apache project, I should have verified that.
> I will edit that, and revisit that section along with the folks in
> Singa community.
>
>
> On Tue, Jan 27, 2015 at 6:55 PM, Henry Saputra <he...@gmail.com> wrote:
>> Quick immediate comment that "Apache H2O" is not really an Apache project.
>>
>> I assume you are referring to https://github.com/h2oai/h2o (or
>> https://github.com/h2oai/h2o-dev) ?
>>
>> - Henry


Re: [DISCUSS] [PROPOSAL] Singa for Apache Incubator

Posted by Thejas Nair <th...@gmail.com>.
Thanks for pointing that out, Henry! Yes, it looks like H2O is not an
Apache project; I should have verified that.
I will edit that and revisit that section along with the folks in
the Singa community.


On Tue, Jan 27, 2015 at 6:55 PM, Henry Saputra <he...@gmail.com> wrote:
> Quick immediate comment: "Apache H2O" is not really an Apache project.
>
> I assume you are referring to https://github.com/h2oai/h2o (or
> https://github.com/h2oai/h2o-dev) ?
>
> - Henry


Re: [DISCUSS] [PROPOSAL] Singa for Apache Incubator

Posted by Henry Saputra <he...@gmail.com>.
Quick immediate comment: "Apache H2O" is not really an Apache project.

I assume you are referring to https://github.com/h2oai/h2o (or
https://github.com/h2oai/h2o-dev) ?

- Henry

On Tue, Jan 27, 2015 at 5:29 PM, Thejas Nair <th...@gmail.com> wrote:
> Hello everyone,
>
> I would like to propose the inclusion of Singa as an Apache Incubator project.
>
> Here is the proposal - https://wiki.apache.org/incubator/SingaProposal
>
> Please review the proposal and give feedback. I am planning to start a
> vote after 7 days if the proposal looks good.
> We are also seeking additional Apache mentors for the project.
>
> Thanks,
> Thejas
> ==========================================================
> Singa Incubator Proposal
>
> Abstract
>
> SINGA is a distributed deep learning platform.
>
> Proposal
>
> SINGA is an efficient, scalable and easy-to-use distributed platform
> for training deep learning models, e.g., Deep Convolutional Neural
> Networks and Deep Belief Networks. It parallelizes the computation
> (i.e., training) across a cluster of nodes by automatically
> distributing the training data and model to speed up training.
> Built-in training algorithms like Back-Propagation and Contrastive
> Divergence are implemented based on common abstractions of deep
> learning models. Users can train their own deep learning models by
> simply customizing these abstractions, much like implementing the
> Mapper and Reducer in Hadoop.
>
> Background
>
> Deep learning refers to a set of feature (or representation) learning
> models that consist of multiple (non-linear) layers, where different
> layers learn different levels of abstractions (representations) of the
> raw input data. Larger (in terms of model parameters) and deeper (in
> terms of number of layers) models have shown better performance, e.g.,
> lower image classification error in the Large Scale Visual Recognition
> Challenge. However, a larger model requires more memory and more
> training data to reduce over-fitting. Complex numeric operations make
> the training computation-intensive. In practice, training large deep
> learning models takes weeks or months on a single node (even with a
> GPU).
>
> Rationale
>
> Deep learning has gained a lot of traction in both academia and
> industry due to its success in a wide range of areas such as computer
> vision and speech recognition. However, training of such models is
> computationally expensive, especially for large and deep models (e.g.,
> with billions of parameters and more than 10 layers). Both Google and
> Microsoft have developed distributed deep learning systems to make the
> training more efficient by distributing the computations within a
> cluster of nodes. However, these systems are closed-source software.
> Our goal is to leverage the community of open source developers to
> make SINGA efficient, scalable and easy to use. SINGA is a
> full-fledged distributed platform that could benefit the community
> and, in turn, benefit from the community's contributions to further
> work in this area. We believe the nature of SINGA and
> our vision for the system fit naturally with Apache's philosophy and
> development framework.
>
> Initial Goals
>
> We have developed a system for SINGA running on a commodity computer
> cluster. The initial goals include:
>
> * improving the system in terms of scalability and efficiency, e.g.,
>   using Infiniband for network communication and multi-threading for
>   one node computation. We would consider extending SINGA to GPU
>   clusters later.
> * benchmarking with larger datasets (hundreds of millions of training
>   instances) and models (billions of parameters).
> * adding more built-in deep learning models. Users can train the
>   built-in models on their datasets directly.
>
> Current Status
>
> Meritocracy
>
> We would like to follow ASF meritocratic principles to encourage more
> developers to contribute to this project. We know that only active and
> excellent developers can make SINGA a successful project. The
> committer list and PMC will be updated based on developers'
> performance and commitment. We are also improving the documentation
> and code to help new developers get started quickly.
>
> Community
>
> SINGA is currently being developed in the Database System Research Lab
> at the National University of Singapore (NUS) in collaboration with
> Zhejiang University in China. Our lab has extensive experience in
> building database related systems, including distributed systems. Six
> PhD students and research assistants (Jinyang Gao, Kaiping Zheng,
> Sheng Wang, Wei Wang, Zhaojing Luo and Zhongle Xie), a research
> fellow (Anh Dinh) and three professors (Beng Chin Ooi, Gang Chen, Kian
> Lee Tan) have been working for a year on this project. We are open to
> recruiting more developers from diverse backgrounds.
>
> Core Developers
>
> Beng Chin Ooi, Gang Chen and Kian Lee Tan are professors who have
> worked on distributed systems for more than 20 years. They have
> collaborated with the industry and have built various large scale
> systems. Anh Dinh's research is also on distributed systems, albeit
> with more focus on security aspects. Wei Wang's research is on deep
> learning problems including deep learning applications and large scale
> training. Sheng Wang and Jinyang are working on efficient indexing,
> querying of large scale data and machine learning. Kaiping, Zhaojing
> and Zhongle are new PhD students who joined SINGA recently. They will
> work on this project for a longer time (next 4-5 years). While we
> share common research interests, each member also brings diverse
> expertise to the team.
>
> Alignment
>
> ASF is already the home of many distributed platforms, e.g., Hadoop,
> Spark and Mahout, each of which targets a different application
> domain. SINGA, being a distributed platform for large-scale deep
> learning, focuses on another important domain that still lacks a
> robust and scalable open-source platform. The recent success of deep
> learning models, especially for vision and speech recognition tasks,
> has generated interest in both applying existing deep learning
> models and in developing new ones. Thus, an open-source platform for
> deep learning will be able to attract a large community of users and
> developers. SINGA is a complex system needing many iterations of
> design, implementation and testing. Apache's collaboration framework,
> which encourages active contribution from developers, will inevitably
> help improve the quality of the system, as shown by the success of
> Hadoop, Spark, etc. Equally important is the community of users, which
> helps identify real-life applications of deep learning, and helps to
> evaluate the system's performance and ease-of-use. We hope to leverage
> ASF for coordinating and promoting both communities, and in return
> benefit the communities with another useful tool.
>
> Known Risks
>
> Orphaned products
>
> Four core developers (Anh, Wei Wang, Jinyang and Sheng Wang) may leave
> the lab in two to four years' time. It is possible that some of them
> may not have enough time to focus on this project after that. However,
> SINGA is part of our larger research projects on building an
> infrastructure for data intensive applications, which include
> health-care analytics and brain-inspired computing. Beng Chin and Kian
> Lee would continue working on it and getting more people involved. For
> example, three new developers (Kaiping, Zhaojing and Zhongle) joined
> us recently. Individual developers are welcome to make SINGA a diverse
> community that is robust and independent from any single developer.
>
> Inexperience with Open Source
>
> All the developers are active users and followers of open source
> projects. Our research lab has a strong commitment to open source, and
> has released the source code of several systems under open source
> license as a way of contributing back to the open source community.
> But we do not have much real experience in open source projects with
> large and well organized communities like those in Apache. This is one
> reason we chose Apache, which is experienced in open source project
> incubation. We hope to get help from Apache (e.g., champion and
> mentors) to establish a healthy path for SINGA.
>
> Homogenous Developers
>
> Although the current developers are researchers in the universities,
> they have different research interests and project experiences, as
> mentioned in the section that introduces the core developers. We know
> that a diverse community is helpful. Hence we are open to the idea of
> recruiting developers from other regions and organizations.
>
> Reliance on Salaried Developers
>
> As a research project in the university, SINGA's current developing
> community consists of professors, PhD students, research assistants
> and postdoctoral fellows. They are driven by their interests to work
> on this project and have contributed actively since the start of the
> project. The research assistants and fellows are expected to leave
> when their contracts expire. However, they are keen to continue to
> work on the project voluntarily. Moreover, as a long term research
> project, new research assistants and fellows are likely to join the
> project.
>
> An Excessive Fascination with the Apache Brand
>
> We did not choose Apache for publicity. We have two purposes. First, we
> want to leverage Apache's reputation to recruit more developers to
> make a diverse community. Second, we hope that Apache can help us to
> establish a healthy path in developing SINGA. Beng Chin and Kian-Lee
> are established database and distributed system researchers, and
> together with the other contributors, they sincerely believe that
> there is a need for a widely accepted open source distributed deep
> learning platform. The field of deep learning is still in its infancy,
> and an open source platform will fuel the research in the area.
> Moreover, such a platform will enable researchers to develop new
> models and algorithms, rather than spending time implementing a deep
> learning system from scratch. Furthermore, the need for scalability
> for such a platform is obvious.
>
> Relationship with Other Apache Products
>
> Apache H2O implemented two simple deep learning models, namely the
> Multi-Layer Perceptron and Deep Auto-encoders. There are two
> significant differences between H2O and SINGA. First, H2O adopts the
> Map-Reduce framework, which runs a set of computing nodes in parallel
> against subsets of the training set. Model parameters trained by all
> computing nodes are averaged as the final model parameters. This
> training algorithm is different from the distributed training
> algorithm used by DistBelief, Adam and SINGA, which frequently
> synchronizes the parameters trained from different nodes. SINGA adopts
> the parameter server framework to support a wide range of distributed
> training algorithms and parallelization methods (e.g., data
> parallelism, model parallelism and hybrid parallelism), whereas H2O
> only supports data parallelism. Second, in H2O, users are restricted
> to the two built-in models. In SINGA, we provide a simple programming
> model to let users implement their own deep learning models. A new
> deep learning model can be implemented by customizing the base Layer
> class for each layer involved in the model. It is similar to writing
> Hadoop programs where users only need to override the base Mapper and
> Reducer. We also provide built-in models for users to use directly.
>
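The two training styles contrasted in the paragraph above can be sketched in a few lines of Python. This is a toy, single-process simulation under stated assumptions: it fits one weight w for y = w * x, the ParameterServer class and its pull/push methods are hypothetical names, and workers take turns instead of running in parallel. It does not reproduce how H2O, DistBelief, Adam or SINGA are actually implemented.

```python
def gradient(w, shard):
    """Mean gradient of the squared error (w * x - y) ** 2 over one data shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def train_with_final_averaging(shards, steps, lr):
    """Each worker trains alone on its shard; parameters are averaged once at the end."""
    finals = []
    for shard in shards:
        w = 0.0
        for _ in range(steps):
            w -= lr * gradient(w, shard)
        finals.append(w)
    return sum(finals) / len(finals)


class ParameterServer:
    """Hypothetical server holding the single shared parameter."""

    def __init__(self, w=0.0):
        self.w = w

    def pull(self):
        return self.w

    def push(self, grad, lr):
        self.w -= lr * grad


def train_with_parameter_server(shards, steps, lr):
    """Workers synchronize through the server on every step."""
    server = ParameterServer()
    for _ in range(steps):
        for shard in shards:  # sequential stand-in for parallel workers
            w = server.pull()
            server.push(gradient(w, shard), lr)
    return server.w


if __name__ == "__main__":
    # Two shards of noise-free data generated from y = 3 * x.
    shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
    print(train_with_final_averaging(shards, 200, 0.02))
    print(train_with_parameter_server(shards, 200, 0.02))
```

On this convex toy problem both approaches reach the same weight. The difference the proposal points at concerns how often workers exchange parameters, which matters more for large non-convex models.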
> Documentation
>
> The project is hosted at
> http://www.comp.nus.edu.sg/~dbsystem/project/singa.html.
> Documentation can be found at the Github Wiki Page:
> https://github.com/nusinga/singa/wiki. We continue to refine and
> improve the documentation.
>
> Initial Source
>
> We use Github to maintain our source code, https://github.com/nusinga/singa
>
> Source and Intellectual Property Submission Plan
>
> We plan to release our code base under the Apache License, Version 2.0.
>
> External Dependencies
>
> required by the core code base: glog, gflags, google protobuf,
> open-blas, mpich, armci-mpi.
> required by data preparation and preprocessing: opencv, hdfs, python.
>
> Cryptography
>
> Not Applicable
>
> Required Resources
>
> Mailing Lists
>
> Currently, we use a Google group for internal discussion. The mailing
> address is nusinga@googlegroup.com. We will migrate the content to the
> apache mailing lists in the future.
>
> singa-dev
> singa-user
> singa-commits
> singa-private (for private discussion within the PMC)
>
> Git Repository
>
> We want to continue using git for version control. Hence, a git repo
> is required.
>
> Issue Tracking
>
> JIRA Singa (SINGA)
>
> Initial Committers
>
> Beng Chin Ooi (ooibc @comp.nus.edu.sg)
> Kian Lee Tan (tankl @comp.nus.edu.sg)
> Gang Chen (cg @zju.edu.cn)
> Wei Wang (wangwei @comp.nus.edu.sg)
> Dinh Tien Tuan Anh (dinhtta @comp.nus.edu.sg)
> Jinyang Gao (jinyang.gao @comp.nus.edu.sg)
> Sheng Wang (wangsh @comp.nus.edu.sg)
> Kaiping Zheng (kaiping @comp.nus.edu.sg)
> Zhaojing Luo (zhaojing @comp.nus.edu.sg)
> Zhongle Xie (zhongle @comp.nus.edu.sg)
>
> Affiliations
>
> Beng Chin Ooi, National University of Singapore
> Kian Lee Tan, National University of Singapore
> Gang Chen, Zhejiang University
> Wei Wang, National University of Singapore
> Dinh Tien Tuan Anh, National University of Singapore
> Jinyang Gao, National University of Singapore
> Sheng Wang, National University of Singapore
> Kaiping Zheng, National University of Singapore
> Zhaojing Luo, National University of Singapore
> Zhongle Xie, National University of Singapore
>
> Sponsors
>
> Champion
>
> Thejas Nair (thejas at apache.org) - Hortonworks
>
> Nominated Mentors
>
> Thejas Nair (thejas at apache.org) - Hortonworks
> Alan Gates (gates at apache dot org) - Hortonworks
> (Seeking more volunteers!)
>
> Sponsoring Entity
>
> We are requesting the Incubator to sponsor this project.
>
