Posted to general@incubator.apache.org by oo...@comp.nus.edu.sg on 2015/02/06 04:55:24 UTC

[Fwd: Re: [DISCUSS] [PROPOSAL] Singa for Apache Incubator]

Thanks for the comments and suggestions.
With permission from Thejas, I would like to respond to point 2.

We have a huge team down at NUS (National University of Singapore) --
we have about seven database/data mining professors (not including
those in systems, networking, and machine learning).
I myself have nine PhD students in a steady state, and I have a few large
grants, with a total budget of about 15 million S$ (~12 million USD), that
allows me to hire a number of research fellows and research assistants for
the next few years.  In a constant state, I have about 20 people (PhD
students/RA/RF) working with me alone.  Other professors have their own
grants (unlike other countries, it is relatively easy to get large grants
in Singapore; many overseas Universities, including UIUC, MIT, ETH etc
have research labs funded by Singapore Research Foundation [equivalent of
NSF]).

SINGA is a long term project for us -- while it is a platform as it is, we
are using it for healthcare predictive analytics (by working with a
hospital associated with the University).  Therefore, we will be working
on SINGA, not solely as a distributed DL platform, but as a tool that will
enable us to do data analytics on some business domains (e.g., healthcare,
consumer etc.)

For the initial set of committers, three are tenured professors, five are
students, with 2-5 years to go before they complete their PhD.  Quite
often, some would stay back as a research fellow for a couple of years
before they start looking for a job outside.  We will work with mentors
and new developers (from outside of NUS or Zhejiang University) in
enhancing the system.

The project should survive in that sense.

(I have an on-going project CIIDAA that has been around since 2008; it was
started as another project, epiC, with a different grant, and then we
continued the development with a new grant for CIIDAA --
http://www.comp.nus.edu.sg/~ciidaa/
)

Thanks.

regards
beng chin
ps: i am not sure if my email will get through to the group.


---------------------------- Original Message ----------------------------
Subject: Re: [DISCUSS] [PROPOSAL] Singa for Apache Incubator
From:    "Henry Saputra" <he...@gmail.com>
Date:    Thu, February 5, 2015 2:57 pm
To:      "general@incubator.apache.org" <ge...@incubator.apache.org>
Cc:      ooibc@comp.nus.edu.sg
--------------------------------------------------------------------------

Several comments:
-) How many users are already using this project? I would recommend
dropping the request for the singa-user list at the beginning.
-) All the initial committers come from a university, and it seems
some of them are already about to leave. I am not too sure
this project will survive if all of the initial committers come from
a university as students.
-) Need to solicit more mentors if this project ever gets to the Apache incubator.

- Henry

On Tue, Feb 3, 2015 at 3:58 PM, Thejas Nair <th...@gmail.com> wrote:
> The "Relationship with Other Apache Products" section has been
> updated. The reference to H2O in that section has been removed, and
> other projects have been added.
>  Thanks for the feedback!
>
>
> On Wed, Jan 28, 2015 at 10:27 AM, Thejas Nair <th...@gmail.com> wrote:
>> Thanks for pointing that out Henry! Yes, looks like H2O is not an
>> Apache project, I should have verified that.
>> I will edit that, and revisit that section along with the folks in
>> Singa community.
>>
>>
>> On Tue, Jan 27, 2015 at 6:55 PM, Henry Saputra <he...@gmail.com> wrote:
>>> Quick immediate comment that "Apache H2O" is not really an Apache project.
>>>
>>> I assume you are referring to https://github.com/h2oai/h2o (or
>>> https://github.com/h2oai/h2o-dev) ?
>>>
>>> - Henry
>>>
>>> On Tue, Jan 27, 2015 at 5:29 PM, Thejas Nair <th...@gmail.com> wrote:
>>>> Hello everyone,
>>>>
>>>> I would like to propose the inclusion of Singa as an Apache Incubator
>>>> project.
>>>>
>>>> Here is the proposal - https://wiki.apache.org/incubator/SingaProposal
>>>>
>>>> Please review the proposal and give feedback. I am planning to start a
>>>> vote after 7 days if the proposal looks good.
>>>> We are also seeking additional Apache mentors for the project.
>>>>
>>>> Thanks,
>>>> Thejas
>>>> ==========================================================
>>>> Singa Incubator Proposal
>>>>
>>>> Abstract
>>>>
>>>> SINGA is a distributed deep learning platform.
>>>>
>>>> Proposal
>>>>
>>>> SINGA is an efficient, scalable and easy-to-use distributed platform
>>>> for training deep learning models, e.g., Deep Convolutional Neural
>>>> Network and Deep Belief Network. It parallelizes the computation
>>>> (i.e., training) onto a cluster of nodes by distributing the training
>>>> data and model automatically to speed up the training. Built-in
>>>> training algorithms like Back-Propagation and Contrastive Divergence
>>>> are implemented based on common abstractions of deep learning models.
>>>> Users can train their own deep learning models by simply customizing
>>>> these abstractions like implementing the Mapper and Reducer in Hadoop.
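
The "customize an abstraction, reuse the built-in training algorithm" idea in the paragraph above can be illustrated with a minimal sketch. This is not SINGA code: the `Layer`, `Scale`, `ReLU` and `train_step` names and their method signatures are hypothetical, chosen only for illustration, on the assumption that the abstraction is a per-layer class (the proposal's later section does mention a base Layer class). The user writes the layers; the back-propagation loop only talks to the base interface.

```python
class Layer:
    """Hypothetical base abstraction: one stage of a feed-forward model."""

    def forward(self, x):
        raise NotImplementedError

    def backward(self, grad_out):
        raise NotImplementedError

    def update(self, lr):
        pass  # layers without parameters have nothing to update


class Scale(Layer):
    """User-defined layer: multiplies every input by one scalar weight."""

    def __init__(self, w):
        self.w = w

    def forward(self, x):
        self.x = x  # remember the input for the backward pass
        return [self.w * v for v in x]

    def backward(self, grad_out):
        # Gradient w.r.t. the weight, then gradient w.r.t. the input.
        self.grad_w = sum(g * v for g, v in zip(grad_out, self.x))
        return [self.w * g for g in grad_out]

    def update(self, lr):
        self.w -= lr * self.grad_w


class ReLU(Layer):
    """User-defined layer: element-wise max(0, x)."""

    def forward(self, x):
        self.mask = [v > 0 for v in x]
        return [v if m else 0.0 for v, m in zip(x, self.mask)]

    def backward(self, grad_out):
        return [g if m else 0.0 for g, m in zip(grad_out, self.mask)]


def train_step(layers, x, target, lr):
    """Generic back-propagation step that relies only on the Layer interface."""
    out = x
    for layer in layers:
        out = layer.forward(out)
    loss = sum((o - t) ** 2 for o, t in zip(out, target)) / 2
    grad = [o - t for o, t in zip(out, target)]
    for layer in reversed(layers):
        grad = layer.backward(grad)
    for layer in layers:
        layer.update(lr)
    return loss


# Usage: fit y = 2 * x with a two-layer model.
model = [Scale(0.5), ReLU()]
for _ in range(20):
    train_step(model, [1.0, 2.0], [2.0, 4.0], lr=0.1)
```

The training loop never needs to change when a new layer type is added, which is the sense in which this resembles overriding a Mapper and Reducer.
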
>>>>
>>>> Background
>>>>
>>>> Deep learning refers to a set of feature (or representation) learning
>>>> models that consist of multiple (non-linear) layers, where different
>>>> layers learn different levels of abstractions (representations) of the
>>>> raw input data. Larger (in terms of model parameters) and deeper (in
>>>> terms of number of layers) models have shown better performance, e.g.,
>>>> lower image classification error in Large Scale Visual Recognition
>>>> Challenge. However, a larger model requires more memory and larger
>>>> training data to reduce over-fitting. Complex numeric operations make
>>>> the training computation intensive. In practice, training large deep
>>>> learning models takes weeks or months on a single node (even with
>>>> GPU).
>>>>
>>>> Rationale
>>>>
>>>> Deep learning has gained a lot of traction in both academia and
>>>> industry due to its success in a wide range of areas such as computer
>>>> vision and speech recognition. However, training of such models is
>>>> computationally expensive, especially for large and deep models (e.g.,
>>>> with billions of parameters and more than 10 layers). Both Google and
>>>> Microsoft have developed distributed deep learning systems to make the
>>>> training more efficient by distributing the computations within a
>>>> cluster of nodes. However, these systems are closed-source software.
>>>> Our goal is to leverage the community of open source developers to
>>>> make SINGA efficient, scalable and easy to use. SINGA is a
>>>> full-fledged distributed platform that could benefit the community and
>>>> also benefit from the community's involvement in contributing
>>>> to the further work in this area. We believe the nature of SINGA and
>>>> our visions for the system fit naturally to Apache's philosophy and
>>>> development framework.
>>>>
>>>> Initial Goals
>>>>
>>>> We have developed a system for SINGA running on a commodity computer
>>>> cluster. The initial goals include:
>>>>
>>>>  * improving the system in terms of scalability and efficiency, e.g.,
>>>>    using Infiniband for network communication and multi-threading for
>>>>    one node computation. We would consider extending SINGA to GPU
>>>>    clusters later.
>>>>  * benchmarking with larger datasets (hundreds of millions of training
>>>>    instances) and models (billions of parameters).
>>>>  * adding more built-in deep learning models. Users can train the
>>>>    built-in models on their datasets directly.
>>>>
>>>> Current Status
>>>>
>>>> Meritocracy
>>>>
>>>> We would like to follow ASF meritocratic principles to encourage more
>>>> developers to contribute in this project. We know that only active and
>>>> excellent developers can make SINGA a successful project. The
>>>> committer list and PMC will be updated based on developers'
>>>> performance and commitment. We are also improving the documentation
>>>> and code to help new developers get started quickly.
>>>>
>>>> Community
>>>>
>>>> SINGA is currently being developed in the Database System Research Lab
>>>> at the National University of Singapore (NUS) in collaboration with
>>>> Zhejiang University in China. Our lab has extensive experience in
>>>> building database related systems, including distributed systems. Six
>>>> PhD students and research assistants (Jinyang Gao, Kaiping Zheng,
>>>> Sheng Wang, Wei Wang, Zhaojing Luo and Zhongle Xie) , a research
>>>> fellow (Anh Dinh) and three professors (Beng Chin Ooi, Gang Chen, Kian
>>>> Lee Tan) have been working for a year on this project. We are open to
>>>> recruiting more developers from diverse backgrounds.
>>>>
>>>> Core Developers
>>>>
>>>> Beng Chin Ooi, Gang Chen and Kian Lee Tan are professors who have
>>>> worked on distributed systems for more than 20 years. They have
>>>> collaborated with the industry and have built various large scale
>>>> systems. Anh Dinh's research is also on distributed systems, albeit
>>>> with more focus on security aspects. Wei Wang's research is on deep
>>>> learning problems including deep learning applications and large scale
>>>> training. Sheng Wang and Jinyang are working on efficient indexing,
>>>> querying of large scale data and machine learning. Kaiping, Zhaojing
>>>> and Zhongle are new PhD students who joined SINGA recently. They will
>>>> work on this project for a longer time (next 4-5 years). While we
>>>> share common research interests, each member also brings diverse
>>>> expertise to the team.
>>>>
>>>> Alignment
>>>>
>>>> ASF is already the home of many distributed platforms, e.g., Hadoop,
>>>> Spark and Mahout, each of which targets a different application
>>>> domain. SINGA, being a distributed platform for large-scale deep
>>>> learning, focuses on another important domain that still
>>>> lacks a robust and scalable open-source platform. The recent success
>>>> of deep learning models especially for vision and speech recognition
>>>> tasks has generated interest in both applying existing deep learning
>>>> models and in developing new ones. Thus, an open-source platform for
>>>> deep learning will be able to attract a large community of users and
>>>> developers. SINGA is a complex system needing many iterations of
>>>> design, implementation and testing. Apache's collaboration framework
>>>> which encourages active contribution from developers will inevitably
>>>> help improve the quality of the system, as shown in the success of
>>>> Hadoop, Spark, etc. Equally important is the community of users which
>>>> helps identify real-life applications of deep learning, and helps to
>>>> evaluate the system's performance and ease-of-use. We hope to leverage
>>>> ASF for coordinating and promoting both communities, and in return
>>>> benefit the communities with another useful tool.
>>>>
>>>> Known Risks
>>>>
>>>> Orphaned products
>>>>
>>>> Four core developers (Anh, Wei Wang, Jinyang and Sheng Wang) may leave
>>>> the lab in two to four years time. It is possible that some of them
>>>> may not have enough time to focus on this project after that. But,
>>>> SINGA is part of our other bigger research projects on building an
>>>> infrastructure for data intensive applications, which include
>>>> health-care analytics and brain-inspired computing. Beng Chin and Kian
>>>> Lee would continue working on it and getting more people involved. For
>>>> example, three new developers (Kaiping, Zhaojing and Zhongle) joined
>>>> us recently. Individual developers are welcome to make SINGA a diverse
>>>> community that is robust and independent from any single developer.
>>>>
>>>> Inexperience with Open Source
>>>>
>>>> All the developers are active users and followers of open source
>>>> projects. Our research lab has a strong commitment to open source, and
>>>> has released the source code of several systems under open source
>>>> license as a way of contributing back to the open source community.
>>>> But we do not have much real experience in open source projects with
>>>> large and well organized communities like those in Apache. This is one
>>>> reason we choose Apache which is experienced in open source project
>>>> incubation. We hope to get the help from Apache (e.g., champion and
>>>> mentors) to establish a healthy path for SINGA.
>>>>
>>>> Homogenous Developers
>>>>
>>>> Although the current developers are researchers in the universities,
>>>> they have different research interests and project experiences, as
>>>> mentioned in the section that introduces the core developers. We know
>>>> that a diverse community is helpful. Hence we are open to the idea of
>>>> recruiting developers from other regions and organizations.
>>>>
>>>> Reliance on Salaried Developers
>>>>
>>>> As a research project in the university, SINGA's current developing
>>>> community consists of professors, PhD students, research assistants
>>>> and postdoctoral fellows. They are driven by their interests to work
>>>> on this project and have contributed actively since the start of the
>>>> project. The research assistants and fellows are expected to leave
>>>> when their contracts expire. However, they are keen to continue to
>>>> work on the project voluntarily. Moreover, as a long term research
>>>> project, new research assistants and fellows are likely to join the
>>>> project.
>>>>
>>>> An Excessive Fascination with the Apache Brand
>>>>
>>>> We choose Apache not for publicity. We have two purposes. First, we
>>>> want to leverage Apache's reputation to recruit more developers to
>>>> make a diverse community. Second, we hope that Apache can help us to
>>>> establish a healthy path in developing SINGA. Beng Chin and Kian-Lee
>>>> are established database and distributed system researchers, and
>>>> together with the other contributors, they sincerely believe that
>>>> there is a need for a widely accepted open source distributed deep
>>>> learning platform. The field of deep learning is still in its infancy,
>>>> and an open source platform will fuel the research in the area.
>>>> Moreover, such a platform will enable researchers to develop new
>>>> models and algorithms, rather than spending time implementing a deep
>>>> learning system from scratch. Furthermore, the need for scalability
>>>> for such a platform is obvious.
>>>>
>>>> Relationship with Other Apache Products
>>>>
>>>> Apache H2O implemented two simple deep learning models, namely the
>>>> Multi-Layer Perceptron and Deep Auto-encoders. There are two
>>>> significant differences between H2O and SINGA. First, H2O adopts the
>>>> Map-Reduce framework which runs a set of computing nodes in parallel
>>>> against the training set. Model parameters trained by all
>>>> computing nodes are averaged as the final model parameters. This
>>>> training algorithm is different from the distributed training
>>>> algorithm used by DistBelief, Adam and SINGA, which frequently
>>>> synchronizes the parameters trained from different nodes. SINGA adopts
>>>> the parameter server framework to support a wide range of distributed
>>>> training algorithms and parallelization methods (e.g., data
>>>> parallelism, model parallelism and hybrid parallelism), whereas H2O only
>>>> supports data parallelism. Second, in H2O, users are restricted to
>>>> using the two built-in models. In SINGA, we provide a simple programming
>>>> model to let users implement their own deep learning models. A new
>>>> deep learning model can be implemented by customizing the base Layer
>>>> class for each layer involved in the model. It is similar to writing
>>>> Hadoop programs where users only need to override the base Mapper and
>>>> Reducer. We also provide built-in models for users to use directly.
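
The contrast drawn above between averaging independently trained parameters once at the end and synchronizing parameters frequently through a parameter server can be sketched on a toy problem. This is an illustration only, not code from SINGA, H2O, DistBelief or Adam: the `ParameterServer` class, its `pull` and `push` methods, and both training functions are hypothetical, the model is a single weight `w` fitted to `y = w * x`, and the server variant is synchronous for simplicity.

```python
def grad(w, shard):
    """Gradient of the mean of 0.5 * (w*x - y)^2 over one data shard."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)


def train_then_average(shards, steps, lr, w0=0.0):
    """Each worker trains alone on its shard; parameters are averaged once."""
    finals = []
    for shard in shards:
        w = w0
        for _ in range(steps):
            w -= lr * grad(w, shard)
        finals.append(w)
    return sum(finals) / len(finals)


class ParameterServer:
    """Hypothetical server holding the single shared parameter."""

    def __init__(self, w0=0.0):
        self.w = w0

    def pull(self):
        return self.w

    def push(self, g, lr):
        self.w -= lr * g


def train_with_server(shards, steps, lr, w0=0.0):
    """Workers pull the shared parameter and push gradients at every step."""
    server = ParameterServer(w0)
    for _ in range(steps):
        w = server.pull()  # all workers read the same snapshot
        grads = [grad(w, shard) for shard in shards]
        for g in grads:
            # Scale so the combined update is the mean gradient over workers.
            server.push(g / len(shards), lr)
    return server.w


# Usage: two workers, each holding a shard of data generated by y = 3 * x.
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
w_avg = train_then_average(shards, steps=200, lr=0.05)
w_ps = train_with_server(shards, steps=200, lr=0.05)
```

On this convex toy problem both schemes reach the same weight; the sketch only shows where the synchronization happens (once after training versus at every step), not which approach performs better on deep models.
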
>>>>
>>>> Documentation
>>>>
>>>> The project is hosted at
>>>> http://www.comp.nus.edu.sg/~dbsystem/project/singa.html.
>>>> Documentation can be found at the Github Wiki Page:
>>>> https://github.com/nusinga/singa/wiki. We continue to refine and
>>>> improve the documentation.
>>>>
>>>> Initial Source
>>>>
>>>> We use Github to maintain our source code, https://github.com/nusinga/singa
>>>>
>>>> Source and Intellectual Property Submission Plan
>>>>
>>>> We plan to release our code base under the Apache License, Version 2.0.
>>>>
>>>> External Dependencies
>>>>
>>>> required by the core code base: glog, gflags, google protobuf,
>>>> open-blas, mpich, armci-mpi.
>>>> required by data preparation and preprocessing: opencv, hdfs, python.
>>>>
>>>> Cryptography
>>>>
>>>> Not Applicable
>>>>
>>>> Required Resources
>>>>
>>>> Mailing Lists
>>>>
>>>> Currently, we use google group for internal discussion. The mailing
>>>> address is nusinga@googlegroup.com. We will migrate the content to the
>>>> apache mailing lists in the future.
>>>>
>>>> singa-dev
>>>> singa-user
>>>> singa-commits
>>>> singa-private (for private discussion within PMC)
>>>>
>>>> Git Repository
>>>>
>>>> We want to continue using git for version control. Hence, a git repo
>>>> is required.
>>>>
>>>> Issue Tracking
>>>>
>>>> JIRA Singa (SINGA)
>>>>
>>>> Initial Committers
>>>>
>>>> Beng Chin Ooi (ooibc @comp.nus.edu.sg)
>>>> Kian Lee Tan (tankl @comp.nus.edu.sg)
>>>> Gang Chen (cg @zju.edu.cn)
>>>> Wei Wang (wangwei @comp.nus.edu.sg)
>>>> Dinh Tien Tuan Anh (dinhtta @comp.nus.edu.sg)
>>>> Jinyang Gao (jinyang.gao @comp.nus.edu.sg)
>>>> Sheng Wang (wangsh @comp.nus.edu.sg)
>>>> Kaiping Zheng (kaiping @comp.nus.edu.sg)
>>>> Zhaojing Luo (zhaojing @comp.nus.edu.sg)
>>>> Zhongle Xie (zhongle @comp.nus.edu.sg)
>>>>
>>>> Affiliations
>>>>
>>>> Beng Chin Ooi, National University of Singapore
>>>> Kian Lee Tan, National University of Singapore
>>>> Gang Chen, Zhejiang University
>>>> Wei Wang, National University of Singapore
>>>> Dinh Tien Tuan Anh, National University of Singapore
>>>> Jinyang Gao, National University of Singapore
>>>> Sheng Wang, National University of Singapore
>>>> Kaiping Zheng, National University of Singapore
>>>> Zhaojing Luo, National University of Singapore
>>>> Zhongle Xie, National University of Singapore
>>>>
>>>> Sponsors
>>>>
>>>> Champion
>>>>
>>>> Thejas Nair (thejas at apache.org) - Hortonworks
>>>>
>>>> Nominated Mentors
>>>>
>>>> Thejas Nair (thejas at apache.org) - Hortonworks
>>>> Alan Gates (gates at apache dot org) - Hortonworks
>>>> (Seeking more volunteers!)
>>>>
>>>> Sponsoring Entity
>>>>
>>>> We are requesting the Incubator to sponsor this project.
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>>>> For additional commands, e-mail: general-help@incubator.apache.org
>>>>





Re: [DISCUSS] [PROPOSAL] Singa for Apache Incubator

Posted by Henry Saputra <he...@gmail.com>.
Thanks for the response, Thejas. Glad you have tried soliciting more
mentors.
Hopefully some of them bite.

On Friday, February 27, 2015, Thejas Nair <th...@gmail.com> wrote:

> Thanks for your inputs Henry.
> I did send personal emails to two folks (outside of Hortonworks) who
> seemed to be interested in the project, but that didn't help.  I have
> also been soliciting more mentors in this thread as well. I will try
> reaching out to folks who are in the intersection of incubator and
> mahout (or spark-ml) to see if they might be interested (hopefully
> people working on related projects are more likely to join in).
> Any other suggestions for soliciting a more diverse set of mentors are
> also welcome.
>
> Regarding the diversity of initial set of committers, growing that
> should be easier once the project is an apache incubator project.  I
> see a strong desire to grow the community in the people who are
> currently working on the project.
>
>
>
> On Thu, Feb 26, 2015 at 11:42 PM, Henry Saputra <henry.saputra@gmail.com> wrote:
> > I was not actually talking about a requirement, but for the sake of
> > the podling itself.
> >
> > If all initial mentors come from the same company, the risk of all of
> > them being absent is greater because all will be subject to the same
> > schedule and priorities from their daytime employers. Especially for
> > release VOTEs. Three initial mentors won't be enough for this project,
> > I think.
> >
> > Not too worried about initial committers coming from the same org, but I
> > have seen that a podling that does not have an initial community will
> > struggle to thrive.
> >
> > Just 2-cents from my experience in incubator.
> >
> > - Henry
> >
> > On Thu, Feb 26, 2015 at 11:37 PM, jan i <jani@apache.org> wrote:
> >> On Friday, February 27, 2015, Henry Saputra <henry.saputra@gmail.com> wrote:
> >>
> >>> I strongly suggest you solicit more (diverse) mentors before starting
> >>> the VOTE.
> >>>
> >>> All initial committers are from the same org and all initial mentors are
> >>> from the same company (HW).
> >>
> >> We do have a requirement for diversity, for me all initial committers from
> >> the same company is just as big a problem as mentors. When everyone
> >> involved is from the same company then that signals a serious problem
> >> which should be addressed before starting a vote.
> >>
> >> rgds
> >> jan i
> >>
> >>>
> >>> I am not sure this is a good start for an Apache podling.
> >>>
> >>>
> >>> - Henry
> >>>
> >>> On Thu, Feb 26, 2015 at 9:12 AM, Thejas Nair <thejas.nair@gmail.com> wrote:
> >>> > The incubator proposal has been updated with the feedback so far.
> >>> > We have 3 mentors now, but I think it would be good to have additional
> >>> > mentors. Please let me know if anyone is able to help mentor this
> >>> > project.
> >>> >
> >>> > I am planning to start a vote on the proposal in a day or two.
> >>> >
> >>> >
> >>> > On Fri, Feb 6, 2015 at 5:21 PM,  <ooibc@comp.nus.edu.sg> wrote:
> >>> >>
> >>> >> Regarding the number of users using this project -- at this moment, the
> >>> >> community is not big.  A few local start-ups have been trying to use it
> >>> >> (mainly due to announcement in our seminar list), e.g., one is using it
> >>> >> for image recognition (given a photo snapped by a user, it wants to
> >>> >> return the same product, and a list of similar products, such as a
> >>> >> luxury bag on a passerby).  Researchers from outside of NUS may have
> >>> >> been using it since we published an application paper on cross
> >>> >> domain/modal retrieval in VLDB 2014.
> >>> >>
> >>> >> We have not announced the project to the outside community yet -- we
> >>> >> would announce it in dbworld etc in due course.
> >>> >>
> >>> >> Thanks and have a good weekend.
> >>> >>
> >>> >> regards
> >>> >> beng chin
> >>> >>
> >>> >>>
> >>> >>> Thanks for the comments and suggestions.
> >>> >>> With permission from Thejas, I would like to respond to point 2.
> >>> >>>
> >>> >>> We have a huge team down at NUS (National University of Singapore)
> --
> >>> >>> we have about seven database/data mining data professors (not
> including
> >>> >>> those in systems, networking, and machine learning).
> >>> >>> I myself have nine PhD students in a steady state, and I have a few
> >>> large
> >>> >>> grants, with a total budget of about 15 million S$ (~12 million
> USD),
> >>> that
> >>> >>> allows me to hire a number of research fellows and research
> assistants
> >>> for
> >>> >>> the next few years.  In a constant state, I have about 20 people
> (PhD
> >>> >>> students/RA/RF) working with me alone.  Other professors have
> their own
> >>> >>> grants (unlike other countries, it is relatively easy to get large
> >>> grants
> >>> >>> in Singapore; many overseas Universities, including UIUC, MIT, ETH
> etc
> >>> >>> have research labs funded by Singapore Research Foundation
> [equivalent
> >>> of
> >>> >>> NSF]).
> >>> >>>
> >>> >>> SINGA is a long term project for us -- while it is a platform as it
> >>> is, we
> >>> >>> are using it for healthcare predictive analytics (by working with a
> >>> >>> hospital associated with the University).  Therefore, we will be
> >>> working
> >>> >>> on SINGA, not solely as a distributed DL platform, but as a tool
> that
> >>> will
> >>> >>> enable us to do data analytics on some business domains (eg.
> >>> healthcase,
> >>> >>> consumer etc)
> >>> >>>
> >>> >>> For the initial set of committers, three are tenured professors,
> five
> >>> are
> >>> >>> students, with 2-5 years to go before they complete their PhD.
> Quite
> >>> >>> often, some would stay back as a research fellow for a couple of
> years
> >>> >>> before they start looking for a job outside.  We will work with
> mentors
> >>> >>> and new developers (from outside of NUS or Zhejiang University) in
> >>> >>> enhancing the system.
> >>> >>>
> >>> >>> The project should survive in that sense.
> >>> >>>
> >>> >>> (I have an on-going project CIIDAA that has been around since
> 2008; it
> >>> was
> >>> >>> started as another project, epiC,  with a different grant, and
> then we
> >>> >>> continue the development with a new grant for CIIDAA --
> >>> >>> http://www.comp.nus.edu.sg/~ciidaa/
> >>> >>> )
> >>> >>>
> >>> >>> Thanks.
> >>> >>>
> >>> >>> regards
> >>> >>> beng chin
> >>> >>> ps: i am not sure if my email will get through to the group.
> >>> >>>
> >>> >>>
> >>> >>> ---------------------------- Original Message
> >>> ----------------------------
> >>> >>> Subject: Re: [DISCUSS] [PROPOSAL] Singa for Apache Incubator
> >>> >>> From:    "Henry Saputra" <henry.saputra@gmail.com <javascript:;>
> <javascript:;>>
> >>> >>> Date:    Thu, February 5, 2015 2:57 pm
> >>> >>> To:      "general@incubator.apache.org <javascript:;>
> <javascript:;>" <
> >>> general@incubator.apache.org <javascript:;> <javascript:;>>
> >>> >>> Cc:      ooibc@comp.nus.edu.sg <javascript:;> <javascript:;>
> >>> >>>
> >>>
> --------------------------------------------------------------------------
> >>> >>>
> >>> >>> Several comments:
> >>> >>> -) How many users already using this project? I would reccomend to
> >>> >>> drop request for singa-user list at the beginning.
> >>> >>> -) All the initial committers come from university and seemed like
> >>> >>> some of them already ready to leave university. I am not too sure
> if
> >>> >>> this project go survive if all of the inital committers are from
> >>> >>> university as students.
> >>> >>> -) Need to solicit more mentors if this project ever get to Apache
> >>> >>> incubator.
> >>> >>>
> >>> >>> - Henry
> >>> >>>
> >>> >>> On Tue, Feb 3, 2015 at 3:58 PM, Thejas Nair <thejas.nair@gmail.com
> <javascript:;>
> >>> <javascript:;>> wrote:
> >>> >>>> The "Relationship with Other Apache Products" section has been
> >>> >>>> updated. The reference to H2O in that section has been removed,
> and
> >>> >>>> other projects have been added.
> >>> >>>>  Thanks for the feedback!
> >>> >>>>
> >>> >>>>
> >>> >>>> On Wed, Jan 28, 2015 at 10:27 AM, Thejas Nair <
> thejas.nair@gmail.com <javascript:;>
> >>> <javascript:;>>
> >>> >>> wrote:
> >>> >>>>> Thanks for pointing that out Henry! Yes, looks like H20 is not an
> >>> >>>>> apache project, I should have verified that.
> >>> >>>>> I will edit that, and revisit that section along with the folks
> in
> >>> >>>>> Singa community.
> >>> >>>>>
> >>> >>>>>
> >>> >>>>> On Tue, Jan 27, 2015 at 6:55 PM, Henry Saputra
> >>> >>> <henry.saputra@gmail.com <javascript:;> <javascript:;>> wrote:
> >>> >>>>>> Quick immediate comment that "Apache H2O" is not really Apache
> >>> >>>>>> project.
> >>> >>>>>>
> >>> >>>>>> I assume you are referring to https://github.com/h2oai/h2o (or
> >>> >>>>>> https://github.com/h2oai/h2o-dev) ?
> >>> >>>>>>
> >>> >>>>>> - Henry
> >>> >>>>>>
> >>> >>>>>> On Tue, Jan 27, 2015 at 5:29 PM, Thejas Nair <
> thejas.nair@gmail.com <javascript:;>
> >>> <javascript:;>>
> >>> >>> wrote:
> >>> >>>>>>> Hello everyone,
> >>> >>>>>>>
> >>> >>>>>>> I would like to propose the inclusion of Singa as an Apache
> >>> Incubator
> >>> >>> project.
> >>> >>>>>>>
> >>> >>>>>>> Here is the proposal -
> >>> >>>>>>> https://wiki.apache.org/incubator/SingaProposal
> >>> >>>>>>>
> >>> >>>>>>> Please review the proposal and give feedback. I am planning to
> >>> start
> >>> >>>>>>> a
> >>> >>>>>>> vote after 7 days if the proposal looks good.
> >>> >>>>>>> We are also seeking additional Apache mentors for the project.
> >>> >>>>>>>
> >>> >>>>>>> Thanks,
> >>> >>>>>>> Thejas
> >>> >>>>>>> ==========================================================
> >>> >>>>>>> Singa Incubator Proposal
> >>> >>>>>>>
> >>> >>>>>>> Abstract
> >>> >>>>>>>
> >>> >>>>>>> SINGA is a distributed deep learning platform.
> >>> >>>>>>>
> >>> >>>>>>> Proposal
> >>> >>>>>>>
> >>> >>>>>>> SINGA is an efficient, scalable and easy-to-use distributed
> >>> platform
> >>> >>>>>>> for training deep learning models, e.g., Deep Convolutional
> Neural
> >>> >>>>>>> Network and Deep Belief Network. It parallelizes the
> computation
> >>> >>>>>>> (i.e., training) onto a cluster of nodes by distributing the
> >>> training
> >>> >>>>>>> data and model automatically to speed up the training. Built-in
> >>> >>>>>>> training algorithms like Back-Propagation and Contrastive
> >>> Divergence
> >>> >>>>>>> are implemented based on common abstractions of deep learning
> >>> models.
> >>> >>>>>>> Users can train their own deep learning models by simply
> >>> customizing
> >>> >>>>>>> these abstractions like implementing the Mapper and Reducer in
> >>> >>>>>>> Hadoop.
> >>> >>>>>>>
> >>> >>>>>>> Background
> >>> >>>>>>>
> >>> >>>>>>> Deep learning refers to a set of feature (or representation)
> >>> learning
> >>> >>>>>>> models that consist of multiple (non-linear) layers, where
> >>> different
> >>> >>>>>>> layers learn different levels of abstractions
> (representations) of
> >>> >>>>>>> the
> >>> >>>>>>> raw input data. Larger (in terms of model parameters) and
> deeper
> >>> (in
> >>> >>>>>>> terms of number of layers) models have shown better
> performance,
> >>> >>>>>>> e.g.,
> >>> >>>>>>> lower image classification error in Large Scale Visual
> Recognition
> >>> >>>>>>> Challenge. However, a larger model requires more memory and
> larger
> >>> >>>>>>> training data to reduce over-fitting. Complex numeric
> operations
> >>> make
> >>> >>>>>>> the training computation intensive. In practice, training large
> >>> deep
> >>> >>>>>>> learning models takes weeks or months on a single node (even
> with
> >>> >>>>>>> GPU).
> >>> >>>>>>>
> >>> >>>>>>> Rational
> >>> >>>>>>>
> >>> >>>>>>> Deep learning has gained a lot of attraction in both academia
> and
> >>> >>>>>>> industry due to its success in a wide range of areas such as
> >>> computer
> >>> >>>>>>> vision and speech recognition. However, training of such
> models is
> >>> >>>>>>> computationally expensive, especially for large and deep models
> >>> >>>>>>> (e.g.,
> >>> >>>>>>> with billions of parameters and more than 10 layers). Both
> Google
> >>> and
> >>> >>>>>>> Microsoft have developed distributed deep learning systems to
> make
> >>> >>>>>>> the
> >>> >>>>>>> training more efficient by distributing the computations
> within a
> >>> >>>>>>> cluster of nodes. However, these systems are closed-source software.
> >>> >>>>>>> Our goal is to leverage the community of open source
> developers to
> >>> >>>>>>> make SINGA efficient, scalable and easy to use. SINGA is a full
> >>> >>>>>>> fledged distributed platform, that could benefit the community
> and
> >>> >>>>>>> also benefit from the community in their involvement in
> >>> contributing
> >>> >>>>>>> to the further work in this area. We believe the nature of
> SINGA
> >>> and
> >>> >>>>>>> our visions for the system fit naturally to Apache's
> philosophy and
> >>> >>>>>>> development framework.
> >>> >>>>>>>
> >>> >>>>>>> Initial Goals
> >>> >>>>>>>
> >>> >>>>>>> We have developed a system for SINGA running on a commodity computer
> >>> >>>>>>> cluster. The initial goals include:
> >>> >>>>>>> * improving the system in terms of scalability and efficiency, e.g.,
> >>> >>>>>>>   using Infiniband for network communication and multi-threading for
> >>> >>>>>>>   one node computation. We would consider extending SINGA to GPU
> >>> >>>>>>>   clusters later.
> >>> >>>>>>> * benchmarking with larger datasets (hundreds of millions of training
> >>> >>>>>>>   instances) and models (billions of parameters).
> >>> >>>>>>> * adding more built-in deep learning models. Users can train the
> >>> >>>>>>>   built-in models on their datasets directly.
> >>> >>>>>>>
> >>> >>>>>>> Current Status
> >>> >>>>>>>
> >>> >>>>>>> Meritocracy
> >>> >>>>>>>
> >>> >>>>>>> We would like to follow ASF meritocratic principles to
> encourage
> >>> more
> >>> >>>>>>> developers to contribute in this project. We know that only
> active
> >>> >>>>>>> and
> >>> >>>>>>> excellent developers can make SINGA a successful project. The
> >>> >>>>>>> committer list and PMC will be updated based on developers'
> >>> >>>>>>> performance and commitment. We are also improving the
> documentation
> >>> >>>>>>> and code to help new developers get started quickly.
> >>> >>>>>>>
> >>> >>>>>>> Community
> >>> >>>>>>>
> >>> >>>>>>> SINGA is currently being developed in the Database System
> Research
> >>> >>>>>>> Lab
> >>> >>>>>>> at the National University of Singapore (NUS) in collaboration
> with
> >>> >>>>>>> Zhejiang University in China. Our lab has extensive experience
> in
> >>> >>>>>>> building database related systems, including distributed
> systems.
> >>> Six
> >>> >>>>>>> PhD students and research assistants (Jinyang Gao, Kaiping
> Zheng,
> >>> >>>>>>> Sheng Wang, Wei Wang, Zhaojing Luo and Zhongle Xie) , a
> research
> >>> >>>>>>> fellow (Anh Dinh) and three professors (Beng Chin Ooi, Gang
> Chen,
> >>> >>>>>>> Kian
> >>> >>>>>>> Lee Tan) have been working for a year on this project. We are
> open
> >>> to
> >>> >>>>>>> recruiting more developers from diverse backgrounds.
> >>> >>>>>>>
> >>> >>>>>>> Core Developers
> >>> >>>>>>>
> >>> >>>>>>> Beng Chin Ooi, Gang Chen and Kian Lee Tan are professors who
> have
> >>> >>>>>>> worked on distributed systems for more than 20 years. They have
> >>> >>>>>>> collaborated with the industry and have built various large
> scale
> >>> >>>>>>> systems. Anh Dinh's research is also on distributed systems,
> albeit
> >>> >>>>>>> with more focus on security aspects. Wei Wang's research is on
> deep
> >>> >>>>>>> learning problems including deep learning applications and
> large
> >>> >>>>>>> scale
> >>> >>>>>>> training. Sheng Wang and Jinyang are working on efficient
> indexing,
> >>> >>>>>>> querying of large scale data and machine learning. Kaiping, Zhaojing
> >>> >>>>>>> and Zhongle are new PhD students who joined SINGA recently. They will
> >>> >>>>>>> work on this project for a longer time (next 4-5 years). While
> we
> >>> >>>>>>> share common research interests, each member also brings
> diverse
> >>> >>>>>>> expertise to the team.
> >>> >>>>>>>
> >>> >>>>>>> Alignment
> >>> >>>>>>>
> >>> >>>>>>> ASF is already the home of many distributed platforms, e.g.,
> >>> Hadoop,
> >>> >>>>>>> Spark and Mahout, each of which targets a different application
> >>> >>>>>>> domain. SINGA, being a distributed platform for large-scale
> deep
> >>> >>>>>>> learning, focuses on another important domain that still
> >>> >>>>>>> lacks a robust and scalable open-source platform. The recent
> >>> success
> >>> >>>>>>> of deep learning models especially for vision and speech
> >>> recognition
> >>> >>>>>>> tasks has generated interests in both applying existing deep
> >>> learning
> >>> >>>>>>> models and in developing new ones. Thus, an open-source
> platform
> >>> for
> >>> >>>>>>> deep learning will be able to attract a large community of
> users
> >>> and
> >>> >>>>>>> developers. SINGA is a complex system needing many iterations
> of
> >>> >>>>>>> design, implementation and testing. Apache's collaboration
> >>> framework
> >>> >>>>>>> which encourages active contribution from developers will
> >>> inevitably
> >>> >>>>>>> help improve the quality of the system, as shown in the
> success of
> >>> >>>>>>> Hadoop, Spark, etc. Equally important is the community of
> users
> >>> >>>>>>> which
> >>> >>>>>>> helps identify real-life applications of deep learning, and
> helps
> >>> to
> >>> >>>>>>> evaluate the system's performance and ease-of-use. We hope to
> >>> >>>>>>> leverage
> >>> >>>>>>> ASF for coordinating and promoting both communities, and in
> return
> >>> >>>>>>> benefit the communities with another useful tool.
> >>> >>>>>>>
> >>> >>>>>>> Known Risks
> >>> >>>>>>>
> >>> >>>>>>> Orphaned products
> >>> >>>>>>>
> >>> >>>>>>> Four core developers (Anh, Wei Wang, Jinyang and Sheng Wang)
> may
> >>> >>>>>>> leave
> >>> >>>>>>> the lab in two to four years time. It is possible that some of
> them
> >>> >>>>>>> may not have enough time to focus on this project after that.
> But,
> >>> >>>>>>> SINGA is part of our other bigger research projects on
> building an
> >>> >>>>>>> infrastructure for data intensive applications, which include
> >>> >>>>>>> health-care analytics and brain-inspired computing. Beng Chin
> and
> >>> >>>>>>> Kian
> >>> >>>>>>> Lee would continue working on it and getting more people
> involved.
> >>> >>>>>>> For
> >>> >>>>>>> example, three new developers (Kaiping, Zhaojing and Zhongle)
> >>> joined
> >>> >>>>>>> us recently. Individual developers are welcome to make SINGA a
> >>> >>>>>>> diverse
> >>> >>>>>>> community that is robust and independent from any single
> developer.
> >>> >>>>>>>
> >>> >>>>>>> Inexperience with Open Source
> >>> >>>>>>>
> >>> >>>>>>> All the developers are active users and followers of open
> source
> >>> >>>>>>> projects. Our research lab has a strong commitment to open
> source,
> >>> >>>>>>> and
> >>> >>>>>>> has released the source code of several systems under open
> source
> >>> >>>>>>> license as a way of contributing back to the open source
> community.
> >>> >>>>>>> But we do not have much real experience in open source projects
> >>> with
> >>> >>>>>>> large and well organized communities like those in Apache.
> This is
> >>> >>>>>>> one
> >>> >>>>>>> reason we choose Apache which is experienced in open source
> project
> >>> >>>>>>> incubation. We hope to get the help from Apache (e.g.,
> champion and
> >>> >>>>>>> mentors) to establish a healthy path for SINGA.
> >>> >>>>>>>
> >>> >>>>>>> Homogenous Developers
> >>> >>>>>>>
> >>> >>>>>>> Although the current developers are researchers in the
> >>> universities,
> >>> >>>>>>> they have different research interests and project
> experiences, as
> >>> >>>>>>> mentioned in the section that introduces the core developers.
> We
> >>> know
> >>> >>>>>>> that a diverse community is helpful. Hence we are open to the
> idea
> >>> of
> >>> >>>>>>> recruiting developers from other regions and organizations.
> >>> >>>>>>>
> >>> >>>>>>> Reliance on Salaried Developers
> >>> >>>>>>>
> >>> >>>>>>> As a research project in the university, SINGA's current
> developing
> >>> >>>>>>> community consists of professors, PhD students, research
> assistants
> >>> >>>>>>> and postdoctoral fellows. They are driven by their interests to
> >>> work
> >>> >>>>>>> on this project and have contributed actively since the start
> of
> >>> the
> >>> >>>>>>> project. The research assistants and fellows are expected to
> leave
> >>> >>>>>>> when their contracts expire. However, they are keen to
> continue to
> >>> >>>>>>> work on the project voluntarily. Moreover, as a long term
> research
> >>> >>>>>>> project, new research assistants and fellows are likely to
> join the
> >>> >>>>>>> project.
> >>> >>>>>>>
> >>> >>>>>>> An Excessive Fascination with the Apache Brand
> >>> >>>>>>>
> >>> >>>>>>> We choose Apache not for publicity. We have two purposes.
> First, we
> >>> >>>>>>> want to leverage Apache's reputation to recruit more
> developers to
> >>> >>>>>>> make a diverse community. Second, we hope that Apache can help
> us
> >>> to
> >>> >>>>>>> establish a healthy path in developing SINGA. Beng Chin and
> >>> Kian-Lee
> >>> >>>>>>> are established database and distributed system researchers,
> and
> >>> >>>>>>> together with the other contributors, they sincerely believe
> that
> >>> >>>>>>> there is a need for a widely accepted open source distributed
> deep
> >>> >>>>>>> learning platform. The field of deep learning is still in its infancy,
> >>> >>>>>>> and an open source platform will fuel the research in the area.
> >>> >>>>>>> Moreover, such a platform will enable researchers to develop
> new
> >>> >>>>>>> models and algorithms, rather than spending time implementing a
> >>> deep
> >>> >>>>>>> learning system from scratch. Furthermore, the need for
> scalability
> >>> >>>>>>> for such a platform is obvious.
> >>> >>>>>>>
> >>> >>>>>>> Relationship with Other Apache Products
> >>> >>>>>>>
> >>> >>>>>>> Apache H2O implemented two simple deep learning models, namely the
> >>> >>>>>>> Multi-Layer Perceptron and Deep Auto-encoders. There are two
> >>> >>>>>>> significant differences between H2O and SINGA. First, H2O adopts the
> >>> >>>>>>> Map-Reduce framework, which runs a set of computing nodes in parallel
> >>> >>>>>>> against the training set. Model parameters trained by all
> >>> >>>>>>> computing nodes are averaged as the final model parameters. This
> >>> >>>>>>> training algorithm is different from the distributed training
> >>> >>>>>>> algorithm used by DistBelief, Adam and SINGA, which frequently
> >>> >>>>>>> synchronizes the parameters trained on different nodes. SINGA adopts
> >>> >>>>>>> the parameter server framework to support a wide range of distributed
> >>> >>>>>>> training algorithms and parallelization methods (e.g., data
> >>> >>>>>>> parallelism, model parallelism and hybrid parallelism; H2O only
> >>> >>>>>>> supports data parallelism). Second, in H2O, users are restricted to
> >>> >>>>>>> the two built-in models. In SINGA, we provide a simple programming
> >>> >>>>>>> model to let users implement their own deep learning models. A new
> >>> >>>>>>> deep learning model can be implemented by customizing the base Layer
> >>> >>>>>>> class for each layer involved in the model. It is similar to writing
> >>> >>>>>>> Hadoop programs where users only need to override the base Mapper and
> >>> >>>>>>> Reducer. We also provide built-in models for users to use directly.
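
[Editorial sketch] The contrast drawn above between averaging parameters once at the end and synchronizing them frequently through a parameter server can be illustrated with a toy example. This is only a sketch under stated assumptions: the one-weight model y = w * x, the two data shards, the learning rate and every function name are hypothetical, and none of it is taken from SINGA, H2O, DistBelief or Adam.

```python
# Toy comparison: one-shot parameter averaging vs. frequent synchronization
# through a shared parameter store. All names and values are hypothetical.

def gradient(w, shard):
    """Mean squared-error gradient of the model y = w * x over one shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def train_with_final_averaging(shards, steps=50, lr=0.05):
    """Each worker trains alone; parameters are averaged once at the end."""
    results = []
    for shard in shards:
        w = 0.0
        for _ in range(steps):
            w -= lr * gradient(w, shard)
        results.append(w)
    return sum(results) / len(results)


def train_with_parameter_server(shards, steps=50, lr=0.05):
    """Every step, workers read the shared parameter, compute a gradient on
    their own shard, and the averaged gradient updates the shared value
    (a synchronous variant)."""
    server_w = 0.0
    for _ in range(steps):
        grads = [gradient(server_w, shard) for shard in shards]  # pull + compute
        server_w -= lr * sum(grads) / len(grads)                 # push + update
    return server_w


# Two shards drawn from y = 3x, so both schemes should approach w = 3.
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
w_avg = train_with_final_averaging(shards)
w_ps = train_with_parameter_server(shards)
print(round(w_avg, 3), round(w_ps, 3))
```

On this convex toy problem both schemes reach the same answer; the difference described in the proposal matters for non-convex deep models, where workers that never exchange parameters can drift apart before the final average.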
> >>> >>>>>>>
> >>> >>>>>>> Documentation
> >>> >>>>>>>
> >>> >>>>>>> The project is hosted at
> >>> >>>>>>> http://www.comp.nus.edu.sg/~dbsystem/project/singa.html.
> >>> >>>>>>> Documentation can be found on the GitHub wiki page:
> >>> >>>>>>> https://github.com/nusinga/singa/wiki. We continue to refine
> and
> >>> >>>>>>> improve the documentation.
> >>> >>>>>>>
> >>> >>>>>>> Initial Source
> >>> >>>>>>>
> >>> >>>>>>> We use Github to maintain our source code,
> >>> >>> https://github.com/nusinga/singa
> >>> >>>>>>>
> >>> >>>>>>> Source and Intellectual Property Submission Plan
> >>> >>>>>>>
> >>> >>>>>>> We plan to release our code base under the Apache License, Version 2.0.
> >>> >>>>>>>
> >>> >>>>>>> External Dependencies
> >>> >>>>>>>
> >>> >>>>>>> required by the core code base: glog, gflags, google protobuf,
> >>> >>>>>>> open-blas, mpich, armci-mpi.
> >>> >>>>>>> required by data preparation and preprocessing: opencv, hdfs,
> >>> python.
> >>> >>>>>>>
> >>> >>>>>>> Cryptography
> >>> >>>>>>>
> >>> >>>>>>> Not Applicable
> >>> >>>>>>>
> >>> >>>>>>> Required Resources
> >>> >>>>>>>
> >>> >>>>>>> Mailing Lists
> >>> >>>>>>>
> >>> >>>>>>> Currently, we use google group for internal discussion. The mailing
> >>> >>>>>>> address is nusinga@googlegroup.com. We will migrate the content to
> >>> >>>>>>> the apache mailing lists in the future.
> >>> >>>>>>>
> >>> >>>>>>> singa-dev
> >>> >>>>>>> singa-user
> >>> >>>>>>> singa-commits
> >>> >>>>>>> singa-private (for private discussion within PMC)
> >>> >>>>>>>
> >>> >>>>>>> Git Repository
> >>> >>>>>>>
> >>> >>>>>>> We want to continue using git for version control. Hence, a git
> >>> repo
> >>> >>>>>>> is required.
> >>> >>>>>>>
> >>> >>>>>>> Issue Tracking
> >>> >>>>>>>
> >>> >>>>>>> JIRA Singa (SINGA)
> >>> >>>>>>>
> >>> >>>>>>> Initial Committers
> >>> >>>>>>>
> >>> >>>>>>> Beng Chin Ooi (ooibc @comp.nus.edu.sg)
> >>> >>>>>>> Kian Lee Tan (tankl @comp.nus.edu.sg)
> >>> >>>>>>> Gang Chen (cg @zju.edu.cn)
> >>> >>>>>>> Wei Wang (wangwei @comp.nus.edu.sg)
> >>> >>>>>>> Dinh Tien Tuan Anh (dinhtta @comp.nus.edu.sg)
> >>> >>>>>>> Jinyang Gao (jinyang.gao @comp.nus.edu.sg)
> >>> >>>>>>> Sheng Wang (wangsh @comp.nus.edu.sg)
> >>> >>>>>>> Kaiping Zheng (kaiping @comp.nus.edu.sg)
> >>> >>>>>>> Zhaojing Luo (zhaojing @comp.nus.edu.sg)
> >>> >>>>>>> Zhongle Xie (zhongle @comp.nus.edu.sg)
> >>> >>>>>>>
> >>> >>>>>>> Affiliations
> >>> >>>>>>>
> >>> >>>>>>> Beng Chin Ooi, National University of Singapore
> >>> >>>>>>> Kian Lee Tan, National University of Singapore
> >>> >>>>>>> Gang Chen, Zhejiang University
> >>> >>>>>>> Wei Wang, National University of Singapore
> >>> >>>>>>> Dinh Tien Tuan Anh, National University of Singapore
> >>> >>>>>>> Jinyang Gao, National University of Singapore
> >>> >>>>>>> Sheng Wang, National University of Singapore
> >>> >>>>>>> Kaiping Zheng, National University of Singapore
> >>> >>>>>>> Zhaojing Luo, National University of Singapore
> >>> >>>>>>> Zhongle Xie, National University of Singapore
> >>> >>>>>>>
> >>> >>>>>>> Sponsors
> >>> >>>>>>>
> >>> >>>>>>> Champion
> >>> >>>>>>>
> >>> >>>>>>> Thejas Nair (thejas at apache.org) - Hortonworks
> >>> >>>>>>>
> >>> >>>>>>> Nominated Mentors
> >>> >>>>>>>
> >>> >>>>>>> Thejas Nair (thejas at apache.org) - Hortonworks
> >>> >>>>>>> Alan Gates (gates at apache dot org) - Hortonworks
> >>> >>>>>>> (Seeking more volunteers!)
> >>> >>>>>>>
> >>> >>>>>>> Sponsoring Entity
> >>> >>>>>>>
> >>> >>>>>>> We are requesting the Incubator to sponsor this project.
> >>> >>>>>>>
> >>> >>>>>>>
> >>> >>>>>>> ---------------------------------------------------------------------
> >>> >>>>>>> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> >>> >>>>>>> For additional commands, e-mail: general-help@incubator.apache.org
> >>> >>>>>>>
> >> --
> >> Sent from My iPad, sorry for any misspellings.

Re: [DISCUSS] [PROPOSAL] Singa for Apache Incubator

Posted by Thejas Nair <th...@gmail.com>.
Thanks for your input, Henry.
I did send personal emails to two folks (outside of Hortonworks) who
seemed to be interested in the project, but that didn't help.  I have
also been soliciting more mentors in this thread. I will try
reaching out to folks who are in the intersection of incubator and
mahout (or spark-ml) to see if they might be interested (hopefully
people working on related projects are more likely to join in).
Any other suggestions for soliciting a more diverse set of mentors are
also welcome.

Regarding the diversity of the initial set of committers, growing that
should be easier once the project is an Apache Incubator project.  I
see a strong desire to grow the community in the people who are
currently working on the project.



On Thu, Feb 26, 2015 at 11:42 PM, Henry Saputra <he...@gmail.com> wrote:
> I was not actually talking about a requirement, but about what is good
> for the podling itself.
>
> If all initial mentors come from the same company, the risk of all of
> them being absent is greater because all will be subject to the same
> schedule and priorities from their daytime employers, especially for
> release VOTEs. Three initial mentors won't be enough for this project,
> I think.
>
> I am not too worried about initial committers coming from the same org,
> but I have seen that a podling that does not have an initial community
> will struggle to thrive.
>
> Just my 2 cents from my experience in the incubator.
>
> - Henry
>
> On Thu, Feb 26, 2015 at 11:37 PM, jan i <ja...@apache.org> wrote:
>> On Friday, February 27, 2015, Henry Saputra <he...@gmail.com> wrote:
>>
>>> I strongly suggest you solicit more (diverse) mentors before starting the
>>> VOTE.
>>>
>>> All initial committers are from the same org and all initial mentors are
>>> from the same company (HW).
>>
>> We do have a requirement for diversity; for me, all initial committers from
>> the same company is just as big a problem as mentors. When everyone
>> involved is from the same company, that signals a serious problem
>> which should be addressed before starting a vote.
>>
>> rgds
>> jan i
>>
>>>
>>> I am not sure this is a good start for an Apache podling.
>>>
>>>
>>> - Henry
>>>
>>> On Thu, Feb 26, 2015 at 9:12 AM, Thejas Nair <thejas.nair@gmail.com> wrote:
>>> > The incubator proposal has been updated with the feedback so far.
>>> > We have 3 mentors now, but I think it would be good to have additional
>>> > mentors. Please let me know if anyone is able to help mentor this
>>> > project.
>>> >
>>> > I am planning to start a vote on the proposal in a day or two.
>>> >
>>> >
>>> > On Fri, Feb 6, 2015 at 5:21 PM, <ooibc@comp.nus.edu.sg> wrote:
>>> >>
>>> >> Regarding the number of users using this project -- at this moment, the
>>> >> community is not big.  A few local start-ups have been trying to use it
>>> >> (mainly due to an announcement on our seminar list), e.g. one is using
>>> >> it for image recognition (given a photo snapped by a user, it wants to
>>> >> return the same product, and a list of similar products, such as a
>>> >> luxury bag on a passerby).  Researchers from outside of NUS may have
>>> >> been using it since we published an application paper on cross
>>> >> domain/modal retrieval in VLDB 2014.
>>> >>
>>> >> We have not announced the project to the outside community yet -- we
>>> >> would announce it in dbworld etc in due course.
>>> >>
>>> >> Thanks and have a good weekend.
>>> >>
>>> >> regards
>>> >> beng chin
>>> >>
>>> >>>
>>> >>> Thanks for the comments and suggestions.
>>> >>> With permission from Thejas, I would like to respond to point 2.
>>> >>>
>>> >>> We have a huge team down at NUS (National University of Singapore) --
>>> >>> we have about seven database/data mining professors (not including
>>> >>> those in systems, networking, and machine learning).
>>> >>> I myself have nine PhD students in a steady state, and I have a few
>>> >>> large grants, with a total budget of about 15 million S$ (~12 million
>>> >>> USD), that allow me to hire a number of research fellows and research
>>> >>> assistants for the next few years.  In a constant state, I have about
>>> >>> 20 people (PhD students/RA/RF) working with me alone.  Other professors
>>> >>> have their own grants (unlike in other countries, it is relatively easy
>>> >>> to get large grants in Singapore; many overseas universities, including
>>> >>> UIUC, MIT, ETH etc, have research labs funded by Singapore Research
>>> >>> Foundation [equivalent of NSF]).
>>> >>>
>>> >>> SINGA is a long term project for us -- while it is a platform as it
>>> >>> is, we are using it for healthcare predictive analytics (by working
>>> >>> with a hospital associated with the University).  Therefore, we will be
>>> >>> working on SINGA, not solely as a distributed DL platform, but as a tool
>>> >>> that will enable us to do data analytics on some business domains (e.g.
>>> >>> healthcare, consumer etc)
>>> >>>
>>> >>> For the initial set of committers, three are tenured professors, five
>>> >>> are students, with 2-5 years to go before they complete their PhD.  Quite
>>> >>> often, some would stay back as a research fellow for a couple of years
>>> >>> before they start looking for a job outside.  We will work with mentors
>>> >>> and new developers (from outside of NUS or Zhejiang University) in
>>> >>> enhancing the system.
>>> >>>
>>> >>> The project should survive in that sense.
>>> >>>
>>> >>> (I have an on-going project CIIDAA that has been around since 2008; it
>>> >>> was started as another project, epiC, with a different grant, and then
>>> >>> we continued the development with a new grant for CIIDAA --
>>> >>> http://www.comp.nus.edu.sg/~ciidaa/ )
>>> >>>
>>> >>> Thanks.
>>> >>>
>>> >>> regards
>>> >>> beng chin
>>> >>> ps: i am not sure if my email will get through to the group.
>>> >>>
>>> >>>
>>> >>> ---------------------------- Original Message ----------------------------
>>> >>> Subject: Re: [DISCUSS] [PROPOSAL] Singa for Apache Incubator
>>> >>> From:    "Henry Saputra" <henry.saputra@gmail.com>
>>> >>> Date:    Thu, February 5, 2015 2:57 pm
>>> >>> To:      "general@incubator.apache.org" <general@incubator.apache.org>
>>> >>> Cc:      ooibc@comp.nus.edu.sg
>>> >>> --------------------------------------------------------------------------
>>> >>>
>>> >>> Several comments:
>>> >>> -) How many users are already using this project? I would recommend
>>> >>> dropping the request for the singa-user list at the beginning.
>>> >>> -) All the initial committers come from a university, and it seems
>>> >>> some of them are already about to leave the university. I am not too
>>> >>> sure this project will survive if all of the initial committers are
>>> >>> from a university as students.
>>> >>> -) Need to solicit more mentors if this project is ever to get into
>>> >>> the Apache incubator.
>>> >>>
>>> >>> - Henry
>>> >>>
>>> >>> On Tue, Feb 3, 2015 at 3:58 PM, Thejas Nair <thejas.nair@gmail.com> wrote:
>>> >>>> The "Relationship with Other Apache Products" section has been
>>> >>>> updated. The reference to H2O in that section has been removed, and
>>> >>>> other projects have been added.
>>> >>>>  Thanks for the feedback!
>>> >>>>
>>> >>>>
>>> >>>> On Wed, Jan 28, 2015 at 10:27 AM, Thejas Nair <thejas.nair@gmail.com> wrote:
>>> >>>>> Thanks for pointing that out, Henry! Yes, looks like H2O is not an
>>> >>>>> Apache project, I should have verified that.
>>> >>>>> I will edit that, and revisit that section along with the folks in
>>> >>>>> Singa community.
>>> >>>>>
>>> >>>>>
>>> >>>>> On Tue, Jan 27, 2015 at 6:55 PM, Henry Saputra <henry.saputra@gmail.com> wrote:
>>> >>>>>> Quick immediate comment that "Apache H2O" is not really an Apache
>>> >>>>>> project.
>>> >>>>>>
>>> >>>>>> I assume you are referring to https://github.com/h2oai/h2o (or
>>> >>>>>> https://github.com/h2oai/h2o-dev) ?
>>> >>>>>>
>>> >>>>>> - Henry
>>> >>>>>>
>>> >>>>>> On Tue, Jan 27, 2015 at 5:29 PM, Thejas Nair <thejas.nair@gmail.com> wrote:
>>> >>>>>>> Hello everyone,
>>> >>>>>>>
>>> >>>>>>> I would like to propose the inclusion of Singa as an Apache
>>> Incubator
>>> >>> project.
>>> >>>>>>>
>>> >>>>>>> Here is the proposal -
>>> >>>>>>> https://wiki.apache.org/incubator/SingaProposal
>>> >>>>>>>
>>> >>>>>>> Please review the proposal and give feedback. I am planning to
>>> start
>>> >>>>>>> a
>>> >>>>>>> vote after 7 days if the proposal looks good.
>>> >>>>>>> We are also seeking additional Apache mentors for the project.
>>> >>>>>>>
>>> >>>>>>> Thanks,
>>> >>>>>>> Thejas
>>> >>>>>>> ==========================================================
>>> >>>>>>> Singa Incubator Proposal
>>> >>>>>>>
>>> >>>>>>> Abstract
>>> >>>>>>>
>>> >>>>>>> SINGA is a distributed deep learning platform.
>>> >>>>>>>
>>> >>>>>>> Proposal
>>> >>>>>>>
>>> >>>>>>> SINGA is an efficient, scalable and easy-to-use distributed
>>> platform
>>> >>>>>>> for training deep learning models, e.g., Deep Convolutional Neural
>>> >>>>>>> Network and Deep Belief Network. It parallelizes the computation
>>> >>>>>>> (i.e., training) onto a cluster of nodes by distributing the
>>> training
>>> >>>>>>> data and model automatically to speed up the training. Built-in
>>> >>>>>>> training algorithms like Back-Propagation and Contrastive
>>> Divergence
>>> >>>>>>> are implemented based on common abstractions of deep learning
>>> models.
>>> >>>>>>> Users can train their own deep learning models by simply
>>> customizing
>>> >>>>>>> these abstractions like implementing the Mapper and Reducer in
>>> >>>>>>> Hadoop.
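
[Editorial sketch] The Mapper/Reducer analogy above can be illustrated with a minimal example: a generic training driver that only knows a base-class interface, and layer types that override two hooks. This is a sketch under stated assumptions and not SINGA's actual API; the class names (Layer, ScaleLayer, ReLULayer), the hook names (forward, backward) and the driver functions are all hypothetical.

```python
# Hypothetical sketch, not SINGA's actual API: a base Layer whose two hooks
# are overridden per layer type, much like Mapper/Reducer in Hadoop.

class Layer:
    def forward(self, inputs):
        raise NotImplementedError

    def backward(self, inputs, grad_outputs):
        raise NotImplementedError


class ScaleLayer(Layer):
    """Multiplies every input by one learnable weight."""

    def __init__(self, weight):
        self.weight = weight
        self.grad_weight = 0.0

    def forward(self, inputs):
        return [self.weight * x for x in inputs]

    def backward(self, inputs, grad_outputs):
        # dL/dw = sum(g * x); dL/dx = g * w
        self.grad_weight = sum(g * x for g, x in zip(grad_outputs, inputs))
        return [g * self.weight for g in grad_outputs]


class ReLULayer(Layer):
    """Passes positive inputs through and zeroes the rest."""

    def forward(self, inputs):
        return [max(0.0, x) for x in inputs]

    def backward(self, inputs, grad_outputs):
        return [g if x > 0 else 0.0 for g, x in zip(grad_outputs, inputs)]


def run_forward(layers, inputs):
    """Generic driver: relies only on the base-class interface."""
    activations = [inputs]
    for layer in layers:
        activations.append(layer.forward(activations[-1]))
    return activations


def run_backward(layers, activations, grad_outputs):
    """Back-propagates gradients through the layers in reverse order."""
    grad = grad_outputs
    for layer, layer_inputs in zip(reversed(layers), reversed(activations[:-1])):
        grad = layer.backward(layer_inputs, grad)
    return grad


layers = [ScaleLayer(2.0), ReLULayer()]
acts = run_forward(layers, [1.0, -1.0])
grads = run_backward(layers, acts, [1.0, 1.0])
```

The design point is that run_forward and run_backward never change when a new layer type is added, which is the sense in which a user "only customizes the abstractions".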
>>> >>>>>>>
>>> >>>>>>> Background
>>> >>>>>>>
>>> >>>>>>> Deep learning refers to a set of feature (or representation)
>>> learning
>>> >>>>>>> models that consist of multiple (non-linear) layers, where
>>> different
>>> >>>>>>> layers learn different levels of abstractions (representations) of
>>> >>>>>>> the
>>> >>>>>>> raw input data. Larger (in terms of model parameters) and deeper
>>> (in
>>> >>>>>>> terms of number of layers) models have shown better performance,
>>> >>>>>>> e.g.,
>>> >>>>>>> lower image classification error in Large Scale Visual Recognition
>>> >>>>>>> Challenge. However, a larger model requires more memory and larger
>>> >>>>>>> training data to reduce over-fitting. Complex numeric operations
>>> make
>>> >>>>>>> the training computation intensive. In practice, training large
>>> deep
>>> >>>>>>> learning models takes weeks or months on a single node (even with
>>> >>>>>>> GPU).
>>> >>>>>>>
>>> >>>>>>> Rationale
>>> >>>>>>>
>>> >>>>>>> Deep learning has gained a lot of traction in both academia and
>>> >>>>>>> industry due to its success in a wide range of areas such as
>>> computer
>>> >>>>>>> vision and speech recognition. However, training of such models is
>>> >>>>>>> computationally expensive, especially for large and deep models
>>> >>>>>>> (e.g.,
>>> >>>>>>> with billions of parameters and more than 10 layers). Both Google
>>> and
>>> >>>>>>> Microsoft have developed distributed deep learning systems to make
>>> >>>>>>> the
>>> >>>>>>> training more efficient by distributing the computations within a
>>> >>>>>>> cluster of nodes. However, these systems are closed-source software.
>>> >>>>>>> Our goal is to leverage the community of open source developers to
>>> >>>>>>> make SINGA efficient, scalable and easy to use. SINGA is a full
>>> >>>>>>> fledged distributed platform, that could benefit the community and
>>> >>>>>>> also benefit from the community in their involvement in
>>> contributing
>>> >>>>>>> to the further work in this area. We believe the nature of SINGA
>>> and
>>> >>>>>>> our visions for the system fit naturally to Apache's philosophy and
>>> >>>>>>> development framework.
>>> >>>>>>>
>>> >>>>>>> Initial Goals
>>> >>>>>>>
>>> >>>>>>> We have developed a system for SINGA running on a commodity computer
>>> >>>>>>> cluster. The initial goals include:
>>> >>>>>>> * improving the system in terms of scalability and efficiency, e.g.,
>>> >>>>>>>   using Infiniband for network communication and multi-threading for
>>> >>>>>>>   one node computation. We would consider extending SINGA to GPU
>>> >>>>>>>   clusters later.
>>> >>>>>>> * benchmarking with larger datasets (hundreds of millions of training
>>> >>>>>>>   instances) and models (billions of parameters).
>>> >>>>>>> * adding more built-in deep learning models. Users can train the
>>> >>>>>>>   built-in models on their datasets directly.
>>> >>>>>>>
>>> >>>>>>> Current Status
>>> >>>>>>>
>>> >>>>>>> Meritocracy
>>> >>>>>>>
>>> >>>>>>> We would like to follow ASF meritocratic principles to encourage
>>> more
>>> >>>>>>> developers to contribute in this project. We know that only active
>>> >>>>>>> and
>>> >>>>>>> excellent developers can make SINGA a successful project. The
>>> >>>>>>> committer list and PMC will be updated based on developers'
>>> >>>>>>> performance and commitment. We are also improving the documentation
>>> >>>>>>> and code to help new developers get started quickly.
>>> >>>>>>>
>>> >>>>>>> Community
>>> >>>>>>>
>>> >>>>>>> SINGA is currently being developed in the Database System Research
>>> >>>>>>> Lab
>>> >>>>>>> at the National University of Singapore (NUS) in collaboration with
>>> >>>>>>> Zhejiang University in China. Our lab has extensive experience in
>>> >>>>>>> building database related systems, including distributed systems.
>>> Six
>>> >>>>>>> PhD students and research assistants (Jinyang Gao, Kaiping Zheng,
>>> >>>>>>> Sheng Wang, Wei Wang, Zhaojing Luo and Zhongle Xie) , a research
>>> >>>>>>> fellow (Anh Dinh) and three professors (Beng Chin Ooi, Gang Chen,
>>> >>>>>>> Kian
>>> >>>>>>> Lee Tan) have been working for a year on this project. We are open
>>> to
>>> >>>>>>> recruiting more developers from diverse backgrounds.
>>> >>>>>>>
>>> >>>>>>> Core Developers
>>> >>>>>>>
>>> >>>>>>> Beng Chin Ooi, Gang Chen and Kian Lee Tan are professors who have
>>> >>>>>>> worked on distributed systems for more than 20 years. They have
>>> >>>>>>> collaborated with the industry and have built various large scale
>>> >>>>>>> systems. Anh Dinh's research is also on distributed systems, albeit
>>> >>>>>>> with more focus on security aspects. Wei Wang's research is on deep
>>> >>>>>>> learning problems including deep learning applications and large
>>> >>>>>>> scale
>>> >>>>>>> training. Sheng Wang and Jinyang are working on efficient indexing,
>>> >>>>>>> querying of large scale data and machine learning. Kaiping, Zhaojing
>>> >>>>>>> and Zhongle are new PhD students who joined SINGA recently. They will
>>> >>>>>>> work on this project for a longer time (next 4-5 years). While we
>>> >>>>>>> share common research interests, each member also brings diverse
>>> >>>>>>> expertise to the team.
>>> >>>>>>>
>>> >>>>>>> Alignment
>>> >>>>>>>
>>> >>>>>>> ASF is already the home of many distributed platforms, e.g., Hadoop,
>>> >>>>>>> Spark and Mahout, each of which targets a different application
>>> >>>>>>> domain. SINGA, being a distributed platform for large-scale deep
>>> >>>>>>> learning, focuses on another important domain that still lacks a
>>> >>>>>>> robust and scalable open-source platform. The recent success of deep
>>> >>>>>>> learning models, especially for vision and speech recognition tasks,
>>> >>>>>>> has generated interest both in applying existing deep learning
>>> >>>>>>> models and in developing new ones. Thus, an open-source platform for
>>> >>>>>>> deep learning will be able to attract a large community of users and
>>> >>>>>>> developers. SINGA is a complex system needing many iterations of
>>> >>>>>>> design, implementation and testing. Apache's collaboration
>>> >>>>>>> framework, which encourages active contribution from developers,
>>> >>>>>>> will inevitably help improve the quality of the system, as shown by
>>> >>>>>>> the success of Hadoop, Spark, etc. Equally important is the community
>>> >>>>>>> of users, which helps identify real-life applications of deep
>>> >>>>>>> learning and helps to evaluate the system's performance and ease of
>>> >>>>>>> use. We hope to leverage ASF for coordinating and promoting both
>>> >>>>>>> communities, and in return benefit the communities with another
>>> >>>>>>> useful tool.
>>> >>>>>>>
>>> >>>>>>> Known Risks
>>> >>>>>>>
>>> >>>>>>> Orphaned products
>>> >>>>>>>
>>> >>>>>>> Four core developers (Anh, Wei Wang, Jinyang and Sheng Wang) may
>>> >>>>>>> leave the lab in two to four years' time. It is possible that some
>>> >>>>>>> of them may not have enough time to focus on this project after
>>> >>>>>>> that. However, SINGA is part of our larger research projects on
>>> >>>>>>> building an infrastructure for data intensive applications, which
>>> >>>>>>> include health-care analytics and brain-inspired computing. Beng
>>> >>>>>>> Chin and Kian Lee will continue working on it and getting more
>>> >>>>>>> people involved. For example, three new developers (Kaiping,
>>> >>>>>>> Zhaojing and Zhongle) joined us recently. Individual developers are
>>> >>>>>>> welcome to join and help make SINGA a diverse community that is
>>> >>>>>>> robust and independent of any single developer.
>>> >>>>>>>
>>> >>>>>>> Inexperience with Open Source
>>> >>>>>>>
>>> >>>>>>> All the developers are active users and followers of open source
>>> >>>>>>> projects. Our research lab has a strong commitment to open source,
>>> >>>>>>> and has released the source code of several systems under open
>>> >>>>>>> source licenses as a way of contributing back to the open source
>>> >>>>>>> community. However, we do not have much real experience in open
>>> >>>>>>> source projects with large and well organized communities like those
>>> >>>>>>> in Apache. This is one reason we chose Apache, which is experienced
>>> >>>>>>> in open source project incubation. We hope to get help from Apache
>>> >>>>>>> (e.g., champion and mentors) to establish a healthy path for SINGA.
>>> >>>>>>>
>>> >>>>>>> Homogenous Developers
>>> >>>>>>>
>>> >>>>>>> Although the current developers are researchers in the
>>> universities,
>>> >>>>>>> they have different research interests and project experiences, as
>>> >>>>>>> mentioned in the section that introduces the core developers. We
>>> know
>>> >>>>>>> that a diverse community is helpful. Hence we are open to the idea
>>> of
>>> >>>>>>> recruiting developers from other regions and organizations.
>>> >>>>>>>
>>> >>>>>>> Reliance on Salaried Developers
>>> >>>>>>>
>>> >>>>>>> As a research project in the university, SINGA's current developer
>>> >>>>>>> community consists of professors, PhD students, research assistants
>>> >>>>>>> and postdoctoral fellows. They are driven by their interests to
>>> work
>>> >>>>>>> on this project and have contributed actively since the start of
>>> the
>>> >>>>>>> project. The research assistants and fellows are expected to leave
>>> >>>>>>> when their contracts expire. However, they are keen to continue to
>>> >>>>>>> work on the project voluntarily. Moreover, as a long term research
>>> >>>>>>> project, new research assistants and fellows are likely to join the
>>> >>>>>>> project.
>>> >>>>>>>
>>> >>>>>>> An Excessive Fascination with the Apache Brand
>>> >>>>>>>
>>> >>>>>>> We are not choosing Apache for publicity. We have two purposes.
>>> >>>>>>> First, we want to leverage Apache's reputation to recruit more
>>> >>>>>>> developers to make a diverse community. Second, we hope that Apache
>>> >>>>>>> can help us to establish a healthy path in developing SINGA. Beng
>>> >>>>>>> Chin and Kian-Lee are established database and distributed system
>>> >>>>>>> researchers, and together with the other contributors, they
>>> >>>>>>> sincerely believe that there is a need for a widely accepted open
>>> >>>>>>> source distributed deep learning platform. The field of deep
>>> >>>>>>> learning is still in its infancy, and an open source platform will
>>> >>>>>>> fuel the research in the area. Moreover, such a platform will enable
>>> >>>>>>> researchers to develop new models and algorithms, rather than
>>> >>>>>>> spending time implementing a deep learning system from scratch.
>>> >>>>>>> Furthermore, the need for scalability for such a platform is
>>> >>>>>>> obvious.
>>> >>>>>>>
>>> >>>>>>> Relationship with Other Apache Products
>>> >>>>>>>
>>> >>>>>>> Apache H2O implemented two simple deep learning models, namely the
>>> >>>>>>> Multi-Layer Perceptron and Deep Auto-encoders. There are two
>>> >>>>>>> significant differences between H2O and SINGA. First, H2O adopts the
>>> >>>>>>> Map-Reduce framework, which runs a set of computing nodes in
>>> >>>>>>> parallel against subsets of the training set. Model parameters
>>> >>>>>>> trained by all computing nodes are averaged as the final model
>>> >>>>>>> parameters. This training algorithm is different from the
>>> >>>>>>> distributed training algorithm used by DistBelief, Adam and SINGA,
>>> >>>>>>> which frequently synchronizes the parameters trained on different
>>> >>>>>>> nodes. SINGA adopts the parameter server framework to support a wide
>>> >>>>>>> range of distributed training algorithms and parallelization methods
>>> >>>>>>> (e.g., data parallelism, model parallelism and hybrid parallelism),
>>> >>>>>>> whereas H2O only supports data parallelism. Second, in H2O, users
>>> >>>>>>> are restricted to using the two built-in models. In SINGA, we
>>> >>>>>>> provide a simple programming model to let users implement their own
>>> >>>>>>> deep learning models. A new deep learning model can be implemented
>>> >>>>>>> by customizing the base Layer class for each layer involved in the
>>> >>>>>>> model. It is similar to writing Hadoop programs, where users only
>>> >>>>>>> need to override the base Mapper and Reducer. We also provide
>>> >>>>>>> built-in models for users to use directly.
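
To make the contrast in the quoted paragraph concrete, here is a minimal
sketch of the two communication patterns: one-shot parameter averaging
versus workers that synchronize through a parameter server at every step.
It assumes a toy 1-D linear model y = w * x and two in-process "workers".
All names (ParameterServer, train_locally, averaged_model, ...) are
hypothetical illustrations and are not SINGA or H2O APIs.

```python
# Toy contrast between one-shot model averaging and parameter-server-style
# training. Every name below is hypothetical; nothing here is a SINGA or
# H2O API.

def gradient(w, shard):
    """Gradient of the mean squared error for the 1-D model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def train_locally(shard, steps, lr, w=0.0):
    """Plain gradient descent that only ever sees one worker's shard."""
    for _ in range(steps):
        w -= lr * gradient(w, shard)
    return w

def averaged_model(shards, steps, lr):
    """Each worker trains independently; parameters are averaged once."""
    local = [train_locally(shard, steps, lr) for shard in shards]
    return sum(local) / len(local)

class ParameterServer:
    """Holds the shared parameter and applies gradients pushed by workers."""
    def __init__(self, w=0.0, lr=0.01):
        self.w = w
        self.lr = lr

    def pull(self):
        return self.w

    def push(self, grad):
        self.w -= self.lr * grad

def parameter_server_model(shards, steps, lr):
    """Workers pull the latest parameter and push a gradient every step."""
    server = ParameterServer(lr=lr)
    for _ in range(steps):
        for shard in shards:
            server.push(gradient(server.pull(), shard))
    return server.w

# Two data shards drawn from y = 3x, one per simulated worker.
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
w_avg = averaged_model(shards, steps=200, lr=0.01)
w_ps = parameter_server_model(shards, steps=200, lr=0.01)
```

On this consistent toy data both approaches approach w = 3; the point of
the sketch is only where synchronization happens (once at the end versus
at every step), not a performance comparison.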
>>> >>>>>>>
>>> >>>>>>> Documentation
>>> >>>>>>>
>>> >>>>>>> The project is hosted at
>>> >>>>>>> http://www.comp.nus.edu.sg/~dbsystem/project/singa.html.
>>> >>>>>>> Documentation can be found at the GitHub Wiki Page:
>>> >>>>>>> https://github.com/nusinga/singa/wiki. We continue to refine and
>>> >>>>>>> improve the documentation.
>>> >>>>>>>
>>> >>>>>>> Initial Source
>>> >>>>>>>
>>> >>>>>>> We use GitHub to maintain our source code:
>>> >>>>>>> https://github.com/nusinga/singa
>>> >>>>>>>
>>> >>>>>>> Source and Intellectual Property Submission Plan
>>> >>>>>>>
>>> >>>>>>> We plan to release our code base under the Apache License, Version 2.0.
>>> >>>>>>>
>>> >>>>>>> External Dependencies
>>> >>>>>>>
>>> >>>>>>> Required by the core code base: glog, gflags, google protobuf,
>>> >>>>>>> open-blas, mpich, armci-mpi.
>>> >>>>>>> Required by data preparation and preprocessing: opencv, hdfs, python.
>>> >>>>>>>
>>> >>>>>>> Cryptography
>>> >>>>>>>
>>> >>>>>>> Not Applicable
>>> >>>>>>>
>>> >>>>>>> Required Resources
>>> >>>>>>>
>>> >>>>>>> Mailing Lists
>>> >>>>>>>
>>> >>>>>>> Currently, we use a Google group for internal discussion. The mailing
>>> >>>>>>> address is nusinga@googlegroup.com. We will migrate the content to the
>>> >>>>>>> Apache mailing lists in the future.
>>> >>>>>>>
>>> >>>>>>> singa-dev
>>> >>>>>>> singa-user
>>> >>>>>>> singa-commits
>>> >>>>>>> singa-private (for private discussion within PMC)
>>> >>>>>>>
>>> >>>>>>> Git Repository
>>> >>>>>>>
>>> >>>>>>> We want to continue using git for version control. Hence, a git
>>> repo
>>> >>>>>>> is required.
>>> >>>>>>>
>>> >>>>>>> Issue Tracking
>>> >>>>>>>
>>> >>>>>>> JIRA Singa (SINGA)
>>> >>>>>>>
>>> >>>>>>> Initial Committers
>>> >>>>>>>
>>> >>>>>>> Beng Chin Ooi (ooibc @comp.nus.edu.sg)
>>> >>>>>>> Kian Lee Tan (tankl @comp.nus.edu.sg)
>>> >>>>>>> Gang Chen (cg @zju.edu.cn)
>>> >>>>>>> Wei Wang (wangwei @comp.nus.edu.sg)
>>> >>>>>>> Dinh Tien Tuan Anh (dinhtta @comp.nus.edu.sg)
>>> >>>>>>> Jinyang Gao (jinyang.gao @comp.nus.edu.sg)
>>> >>>>>>> Sheng Wang (wangsh @comp.nus.edu.sg)
>>> >>>>>>> Kaiping Zheng (kaiping @comp.nus.edu.sg)
>>> >>>>>>> Zhaojing Luo (zhaojing @comp.nus.edu.sg)
>>> >>>>>>> Zhongle Xie (zhongle @comp.nus.edu.sg)
>>> >>>>>>>
>>> >>>>>>> Affiliations
>>> >>>>>>>
>>> >>>>>>> Beng Chin Ooi, National University of Singapore
>>> >>>>>>> Kian Lee Tan, National University of Singapore
>>> >>>>>>> Gang Chen, Zhejiang University
>>> >>>>>>> Wei Wang, National University of Singapore
>>> >>>>>>> Dinh Tien Tuan Anh, National University of Singapore
>>> >>>>>>> Jinyang Gao, National University of Singapore
>>> >>>>>>> Sheng Wang, National University of Singapore
>>> >>>>>>> Kaiping Zheng, National University of Singapore
>>> >>>>>>> Zhaojing Luo, National University of Singapore
>>> >>>>>>> Zhongle Xie, National University of Singapore
>>> >>>>>>>
>>> >>>>>>> Sponsors
>>> >>>>>>>
>>> >>>>>>> Champion
>>> >>>>>>>
>>> >>>>>>> Thejas Nair (thejas at apache.org) - Hortonworks
>>> >>>>>>>
>>> >>>>>>> Nominated Mentors
>>> >>>>>>>
>>> >>>>>>> Thejas Nair (thejas at apache.org) - Hortonworks
>>> >>>>>>> Alan Gates (gates at apache dot org) - Hortonworks
>>> >>>>>>> (Seeking more volunteers!)
>>> >>>>>>>
>>> >>>>>>> Sponsoring Entity
>>> >>>>>>>
>>> >>>>>>> We are requesting the Incubator to sponsor this project.
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>> ---------------------------------------------------------------------
>>> >>>>>>> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>>> >>>>>>> For additional commands, e-mail: general-help@incubator.apache.org
>>> >>>>>>>
>> --
>> Sent from My iPad, sorry for any misspellings.


Re: [DISCUSS] [PROPOSAL] Singa for Apache Incubator

Posted by Henry Saputra <he...@gmail.com>.
I was not actually talking about a requirement, but about what is
good for the podling itself.

If all initial mentors come from the same company, the risk of all of
them being absent is greater, because all will be subject to the same
schedule and priorities from their daytime employers, especially for
release VOTEs. Three initial mentors won't be enough for this project,
I think.

I am not too worried about initial committers coming from the same
org, but I have seen that a podling that does not have an initial
community will struggle to thrive.

Just my 2 cents from my experience in the incubator.

- Henry

On Thu, Feb 26, 2015 at 11:37 PM, jan i <ja...@apache.org> wrote:
> On Friday, February 27, 2015, Henry Saputra <he...@gmail.com> wrote:
>
>> I strongly suggest you solicit more (diverse) mentors before starting the
>> VOTE.
>>
>> All initial committers are from the same org and all initial mentors are
>> from the same company (HW).
>
> We do have a requirement for diversity. For me, all initial committers being
> from the same company is just as big a problem as the mentors. When everyone
> involved is from the same company, that signals a serious problem
> which should be addressed before starting a vote.
>
> rgds
> jan i
>
>>
>> I am not sure this is a good start for an Apache podling.
>>
>>
>> - Henry
>>
>> On Thu, Feb 26, 2015 at 9:12 AM, Thejas Nair <thejas.nair@gmail.com> wrote:
>> > The incubator proposal has been updated with the feedback so far.
>> > We have 3 mentors now, but I think it would be good to have additional
>> > mentors. Please let me know if anyone is able to help mentor this
>> > project.
>> >
>> > I am planning to start a vote on the proposal in a day or two.
>> >
>> >
>> > On Fri, Feb 6, 2015 at 5:21 PM, <ooibc@comp.nus.edu.sg> wrote:
>> >>
>> >> Regarding the number of users using this project -- at this moment, the
>> >> community is not big.  A few local start-ups have been trying to use it
>> >> (mainly due to an announcement in our seminar list), e.g. one is using it
>> >> for image recognition (given a photo snapped by a user, such as a luxury
>> >> bag on a passerby, it wants to return the same product and a list of
>> >> similar products).  Researchers from outside of NUS may have been using
>> >> it since we published an application paper on cross domain/modal
>> >> retrieval in VLDB 2014.
>> >>
>> >> We have not announced the project to the outside community yet -- we
>> >> would announce it in dbworld etc in due course.
>> >>
>> >> Thanks and have a good weekend.
>> >>
>> >> regards
>> >> beng chin
>> >>
>> >>>
>> >>> Thanks for the comments and suggestions.
>> >>> With permission from Thejas, I would like to respond to point 2.
>> >>>
>> >>> We have a huge team down at NUS (National University of Singapore) --
>> >>> we have about seven database/data mining professors (not including
>> >>> those in systems, networking, and machine learning).
>> >>> I myself have nine PhD students in a steady state, and I have a few
>> >>> large grants, with a total budget of about 15 million S$ (~12 million
>> >>> USD), that allow me to hire a number of research fellows and research
>> >>> assistants for the next few years.  In a constant state, I have about
>> >>> 20 people (PhD students/RA/RF) working with me alone.  Other professors
>> >>> have their own grants (unlike other countries, it is relatively easy to
>> >>> get large grants in Singapore; many overseas Universities, including
>> >>> UIUC, MIT, ETH etc have research labs funded by Singapore Research
>> >>> Foundation [equivalent of NSF]).
>> >>>
>> >>> SINGA is a long term project for us -- while it is a platform as it is,
>> >>> we are using it for healthcare predictive analytics (by working with a
>> >>> hospital associated with the University).  Therefore, we will be
>> >>> working on SINGA, not solely as a distributed DL platform, but as a
>> >>> tool that will enable us to do data analytics on some business domains
>> >>> (e.g. healthcare, consumer etc.)
>> >>>
>> >>> For the initial set of committers, three are tenured professors, five
>> >>> are students, with 2-5 years to go before they complete their PhD.
>> >>> Quite often, some would stay back as a research fellow for a couple of
>> >>> years before they start looking for a job outside.  We will work with
>> >>> mentors and new developers (from outside of NUS or Zhejiang University)
>> >>> in enhancing the system.
>> >>>
>> >>> The project should survive in that sense.
>> >>>
>> >>> (I have an on-going project CIIDAA that has been around since 2008; it
>> >>> was started as another project, epiC, with a different grant, and then
>> >>> we continued the development with a new grant for CIIDAA --
>> >>> http://www.comp.nus.edu.sg/~ciidaa/
>> >>> )
>> >>>
>> >>> Thanks.
>> >>>
>> >>> regards
>> >>> beng chin
>> >>> ps: i am not sure if my email will get through to the group.
>> >>>
>> >>>
>> >>> ---------------------------- Original Message ----------------------------
>> >>> Subject: Re: [DISCUSS] [PROPOSAL] Singa for Apache Incubator
>> >>> From:    "Henry Saputra" <henry.saputra@gmail.com>
>> >>> Date:    Thu, February 5, 2015 2:57 pm
>> >>> To:      "general@incubator.apache.org" <general@incubator.apache.org>
>> >>> Cc:      ooibc@comp.nus.edu.sg
>> >>> --------------------------------------------------------------------------
>> >>>
>> >>> Several comments:
>> >>> -) How many users are already using this project? I would recommend
>> >>> dropping the request for the singa-user list at the beginning.
>> >>> -) All the initial committers come from the university, and it seems
>> >>> like some of them are already ready to leave the university. I am not
>> >>> too sure this project will survive if all of the initial committers
>> >>> are from the university as students.
>> >>> -) Need to solicit more mentors if this project ever gets to the Apache
>> >>> incubator.
>> >>>
>> >>> - Henry
>> >>>
>> >>> On Tue, Feb 3, 2015 at 3:58 PM, Thejas Nair <thejas.nair@gmail.com> wrote:
>> >>>> The "Relationship with Other Apache Products" section has been
>> >>>> updated. The reference to H2O in that section has been removed, and
>> >>>> other projects have been added.
>> >>>>  Thanks for the feedback!
>> >>>>
>> >>>>
>> >>>> On Wed, Jan 28, 2015 at 10:27 AM, Thejas Nair <thejas.nair@gmail.com> wrote:
>> >>>>> Thanks for pointing that out Henry! Yes, looks like H2O is not an
>> >>>>> Apache project, I should have verified that.
>> >>>>> I will edit that, and revisit that section along with the folks in
>> >>>>> Singa community.
>> >>>>>
>> >>>>>
>> >>>>> On Tue, Jan 27, 2015 at 6:55 PM, Henry Saputra <henry.saputra@gmail.com> wrote:
>> >>>>>> Quick immediate comment that "Apache H2O" is not really an Apache
>> >>>>>> project.
>> >>>>>>
>> >>>>>> I assume you are referring to https://github.com/h2oai/h2o (or
>> >>>>>> https://github.com/h2oai/h2o-dev) ?
>> >>>>>>
>> >>>>>> - Henry
>> >>>>>>
>> >>>>>> On Tue, Jan 27, 2015 at 5:29 PM, Thejas Nair <thejas.nair@gmail.com> wrote:
>> >>>>>>> Hello everyone,
>> >>>>>>>
>> >>>>>>> I would like to propose the inclusion of Singa as an Apache
>> >>>>>>> Incubator project.
>> >>>>>>>
>> >>>>>>> Here is the proposal -
>> >>>>>>> https://wiki.apache.org/incubator/SingaProposal
>> >>>>>>>
>> >>>>>>> Please review the proposal and give feedback. I am planning to start
>> >>>>>>> a vote after 7 days if the proposal looks good.
>> >>>>>>> We are also seeking additional Apache mentors for the project.
>> >>>>>>>
>> >>>>>>> Thanks,
>> >>>>>>> Thejas
>> >>>>>>> ==========================================================
>> >>>>>>> Singa Incubator Proposal
>> >>>>>>>
>> >>>>>>> Abstract
>> >>>>>>>
>> >>>>>>> SINGA is a distributed deep learning platform.
>> >>>>>>>
>> >>>>>>> Proposal
>> >>>>>>>
>> >>>>>>> SINGA is an efficient, scalable and easy-to-use distributed platform
>> >>>>>>> for training deep learning models, e.g., Deep Convolutional Neural
>> >>>>>>> Network and Deep Belief Network. It parallelizes the computation
>> >>>>>>> (i.e., training) onto a cluster of nodes by distributing the
>> >>>>>>> training data and model automatically to speed up the training.
>> >>>>>>> Built-in training algorithms like Back-Propagation and Contrastive
>> >>>>>>> Divergence are implemented based on common abstractions of deep
>> >>>>>>> learning models. Users can train their own deep learning models by
>> >>>>>>> simply customizing these abstractions, much like implementing the
>> >>>>>>> Mapper and Reducer in Hadoop.
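
As a rough illustration of "customizing these abstractions", the sketch
below shows a generic back-propagation step written only against a base
layer interface, plus two user-defined layers that plug into it. The
class and function names (Layer, ScaleLayer, SquareLayer, backprop_step)
are assumptions made up for this example; they are not taken from the
SINGA code base.

```python
# Hypothetical sketch: Layer, ScaleLayer, SquareLayer and backprop_step are
# illustrative names only and do not reflect the actual SINGA API.

class Layer:
    """Base abstraction: users override forward and backward."""
    def forward(self, x):
        raise NotImplementedError

    def backward(self, grad_out):
        raise NotImplementedError

    def update(self, lr):
        pass  # layers without parameters have nothing to update

class ScaleLayer(Layer):
    """y = w * x with a single trainable parameter w."""
    def __init__(self, w=1.0):
        self.w = w
        self.grad_w = 0.0
        self.x = 0.0

    def forward(self, x):
        self.x = x
        return self.w * x

    def backward(self, grad_out):
        self.grad_w = grad_out * self.x   # gradient w.r.t. the parameter
        return grad_out * self.w          # gradient w.r.t. the input

    def update(self, lr):
        self.w -= lr * self.grad_w

class SquareLayer(Layer):
    """y = x ** 2, a user-defined non-linear layer with no parameters."""
    def forward(self, x):
        self.x = x
        return x * x

    def backward(self, grad_out):
        return grad_out * 2 * self.x

def backprop_step(layers, x, target, lr):
    """One generic training step that only relies on the Layer interface."""
    out = x
    for layer in layers:
        out = layer.forward(out)
    grad = 2 * (out - target)             # derivative of the squared error
    for layer in reversed(layers):
        grad = layer.backward(grad)
    for layer in layers:
        layer.update(lr)
    return (out - target) ** 2

# The model computes (w * x) ** 2; with x = 1 and target = 4, w should reach 2.
model = [ScaleLayer(w=0.5), SquareLayer()]
losses = [backprop_step(model, x=1.0, target=4.0, lr=0.01) for _ in range(200)]
```

The training loop never needs to know which layers it is driving, which
is the same division of labour as a Hadoop job that only sees the Mapper
and Reducer interfaces.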
>> >>>>>>>
>> >>>>>>> Background
>> >>>>>>>
>> >>>>>>> Deep learning refers to a set of feature (or representation)
>> >>>>>>> learning models that consist of multiple (non-linear) layers, where
>> >>>>>>> different layers learn different levels of abstractions
>> >>>>>>> (representations) of the raw input data. Larger (in terms of model
>> >>>>>>> parameters) and deeper (in terms of number of layers) models have
>> >>>>>>> shown better performance, e.g., lower image classification error in
>> >>>>>>> the Large Scale Visual Recognition Challenge. However, a larger
>> >>>>>>> model requires more memory and more training data to reduce
>> >>>>>>> over-fitting. Complex numeric operations make the training
>> >>>>>>> computationally intensive. In practice, training large deep learning
>> >>>>>>> models takes weeks or months on a single node (even with GPU).
>> >>>>>>>
>> >>>>>>> Rationale
>> >>>>>>>
>> >>>>>>> Deep learning has gained a lot of attention in both academia and
>> >>>>>>> industry due to its success in a wide range of areas such as
>> >>>>>>> computer vision and speech recognition. However, training of such
>> >>>>>>> models is computationally expensive, especially for large and deep
>> >>>>>>> models (e.g., with billions of parameters and more than 10 layers).
>> >>>>>>> Both Google and Microsoft have developed distributed deep learning
>> >>>>>>> systems to make the training more efficient by distributing the
>> >>>>>>> computations within a cluster of nodes. However, these systems are
>> >>>>>>> closed source software. Our goal is to leverage the community of
>> >>>>>>> open source developers to make SINGA efficient, scalable and easy to
>> >>>>>>> use. SINGA is a full-fledged distributed platform that could benefit
>> >>>>>>> the community and also benefit from the community's involvement in
>> >>>>>>> contributing to further work in this area. We believe the nature of
>> >>>>>>> SINGA and our vision for the system fit naturally with Apache's
>> >>>>>>> philosophy and development framework.
>> >>>>>>>
>> >>>>>>> Initial Goals
>> >>>>>>>
>> >>>>>>> We have developed a system for SINGA running on a commodity computer
>> >>>>>>> cluster. The initial goals include:
>> >>>>>>> * improving the system in terms of scalability and efficiency, e.g.,
>> >>>>>>>   using Infiniband for network communication and multi-threading for
>> >>>>>>>   one node computation. We would consider extending SINGA to GPU
>> >>>>>>>   clusters later.
>> >>>>>>> * benchmarking with larger datasets (hundreds of millions of
>> >>>>>>>   training instances) and models (billions of parameters).
>> >>>>>>> * adding more built-in deep learning models. Users can train the
>> >>>>>>>   built-in models on their datasets directly.
>> >>>>>>>
>> >>>>>>> Current Status
>> >>>>>>>
>> >>>>>>> Meritocracy
>> >>>>>>>
>> >>>>>>> We would like to follow ASF meritocratic principles to encourage
>> more
>> >>>>>>> developers to contribute in this project. We know that only active
>> >>>>>>> and
>> >>>>>>> excellent developers can make SINGA a successful project. The
>> >>>>>>> committer list and PMC will be updated based on developers'
>> >>>>>>> performance and commitment. We are also improving the documentation
>> >>>>>>> and code to help new developers get started quickly.
>> >>>>>>>
>> >>>>>>> Community
>> >>>>>>>
>> >>>>>>> SINGA is currently being developed in the Database System Research
>> >>>>>>> Lab
>> >>>>>>> at the National University of Singapore (NUS) in collaboration with
>> >>>>>>> Zhejiang University in China. Our lab has extensive experience in
>> >>>>>>> building database related systems, including distributed systems.
>> Six
>> >>>>>>> PhD students and research assistants (Jinyang Gao, Kaiping Zheng,
>> >>>>>>> Sheng Wang, Wei Wang, Zhaojing Luo and Zhongle Xie) , a research
>> >>>>>>> fellow (Anh Dinh) and three professors (Beng Chin Ooi, Gang Chen,
>> >>>>>>> Kian
>> >>>>>>> Lee Tan) have been working for a year on this project. We are open
>> to
>> >>>>>>> recruiting more developers from diverse backgrounds.
>> >>>>>>>
>> >>>>>>> Core Developers
>> >>>>>>>
>> >>>>>>> Beng Chin Ooi, Gang Chen and Kian Lee Tan are professors who have
>> >>>>>>> worked on distributed systems for more than 20 years. They have
>> >>>>>>> collaborated with the industry and have built various large scale
>> >>>>>>> systems. Anh Dinh's research is also on distributed systems, albeit
>> >>>>>>> with more focus on security aspects. Wei Wang's research is on deep
>> >>>>>>> learning problems including deep learning applications and large
>> >>>>>>> scale
>> >>>>>>> training. Sheng Wang and Jinyang are working on efficient indexing,
>> >>>>>>> querying of large scale data and machine learning. Kaiping, Zhaojing
>> >>>>>>> and Zhongle are new PhD students who joined SINGA recently. They will
>> >>>>>>> work on this project for a longer time (next 4-5 years). While we
>> >>>>>>> share common research interests, each member also brings diverse
>> >>>>>>> expertise to the team.
>> >>>>>>>
>> >>>>>>> Alignment
>> >>>>>>>
>> >>>>>>> ASF is already the home of many distributed platforms, e.g., Hadoop,
>> >>>>>>> Spark and Mahout, each of which targets a different application
>> >>>>>>> domain. SINGA, being a distributed platform for large-scale deep
>> >>>>>>> learning, focuses on another important domain that still lacks a
>> >>>>>>> robust and scalable open-source platform. The recent success of deep
>> >>>>>>> learning models, especially for vision and speech recognition tasks,
>> >>>>>>> has generated interest both in applying existing deep learning
>> >>>>>>> models and in developing new ones. Thus, an open-source platform for
>> >>>>>>> deep learning will be able to attract a large community of users and
>> >>>>>>> developers. SINGA is a complex system needing many iterations of
>> >>>>>>> design, implementation and testing. Apache's collaboration
>> >>>>>>> framework, which encourages active contribution from developers,
>> >>>>>>> will inevitably help improve the quality of the system, as shown by
>> >>>>>>> the success of Hadoop, Spark, etc. Equally important is the community
>> >>>>>>> of users, which helps identify real-life applications of deep
>> >>>>>>> learning and helps to evaluate the system's performance and ease of
>> >>>>>>> use. We hope to leverage ASF for coordinating and promoting both
>> >>>>>>> communities, and in return benefit the communities with another
>> >>>>>>> useful tool.
>> >>>>>>>
>> >>>>>>> Known Risks
>> >>>>>>>
>> >>>>>>> Orphaned products
>> >>>>>>>
>> >>>>>>> Four core developers (Anh, Wei Wang, Jinyang and Sheng Wang) may
>> >>>>>>> leave the lab in two to four years' time. It is possible that some
>> >>>>>>> of them may not have enough time to focus on this project after
>> >>>>>>> that. However, SINGA is part of our larger research projects on
>> >>>>>>> building an infrastructure for data intensive applications, which
>> >>>>>>> include health-care analytics and brain-inspired computing. Beng
>> >>>>>>> Chin and Kian Lee will continue working on it and getting more
>> >>>>>>> people involved. For example, three new developers (Kaiping,
>> >>>>>>> Zhaojing and Zhongle) joined us recently. Individual developers are
>> >>>>>>> welcome to join and help make SINGA a diverse community that is
>> >>>>>>> robust and independent of any single developer.
>> >>>>>>>
>> >>>>>>> Inexperience with Open Source
>> >>>>>>>
>> >>>>>>> All the developers are active users and followers of open source
>> >>>>>>> projects. Our research lab has a strong commitment to open source,
>> >>>>>>> and has released the source code of several systems under open
>> >>>>>>> source licenses as a way of contributing back to the open source
>> >>>>>>> community. However, we do not have much real experience in open
>> >>>>>>> source projects with large and well organized communities like those
>> >>>>>>> in Apache. This is one reason we chose Apache, which is experienced
>> >>>>>>> in open source project incubation. We hope to get help from Apache
>> >>>>>>> (e.g., champion and mentors) to establish a healthy path for SINGA.
>> >>>>>>>
>> >>>>>>> Homogenous Developers
>> >>>>>>>
>> >>>>>>> Although the current developers are researchers in the
>> universities,
>> >>>>>>> they have different research interests and project experiences, as
>> >>>>>>> mentioned in the section that introduces the core developers. We
>> know
>> >>>>>>> that a diverse community is helpful. Hence we are open to the idea
>> of
>> >>>>>>> recruiting developers from other regions and organizations.
>> >>>>>>>
>> >>>>>>> Reliance on Salaried Developers
>> >>>>>>>
>> >>>>>>> As a research project in the university, SINGA's current developer
>> >>>>>>> community consists of professors, PhD students, research assistants
>> >>>>>>> and postdoctoral fellows. They are driven by their interests to
>> work
>> >>>>>>> on this project and have contributed actively since the start of
>> the
>> >>>>>>> project. The research assistants and fellows are expected to leave
>> >>>>>>> when their contracts expire. However, they are keen to continue to
>> >>>>>>> work on the project voluntarily. Moreover, as a long term research
>> >>>>>>> project, new research assistants and fellows are likely to join the
>> >>>>>>> project.
>> >>>>>>>
>> >>>>>>> An Excessive Fascination with the Apache Brand
>> >>>>>>>
>> >>>>>>> We are not choosing Apache for publicity. We have two purposes.
>> >>>>>>> First, we want to leverage Apache's reputation to recruit more
>> >>>>>>> developers to make a diverse community. Second, we hope that Apache
>> >>>>>>> can help us to establish a healthy path in developing SINGA. Beng
>> >>>>>>> Chin and Kian-Lee are established database and distributed system
>> >>>>>>> researchers, and together with the other contributors, they
>> >>>>>>> sincerely believe that there is a need for a widely accepted open
>> >>>>>>> source distributed deep learning platform. The field of deep
>> >>>>>>> learning is still in its infancy, and an open source platform will
>> >>>>>>> fuel the research in the area. Moreover, such a platform will enable
>> >>>>>>> researchers to develop new models and algorithms, rather than
>> >>>>>>> spending time implementing a deep learning system from scratch.
>> >>>>>>> Furthermore, the need for scalability for such a platform is
>> >>>>>>> obvious.
>> >>>>>>>
>> >>>>>>> Relationship with Other Apache Products
>> >>>>>>>
>> >>>>>>> Apache H2O implemented two simple deep learning models, namely the
>> >>>>>>> Multi-Layer Perceptron and Deep Auto-encoders. There are two
>> >>>>>>> significant differences between H2O and SINGA. First, H2O adopts the
>> >>>>>>> Map-Reduce framework, which runs a set of computing nodes in parallel
>> >>>>>>> against the training set. Model parameters trained by all
>> >>>>>>> computing nodes are averaged as the final model parameters. This
>> >>>>>>> training algorithm is different from the distributed training
>> >>>>>>> algorithm used by DistBelief, Adam and SINGA, which frequently
>> >>>>>>> synchronizes the parameters trained on different nodes. SINGA adopts
>> >>>>>>> the parameter server framework to support a wide range of distributed
>> >>>>>>> training algorithms and parallelization methods (e.g., data
>> >>>>>>> parallelism, model parallelism and hybrid parallelism), whereas H2O
>> >>>>>>> only supports data parallelism. Second, in H2O, users are restricted to
>> >>>>>>> the two built-in models. In SINGA, we provide a simple programming
>> >>>>>>> model to let users implement their own deep learning models. A new
>> >>>>>>> deep learning model can be implemented by customizing the base Layer
>> >>>>>>> class for each layer involved in the model. It is similar to writing
>> >>>>>>> Hadoop programs, where users only need to override the base Mapper and
>> >>>>>>> Reducer. We also provide built-in models for users to use directly.
>> >>>>>>>
>> >>>>>>> Documentation
>> >>>>>>>
>> >>>>>>> The project is hosted at
>> >>>>>>> http://www.comp.nus.edu.sg/~dbsystem/project/singa.html.
>> >>>>>>> Documentations can be found at the Github Wiki Page:
>> >>>>>>> https://github.com/nusinga/singa/wiki. We continue to refine and
>> >>>>>>> improve the documentation.
>> >>>>>>>
>> >>>>>>> Initial Source
>> >>>>>>>
>> >>>>>>> We use GitHub to maintain our source code:
>> >>>>>>> https://github.com/nusinga/singa
>> >>>>>>>
>> >>>>>>> Source and Intellectual Property Submission Plan
>> >>>>>>>
>> >>>>>>> We plan to place our code base under the Apache License, Version 2.0.
>> >>>>>>>
>> >>>>>>> External Dependencies
>> >>>>>>>
>> >>>>>>> required by the core code base: glog, gflags, google protobuf,
>> >>>>>>> open-blas, mpich, armci-mpi.
>> >>>>>>> required by data preparation and preprocessing: opencv, hdfs,
>> python.
>> >>>>>>>
>> >>>>>>> Cryptography
>> >>>>>>>
>> >>>>>>> Not Applicable
>> >>>>>>>
>> >>>>>>> Required Resources
>> >>>>>>>
>> >>>>>>> Mailing Lists
>> >>>>>>>
>> >>>>>>> Currently, we use a Google Group for internal discussion. The mailing
>> >>>>>>> address is nusinga@googlegroup.com. We will migrate the content to
>> >>>>>>> the Apache mailing lists in the future.
>> >>>>>>>
>> >>>>>>> singa-dev
>> >>>>>>> singa-user
>> >>>>>>> singa-commits
>> >>>>>>> singa-private (for private discussion within PMC)
>> >>>>>>>
>> >>>>>>> Git Repository
>> >>>>>>>
>> >>>>>>> We want to continue using git for version control. Hence, a git
>> repo
>> >>>>>>> is required.
>> >>>>>>>
>> >>>>>>> Issue Tracking
>> >>>>>>>
>> >>>>>>> JIRA Singa (SINGA)
>> >>>>>>>
>> >>>>>>> Initial Committers
>> >>>>>>>
>> >>>>>>> Beng Chin Ooi (ooibc @comp.nus.edu.sg)
>> >>>>>>> Kian Lee Tan (tankl @comp.nus.edu.sg)
>> >>>>>>> Gang Chen (cg @zju.edu.cn)
>> >>>>>>> Wei Wang (wangwei @comp.nus.edu.sg)
>> >>>>>>> Dinh Tien Tuan Anh (dinhtta @comp.nus.edu.sg)
>> >>>>>>> Jinyang Gao (jinyang.gao @comp.nus.edu.sg)
>> >>>>>>> Sheng Wang (wangsh @comp.nus.edu.sg)
>> >>>>>>> Kaiping Zheng (kaiping @comp.nus.edu.sg)
>> >>>>>>> Zhaojing Luo (zhaojing @comp.nus.edu.sg)
>> >>>>>>> Zhongle Xie (zhongle @comp.nus.edu.sg)
>> >>>>>>>
>> >>>>>>> Affiliations
>> >>>>>>>
>> >>>>>>> Beng Chin Ooi, National University of Singapore
>> >>>>>>> Kian Lee Tan, National University of Singapore
>> >>>>>>> Gang Chen, Zhejiang University
>> >>>>>>> Wei Wang, National University of Singapore
>> >>>>>>> Dinh Tien Tuan Anh, National University of Singapore
>> >>>>>>> Jinyang Gao, National University of Singapore
>> >>>>>>> Sheng Wang, National University of Singapore
>> >>>>>>> Kaiping Zheng, National University of Singapore
>> >>>>>>> Zhaojing Luo, National University of Singapore
>> >>>>>>> Zhongle Xie, National University of Singapore
>> >>>>>>>
>> >>>>>>> Sponsors
>> >>>>>>>
>> >>>>>>> Champion
>> >>>>>>>
>> >>>>>>> Thejas Nair (thejas at apache.org) - Hortonworks
>> >>>>>>>
>> >>>>>>> Nominated Mentors
>> >>>>>>>
>> >>>>>>> Thejas Nair (thejas at apache.org) - Hortonworks
>> >>>>>>> Alan Gates (gates at apache dot org) - Hortonworks
>> >>>>>>> (Seeking more volunteers!)
>> >>>>>>>
>> >>>>>>> Sponsoring Entity
>> >>>>>>>
>> >>>>>>> We are requesting the Incubator to sponsor this project.
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> ---------------------------------------------------------------------
>> >>>>>>> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>> >>>>>>> For additional commands, e-mail: general-help@incubator.apache.org
>> >>>>>>>
>
> --
> Sent from My iPad, sorry for any misspellings.



Re: [DISCUSS] [PROPOSAL] Singa for Apache Incubator

Posted by jan i <ja...@apache.org>.
On Friday, February 27, 2015, Henry Saputra <he...@gmail.com> wrote:

> I strongly suggest you solicit more (diverse) mentors before starting the
> VOTE.
>
> All initial committers are from the same org and all initial mentors are
> from the same company (HW).

We do have a requirement for diversity. For me, all initial committers being
from the same company is just as big a problem as the mentors. When everyone
involved is from the same company, that signals a serious problem
which should be addressed before starting a vote.

rgds
jan i

>
> I am not sure this is a good start for an Apache podling.
>
>
> - Henry
>
> On Thu, Feb 26, 2015 at 9:12 AM, Thejas Nair <thejas.nair@gmail.com> wrote:
> > The incubator proposal has been updated with the feedback so far.
> > We have 3 mentors now, but I think it would be good to have additional
> > mentors. Please let me know if anyone is able to help mentor this
> > project.
> >
> > I am planning to start a vote on the proposal in a day or two.
> >
> >
> > On Fri, Feb 6, 2015 at 5:21 PM,  <ooibc@comp.nus.edu.sg> wrote:
> >>
> >> Regarding the number of users using this project -- at this moment, the
> >> community is not big.  A few local start-ups have been trying to use it
> >> (mainly due to an announcement on our seminar list), e.g. one is using it for
> >> image recognition (given a photo snapped by a user, it wants to return
> >> the same product, and a list of similar products, such as a luxury bag
> >> on a passerby).  Researchers from outside of NUS may have been using it
> >> since we published an application paper on cross domain/modal retrieval in
> >> VLDB 2014.
> >>
> >> We have not announced the project to the outside community yet -- we would
> >> announce it in dbworld etc in due course.
> >>
> >> Thanks and have a good weekend.
> >>
> >> regards
> >> beng chin
> >>
> >>>
> >>> Thanks for the comments and suggestions.
> >>> With permission from Thejas, I would like to respond to point 2.
> >>>
> >>> We have a huge team down at NUS (National University of Singapore) --
> >>> we have about seven database/data mining professors (not including
> >>> those in systems, networking, and machine learning).
> >>> I myself have nine PhD students in a steady state, and I have a few large
> >>> grants, with a total budget of about 15 million S$ (~12 million USD), that
> >>> allows me to hire a number of research fellows and research assistants for
> >>> the next few years.  In a constant state, I have about 20 people (PhD
> >>> students/RA/RF) working with me alone.  Other professors have their own
> >>> grants (unlike other countries, it is relatively easy to get large grants
> >>> in Singapore; many overseas universities, including UIUC, MIT, ETH etc.,
> >>> have research labs funded by Singapore Research Foundation [equivalent of
> >>> NSF]).
> >>>
> >>> SINGA is a long term project for us -- while it is a platform as it is, we
> >>> are using it for healthcare predictive analytics (by working with a
> >>> hospital associated with the University).  Therefore, we will be working
> >>> on SINGA, not solely as a distributed DL platform, but as a tool that will
> >>> enable us to do data analytics on some business domains (e.g. healthcare,
> >>> consumer etc.)
> >>>
> >>> For the initial set of committers, three are tenured professors, five are
> >>> students, with 2-5 years to go before they complete their PhD.  Quite
> >>> often, some would stay back as a research fellow for a couple of years
> >>> before they start looking for a job outside.  We will work with mentors
> >>> and new developers (from outside of NUS or Zhejiang University) in
> >>> enhancing the system.
> >>>
> >>> The project should survive in that sense.
> >>>
> >>> (I have an on-going project CIIDAA that has been around since 2008; it was
> >>> started as another project, epiC, with a different grant, and then we
> >>> continued the development with a new grant for CIIDAA --
> >>> http://www.comp.nus.edu.sg/~ciidaa/)
> >>>
> >>> Thanks.
> >>>
> >>> regards
> >>> beng chin
> >>> ps: i am not sure if my email will get through to the group.
> >>>
> >>>
> >>> ---------------------------- Original Message ----------------------------
> >>> Subject: Re: [DISCUSS] [PROPOSAL] Singa for Apache Incubator
> >>> From:    "Henry Saputra" <henry.saputra@gmail.com>
> >>> Date:    Thu, February 5, 2015 2:57 pm
> >>> To:      "general@incubator.apache.org" <general@incubator.apache.org>
> >>> Cc:      ooibc@comp.nus.edu.sg
> >>> --------------------------------------------------------------------------
> >>>
> >>> Several comments:
> >>> -) How many users are already using this project? I would recommend
> >>> dropping the request for the singa-user list at the beginning.
> >>> -) All the initial committers come from a university, and it seems like
> >>> some of them are already about to leave the university. I am not too sure
> >>> this project will survive if all of the initial committers are from
> >>> a university as students.
> >>> -) Need to solicit more mentors if this project ever gets to the Apache
> >>> incubator.
> >>>
> >>> - Henry
> >>>
> >>> On Tue, Feb 3, 2015 at 3:58 PM, Thejas Nair <thejas.nair@gmail.com> wrote:
> >>>> The "Relationship with Other Apache Products" section has been
> >>>> updated. The reference to H2O in that section has been removed, and
> >>>> other projects have been added.
> >>>>  Thanks for the feedback!
> >>>>
> >>>>
> >>>> On Wed, Jan 28, 2015 at 10:27 AM, Thejas Nair <thejas.nair@gmail.com> wrote:
> >>>>> Thanks for pointing that out Henry! Yes, looks like H2O is not an
> >>>>> Apache project, I should have verified that.
> >>>>> I will edit that, and revisit that section along with the folks in
> >>>>> Singa community.
> >>>>>
> >>>>>
> >>>>> On Tue, Jan 27, 2015 at 6:55 PM, Henry Saputra <henry.saputra@gmail.com> wrote:
> >>>>>> Quick immediate comment that "Apache H2O" is not really an Apache
> >>>>>> project.
> >>>>>>
> >>>>>> I assume you are referring to https://github.com/h2oai/h2o (or
> >>>>>> https://github.com/h2oai/h2o-dev) ?
> >>>>>>
> >>>>>> - Henry
> >>>>>>
> >>>>>> On Tue, Jan 27, 2015 at 5:29 PM, Thejas Nair <thejas.nair@gmail.com> wrote:
> >>>>>>> Hello everyone,
> >>>>>>>
> >>>>>>> I would like to propose the inclusion of Singa as an Apache Incubator
> >>>>>>> project.
> >>>>>>>
> >>>>>>> Here is the proposal -
> >>>>>>> https://wiki.apache.org/incubator/SingaProposal
> >>>>>>>
> >>>>>>> Please review the proposal and give feedback. I am planning to start
> >>>>>>> a vote after 7 days if the proposal looks good.
> >>>>>>> We are also seeking additional Apache mentors for the project.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Thejas
> >>>>>>> ==========================================================
> >>>>>>> Singa Incubator Proposal
> >>>>>>>
> >>>>>>> Abstract
> >>>>>>>
> >>>>>>> SINGA is a distributed deep learning platform.
> >>>>>>>
> >>>>>>> Proposal
> >>>>>>>
> >>>>>>> SINGA is an efficient, scalable and easy-to-use distributed
> platform
> >>>>>>> for training deep learning models, e.g., Deep Convolutional Neural
> >>>>>>> Network and Deep Belief Network. It parallelizes the computation
> >>>>>>> (i.e., training) onto a cluster of nodes by distributing the
> training
> >>>>>>> data and model automatically to speed up the training. Built-in
> >>>>>>> training algorithms like Back-Propagation and Contrastive
> Divergence
> >>>>>>> are implemented based on common abstractions of deep learning
> models.
> >>>>>>> Users can train their own deep learning models by simply
> customizing
> >>>>>>> these abstractions like implementing the Mapper and Reducer in
> >>>>>>> Hadoop.
> >>>>>>>
> >>>>>>> Background
> >>>>>>>
> >>>>>>> Deep learning refers to a set of feature (or representation)
> learning
> >>>>>>> models that consist of multiple (non-linear) layers, where
> different
> >>>>>>> layers learn different levels of abstractions (representations) of
> >>>>>>> the
> >>>>>>> raw input data. Larger (in terms of model parameters) and deeper
> (in
> >>>>>>> terms of number of layers) models have shown better performance,
> >>>>>>> e.g.,
> >>>>>>> lower image classification error in Large Scale Visual Recognition
> >>>>>>> Challenge. However, a larger model requires more memory and larger
> >>>>>>> training data to reduce over-fitting. Complex numeric operations
> make
> >>>>>>> the training computation intensive. In practice, training large
> deep
> >>>>>>> learning models takes weeks or months on a single node (even with
> >>>>>>> GPU).
> >>>>>>>
> >>>>>>> Rationale
> >>>>>>>
> >>>>>>> Deep learning has gained a lot of traction in both academia and
> >>>>>>> industry due to its success in a wide range of areas such as computer
> >>>>>>> vision and speech recognition. However, training of such models is
> >>>>>>> computationally expensive, especially for large and deep models
> >>>>>>> (e.g., with billions of parameters and more than 10 layers). Both
> >>>>>>> Google and Microsoft have developed distributed deep learning systems
> >>>>>>> to make the training more efficient by distributing the computations
> >>>>>>> within a cluster of nodes. However, these systems are closed-source
> >>>>>>> software. Our goal is to leverage the community of open source
> >>>>>>> developers to make SINGA efficient, scalable and easy to use. SINGA is
> >>>>>>> a full-fledged distributed platform that could benefit the community
> >>>>>>> and also benefit from the community's involvement in contributing to
> >>>>>>> further work in this area. We believe the nature of SINGA and our
> >>>>>>> vision for the system fit naturally with Apache's philosophy and
> >>>>>>> development framework.
> >>>>>>>
> >>>>>>> Initial Goals
> >>>>>>>
> >>>>>>> We have developed a system for SINGA running on a commodity computer
> >>>>>>> cluster. The initial goals include:
> >>>>>>>
> >>>>>>> * improving the system in terms of scalability and efficiency, e.g.,
> >>>>>>>   using Infiniband for network communication and multi-threading for
> >>>>>>>   one node computation. We would consider extending SINGA to GPU
> >>>>>>>   clusters later.
> >>>>>>> * benchmarking with larger datasets (hundreds of millions of training
> >>>>>>>   instances) and models (billions of parameters).
> >>>>>>> * adding more built-in deep learning models. Users can train the
> >>>>>>>   built-in models on their datasets directly.
> >>>>>>>
> >>>>>>> Current Status
> >>>>>>>
> >>>>>>> Meritocracy
> >>>>>>>
> >>>>>>> We would like to follow ASF meritocratic principles to encourage
> more
> >>>>>>> developers to contribute in this project. We know that only active
> >>>>>>> and
> >>>>>>> excellent developers can make SINGA a successful project. The
> >>>>>>> committer list and PMC will be updated based on developers'
> >>>>>>> performance and commitment. We are also improving the documentation
> >>>>>>> and code to help new developers get started quickly.
> >>>>>>>
> >>>>>>> Community
> >>>>>>>
> >>>>>>> SINGA is currently being developed in the Database System Research
> >>>>>>> Lab
> >>>>>>> at the National University of Singapore (NUS) in collaboration with
> >>>>>>> Zhejiang University in China. Our lab has extensive experience in
> >>>>>>> building database related systems, including distributed systems.
> Six
> >>>>>>> PhD students and research assistants (Jinyang Gao, Kaiping Zheng,
> >>>>>>> Sheng Wang, Wei Wang, Zhaojing Luo and Zhongle Xie) , a research
> >>>>>>> fellow (Anh Dinh) and three professors (Beng Chin Ooi, Gang Chen,
> >>>>>>> Kian
> >>>>>>> Lee Tan) have been working for a year on this project. We are open
> to
> >>>>>>> recruiting more developers from diverse backgrounds.
> >>>>>>>
> >>>>>>> Core Developers
> >>>>>>>
> >>>>>>> Beng Chin Ooi, Gang Chen and Kian Lee Tan are professors who have
> >>>>>>> worked on distributed systems for more than 20 years. They have
> >>>>>>> collaborated with the industry and have built various large scale
> >>>>>>> systems. Anh Dinh's research is also on distributed systems, albeit
> >>>>>>> with more focus on security aspects. Wei Wang's research is on deep
> >>>>>>> learning problems including deep learning applications and large
> >>>>>>> scale training. Sheng Wang and Jinyang are working on efficient
> >>>>>>> indexing, querying of large scale data and machine learning. Kaiping,
> >>>>>>> Zhaojing and Zhongle are new PhD students who joined SINGA recently.
> >>>>>>> They will work on this project for a longer time (next 4-5 years).
> >>>>>>> While we share common research interests, each member also brings
> >>>>>>> diverse expertise to the team.
> >>>>>>>
> >>>>>>> Alignment
> >>>>>>>
> >>>>>>> ASF is already the home of many distributed platforms, e.g., Hadoop,
> >>>>>>> Spark and Mahout, each of which targets a different application
> >>>>>>> domain. SINGA, being a distributed platform for large-scale deep
> >>>>>>> learning, focuses on another important domain that still lacks a
> >>>>>>> robust and scalable open-source platform. The recent success of deep
> >>>>>>> learning models, especially for vision and speech recognition tasks,
> >>>>>>> has generated interest both in applying existing deep learning
> >>>>>>> models and in developing new ones. Thus, an open-source platform for
> >>>>>>> deep learning will be able to attract a large community of users and
> >>>>>>> developers. SINGA is a complex system needing many iterations of
> >>>>>>> design, implementation and testing. Apache's collaboration framework,
> >>>>>>> which encourages active contribution from developers, will inevitably
> >>>>>>> help improve the quality of the system, as shown by the success of
> >>>>>>> Hadoop, Spark, etc. Equally important is the community of users, which
> >>>>>>> helps identify real-life applications of deep learning and helps to
> >>>>>>> evaluate the system's performance and ease of use. We hope to leverage
> >>>>>>> ASF for coordinating and promoting both communities, and in return
> >>>>>>> benefit the communities with another useful tool.
> >>>>>>>
> >>>>>>> Known Risks
> >>>>>>>
> >>>>>>> Orphaned products
> >>>>>>>
> >>>>>>> Four core developers (Anh, Wei Wang, Jinyang and Sheng Wang) may
> >>>>>>> leave
> >>>>>>> the lab in two to four years time. It is possible that some of them
> >>>>>>> may not have enough time to focus on this project after that. But,
> >>>>>>> SINGA is part of our other bigger research projects on building an
> >>>>>>> infrastructure for data intensive applications, which include
> >>>>>>> health-care analytics and brain-inspired computing. Beng Chin and
> >>>>>>> Kian
> >>>>>>> Lee would continue working on it and getting more people involved.
> >>>>>>> For
> >>>>>>> example, three new developers (Kaiping, Zhaojing and Zhongle)
> joined
> >>>>>>> us recently. Individual developers are welcome to make SINGA a
> >>>>>>> diverse
> >>>>>>> community that is robust and independent from any single developer.
> >>>>>>>
> >>>>>>> Inexperience with Open Source
> >>>>>>>
> >>>>>>> All the developers are active users and followers of open source
> >>>>>>> projects. Our research lab has a strong commitment to open source,
> >>>>>>> and
> >>>>>>> has released the source code of several systems under open source
> >>>>>>> license as a way of contributing back to the open source community.
> >>>>>>> But we do not have much real experience in open source projects
> with
> >>>>>>> large and well organized communities like those in Apache. This is
> >>>>>>> one
> >>>>>>> reason we choose Apache which is experienced in open source project
> >>>>>>> incubation. We hope to get the help from Apache (e.g., champion and
> >>>>>>> mentors) to establish a healthy path for SINGA.
> >>>>>>>
> >>>>>>> Homogenous Developers
> >>>>>>>
> >>>>>>> Although the current developers are researchers in the
> universities,
> >>>>>>> they have different research interests and project experiences, as
> >>>>>>> mentioned in the section that introduces the core developers. We
> know
> >>>>>>> that a diverse community is helpful. Hence we are open to the idea
> of
> >>>>>>> recruiting developers from other regions and organizations.
> >>>>>>>
> >>>>>>> Reliance on Salaried Developers
> >>>>>>>
> >>>>>>> As a research project in the university, SINGA's current developing
> >>>>>>> community consists of professors, PhD students, research assistants
> >>>>>>> and postdoctoral fellows. They are driven by their interests to
> work
> >>>>>>> on this project and have contributed actively since the start of
> the
> >>>>>>> project. The research assistants and fellows are expected to leave
> >>>>>>> when their contracts expire. However, they are keen to continue to
> >>>>>>> work on the project voluntarily. Moreover, as a long term research
> >>>>>>> project, new research assistants and fellows are likely to join the
> >>>>>>> project.
> >>>>>>>
> >>>>>>> An Excessive Fascination with the Apache Brand
> >>>>>>>
> >>>>>>> We choose Apache not for publicity. We have two purposes. First, we
> >>>>>>> want to leverage Apache's reputation to recruit more developers to
> >>>>>>> make a diverse community. Second, we hope that Apache can help us
> to
> >>>>>>> establish a healthy path in developing SINGA. Beng Chin and
> Kian-Lee
> >>>>>>> are established database and distributed system researchers, and
> >>>>>>> together with the other contributors, they sincerely believe that
> >>>>>>> there is a need for a widely accepted open source distributed deep
> >>>>>>> learning platform. The field of deep learning is still at its
> >>>>>>> infancy,
> >>>>>>> and an open source platform will fuel the research in the area.
> >>>>>>> Moreover, such a platform will enable researchers to develop new
> >>>>>>> models and algorithms, rather than spending time implementing a
> deep
> >>>>>>> learning system from scratch. Furthermore, the need for scalability
> >>>>>>> for such a platform is obvious.
> >>>>>>>
> >>>>>>> Relationship with Other Apache Products
> >>>>>>>
> >>>>>>> Apache H2O implemented two simple deep learning models, namely the
> >>>>>>> Multi-Layer Perceptron and Deep Auto-encoders. There are two
> >>>>>>> significant differences between H2O and SINGA. First, H2O adopts the
> >>>>>>> Map-Reduce framework, which runs a set of computing nodes in parallel
> >>>>>>> against the training set. Model parameters trained by all
> >>>>>>> computing nodes are averaged as the final model parameters. This
> >>>>>>> training algorithm is different from the distributed training
> >>>>>>> algorithm used by DistBelief, Adam and SINGA, which frequently
> >>>>>>> synchronizes the parameters trained on different nodes. SINGA adopts
> >>>>>>> the parameter server framework to support a wide range of distributed
> >>>>>>> training algorithms and parallelization methods (e.g., data
> >>>>>>> parallelism, model parallelism and hybrid parallelism), whereas H2O
> >>>>>>> only supports data parallelism. Second, in H2O, users are restricted to
> >>>>>>> the two built-in models. In SINGA, we provide a simple programming
> >>>>>>> model to let users implement their own deep learning models. A new
> >>>>>>> deep learning model can be implemented by customizing the base Layer
> >>>>>>> class for each layer involved in the model. It is similar to writing
> >>>>>>> Hadoop programs, where users only need to override the base Mapper and
> >>>>>>> Reducer. We also provide built-in models for users to use directly.
> >>>>>>>
> >>>>>>> Documentation
> >>>>>>>
> >>>>>>> The project is hosted at
> >>>>>>> http://www.comp.nus.edu.sg/~dbsystem/project/singa.html.
> >>>>>>> Documentations can be found at the Github Wiki Page:
> >>>>>>> https://github.com/nusinga/singa/wiki. We continue to refine and
> >>>>>>> improve the documentation.
> >>>>>>>
> >>>>>>> Initial Source
> >>>>>>>
> >>>>>>> We use GitHub to maintain our source code:
> >>>>>>> https://github.com/nusinga/singa
> >>>>>>>
> >>>>>>> Source and Intellectual Property Submission Plan
> >>>>>>>
> >>>>>>> We plan to place our code base under the Apache License, Version 2.0.
> >>>>>>>
> >>>>>>> External Dependencies
> >>>>>>>
> >>>>>>> required by the core code base: glog, gflags, google protobuf,
> >>>>>>> open-blas, mpich, armci-mpi.
> >>>>>>> required by data preparation and preprocessing: opencv, hdfs,
> python.
> >>>>>>>
> >>>>>>> Cryptography
> >>>>>>>
> >>>>>>> Not Applicable
> >>>>>>>
> >>>>>>> Required Resources
> >>>>>>>
> >>>>>>> Mailing Lists
> >>>>>>>
> >>>>>>> Currently, we use a Google Group for internal discussion. The mailing
> >>>>>>> address is nusinga@googlegroup.com. We will migrate the content to
> >>>>>>> the Apache mailing lists in the future.
> >>>>>>>
> >>>>>>> singa-dev
> >>>>>>> singa-user
> >>>>>>> singa-commits
> >>>>>>> singa-private (for private discussion within PMC)
> >>>>>>>
> >>>>>>> Git Repository
> >>>>>>>
> >>>>>>> We want to continue using git for version control. Hence, a git
> repo
> >>>>>>> is required.
> >>>>>>>
> >>>>>>> Issue Tracking
> >>>>>>>
> >>>>>>> JIRA Singa (SINGA)
> >>>>>>>
> >>>>>>> Initial Committers
> >>>>>>>
> >>>>>>> Beng Chin Ooi (ooibc @comp.nus.edu.sg)
> >>>>>>> Kian Lee Tan (tankl @comp.nus.edu.sg)
> >>>>>>> Gang Chen (cg @zju.edu.cn)
> >>>>>>> Wei Wang (wangwei @comp.nus.edu.sg)
> >>>>>>> Dinh Tien Tuan Anh (dinhtta @comp.nus.edu.sg)
> >>>>>>> Jinyang Gao (jinyang.gao @comp.nus.edu.sg)
> >>>>>>> Sheng Wang (wangsh @comp.nus.edu.sg)
> >>>>>>> Kaiping Zheng (kaiping @comp.nus.edu.sg)
> >>>>>>> Zhaojing Luo (zhaojing @comp.nus.edu.sg)
> >>>>>>> Zhongle Xie (zhongle @comp.nus.edu.sg)
> >>>>>>>
> >>>>>>> Affiliations
> >>>>>>>
> >>>>>>> Beng Chin Ooi, National University of Singapore
> >>>>>>> Kian Lee Tan, National University of Singapore
> >>>>>>> Gang Chen, Zhejiang University
> >>>>>>> Wei Wang, National University of Singapore
> >>>>>>> Dinh Tien Tuan Anh, National University of Singapore
> >>>>>>> Jinyang Gao, National University of Singapore
> >>>>>>> Sheng Wang, National University of Singapore
> >>>>>>> Kaiping Zheng, National University of Singapore
> >>>>>>> Zhaojing Luo, National University of Singapore
> >>>>>>> Zhongle Xie, National University of Singapore
> >>>>>>>
> >>>>>>> Sponsors
> >>>>>>>
> >>>>>>> Champion
> >>>>>>>
> >>>>>>> Thejas Nair (thejas at apache.org) - Hortonworks
> >>>>>>>
> >>>>>>> Nominated Mentors
> >>>>>>>
> >>>>>>> Thejas Nair (thejas at apache.org) - Hortonworks
> >>>>>>> Alan Gates (gates at apache dot org) - Hortonworks
> >>>>>>> (Seeking more volunteers!)
> >>>>>>>
> >>>>>>> Sponsoring Entity
> >>>>>>>
> >>>>>>> We are requesting the Incubator to sponsor this project.
> >>>>>>>
> >>>>>>>

-- 
Sent from My iPad, sorry for any misspellings.

Re: [Fwd: Re: [DISCUSS] [PROPOSAL] Singa for Apache Incubator]

Posted by Henry Saputra <he...@gmail.com>.
I strongly suggest you solicit more (diverse) mentors before starting the VOTE.

All initial committers are from the same org and all initial mentors are
from the same company (HW).

I am not sure this is a good start for an Apache podling.


- Henry

On Thu, Feb 26, 2015 at 9:12 AM, Thejas Nair <th...@gmail.com> wrote:
> The incubator proposal has been updated with the feedback so far.
> We have 3 mentors now, but I think it would be good to have additional
> mentors. Please let me know if anyone is able to help mentor this
> project.
>
> I am planning to start a vote on the proposal in a day or two.
>
>
> On Fri, Feb 6, 2015 at 5:21 PM,  <oo...@comp.nus.edu.sg> wrote:
>>
>> Regarding the number of users using this project -- at this moment, the
>> community is not big.  A few local start-ups have been trying to use it
>> (mainly due to an announcement on our seminar list), e.g. one is using it for
>> image recognition (given a photo snapped by a user, it aims to return
>> the same product and a list of similar products, such as a luxury bag
>> on a passerby).  Researchers from outside of NUS may have been using it
>> since we published an application paper on cross domain/modal retrieval in
>> VLDB 2014.
>>
>> We have not announced the project to the outside community yet -- we will
>> announce it on dbworld etc. in due course.
>>
>> Thanks and have a good weekend.
>>
>> regards
>> beng chin
>>
>>>
>>> Thanks for the comments and suggestions.
>>> With permission from Thejas, I would like to respond to point 2.
>>>
>>> We have a huge team down at NUS (National University of Singapore) --
>>> we have about seven database/data mining professors (not including
>>> those in systems, networking, and machine learning).
>>> I myself have nine PhD students in a steady state, and I have a few large
>>> grants, with a total budget of about 15 million S$ (~12 million USD), that
>>> allow me to hire a number of research fellows and research assistants for
>>> the next few years.  In a steady state, I have about 20 people (PhD
>>> students/RA/RF) working with me alone.  Other professors have their own
>>> grants (unlike in other countries, it is relatively easy to get large grants
>>> in Singapore; many overseas universities, including UIUC, MIT, ETH etc.,
>>> have research labs funded by Singapore Research Foundation [equivalent of
>>> NSF]).
>>>
>>> SINGA is a long term project for us -- while it is a platform as it is, we
>>> are using it for healthcare predictive analytics (by working with a
>>> hospital associated with the University).  Therefore, we will be working
>>> on SINGA, not solely as a distributed DL platform, but as a tool that will
>>> enable us to do data analytics on some business domains (e.g. healthcare,
>>> consumer etc.).
>>>
>>> For the initial set of committers, three are tenured professors, five are
>>> students, with 2-5 years to go before they complete their PhD.  Quite
>>> often, some would stay back as a research fellow for a couple of years
>>> before they start looking for a job outside.  We will work with mentors
>>> and new developers (from outside of NUS or Zhejiang University) in
>>> enhancing the system.
>>>
>>> The project should survive in that sense.
>>>
>>> (I have an on-going project CIIDAA that has been around since 2008; it was
>>> started as another project, epiC, with a different grant, and then we
>>> continued the development with a new grant for CIIDAA --
>>> http://www.comp.nus.edu.sg/~ciidaa/
>>> )
>>>
>>> Thanks.
>>>
>>> regards
>>> beng chin
>>> ps: i am not sure if my email will get through to the group.
>>>
>>>
>>> ---------------------------- Original Message ----------------------------
>>> Subject: Re: [DISCUSS] [PROPOSAL] Singa for Apache Incubator
>>> From:    "Henry Saputra" <he...@gmail.com>
>>> Date:    Thu, February 5, 2015 2:57 pm
>>> To:      "general@incubator.apache.org" <ge...@incubator.apache.org>
>>> Cc:      ooibc@comp.nus.edu.sg
>>> --------------------------------------------------------------------------
>>>
>>> Several comments:
>>> -) How many users are already using this project? I would recommend
>>> dropping the request for a singa-user list at the beginning.
>>> -) All the initial committers come from a university, and it seems
>>> some of them are already about to leave it. I am not too sure
>>> this project will survive if all of the initial committers are
>>> university students.
>>> -) Need to solicit more mentors if this project ever gets to the Apache
>>> Incubator.
>>>
>>> - Henry
>>>
>>> On Tue, Feb 3, 2015 at 3:58 PM, Thejas Nair <th...@gmail.com> wrote:
>>>> The "Relationship with Other Apache Products" section has been
>>>> updated. The reference to H2O in that section has been removed, and
>>>> other projects have been added.
>>>>  Thanks for the feedback!
>>>>
>>>>
>>>> On Wed, Jan 28, 2015 at 10:27 AM, Thejas Nair <th...@gmail.com> wrote:
>>>>> Thanks for pointing that out Henry! Yes, looks like H2O is not an
>>>>> Apache project, I should have verified that.
>>>>> I will edit that, and revisit that section along with the folks in
>>>>> Singa community.
>>>>>
>>>>>
>>>>> On Tue, Jan 27, 2015 at 6:55 PM, Henry Saputra <he...@gmail.com> wrote:
>>>>>> Quick immediate comment that "Apache H2O" is not really an Apache
>>>>>> project.
>>>>>>
>>>>>> I assume you are referring to https://github.com/h2oai/h2o (or
>>>>>> https://github.com/h2oai/h2o-dev) ?
>>>>>>
>>>>>> - Henry
>>>>>>
>>>>>> On Tue, Jan 27, 2015 at 5:29 PM, Thejas Nair <th...@gmail.com> wrote:
>>>>>>> Hello everyone,
>>>>>>>
>>>>>>> I would like to propose the inclusion of Singa as an Apache Incubator
>>>>>>> project.
>>>>>>>
>>>>>>> Here is the proposal -
>>>>>>> https://wiki.apache.org/incubator/SingaProposal
>>>>>>>
>>>>>>> Please review the proposal and give feedback. I am planning to start
>>>>>>> a vote after 7 days if the proposal looks good.
>>>>>>> We are also seeking additional Apache mentors for the project.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Thejas
>>>>>>> ==========================================================
>>>>>>> Singa Incubator Proposal
>>>>>>>
>>>>>>> Abstract
>>>>>>>
>>>>>>> SINGA is a distributed deep learning platform.
>>>>>>>
>>>>>>> Proposal
>>>>>>>
>>>>>>> SINGA is an efficient, scalable and easy-to-use distributed platform
>>>>>>> for training deep learning models, e.g., Deep Convolutional Neural
>>>>>>> Network and Deep Belief Network. It parallelizes the computation
>>>>>>> (i.e., training) onto a cluster of nodes by distributing the training
>>>>>>> data and model automatically to speed up the training. Built-in
>>>>>>> training algorithms like Back-Propagation and Contrastive Divergence
>>>>>>> are implemented based on common abstractions of deep learning models.
>>>>>>> Users can train their own deep learning models by simply customizing
>>>>>>> these abstractions, much like implementing the Mapper and Reducer in
>>>>>>> Hadoop.
>>>>>>>
>>>>>>> Background
>>>>>>>
>>>>>>> Deep learning refers to a set of feature (or representation) learning
>>>>>>> models that consist of multiple (non-linear) layers, where different
>>>>>>> layers learn different levels of abstractions (representations) of the
>>>>>>> raw input data. Larger (in terms of model parameters) and deeper (in
>>>>>>> terms of number of layers) models have shown better performance, e.g.,
>>>>>>> lower image classification error in the Large Scale Visual Recognition
>>>>>>> Challenge. However, a larger model requires more memory and larger
>>>>>>> training data to reduce over-fitting. Complex numeric operations make
>>>>>>> the training computation-intensive. In practice, training large deep
>>>>>>> learning models takes weeks or months on a single node (even with a
>>>>>>> GPU).
>>>>>>>
>>>>>>> Rationale
>>>>>>>
>>>>>>> Deep learning has gained a lot of traction in both academia and
>>>>>>> industry due to its success in a wide range of areas such as computer
>>>>>>> vision and speech recognition. However, training of such models is
>>>>>>> computationally expensive, especially for large and deep models (e.g.,
>>>>>>> with billions of parameters and more than 10 layers). Both Google and
>>>>>>> Microsoft have developed distributed deep learning systems to make the
>>>>>>> training more efficient by distributing the computations within a
>>>>>>> cluster of nodes. However, these systems are closed-source software.
>>>>>>> Our goal is to leverage the community of open source developers to
>>>>>>> make SINGA efficient, scalable and easy to use. SINGA is a
>>>>>>> full-fledged distributed platform that could benefit the community and
>>>>>>> also benefit from the community's involvement in contributing
>>>>>>> to further work in this area. We believe the nature of SINGA and
>>>>>>> our vision for the system fit naturally with Apache's philosophy and
>>>>>>> development framework.
>>>>>>>
>>>>>>> Initial Goals
>>>>>>>
>>>>>>> We have developed a system for SINGA running on a commodity computer
>>>>>>> cluster. The initial goals include:
>>>>>>> * improving the system in terms of scalability and efficiency, e.g.,
>>>>>>>   using InfiniBand for network communication and multi-threading for
>>>>>>>   single-node computation. We would consider extending SINGA to GPU
>>>>>>>   clusters later.
>>>>>>> * benchmarking with larger datasets (hundreds of millions of training
>>>>>>>   instances) and models (billions of parameters).
>>>>>>> * adding more built-in deep learning models. Users can train the
>>>>>>>   built-in models on their datasets directly.
>>>>>>>
>>>>>>> Current Status
>>>>>>>
>>>>>>> Meritocracy
>>>>>>>
>>>>>>> We would like to follow ASF meritocratic principles to encourage more
>>>>>>> developers to contribute to this project. We know that only active and
>>>>>>> excellent developers can make SINGA a successful project. The
>>>>>>> committer list and PMC will be updated based on developers'
>>>>>>> performance and commitment. We are also improving the documentation
>>>>>>> and code to help new developers get started quickly.
>>>>>>>
>>>>>>> Community
>>>>>>>
>>>>>>> SINGA is currently being developed in the Database System Research
>>>>>>> Lab at the National University of Singapore (NUS) in collaboration with
>>>>>>> Zhejiang University in China. Our lab has extensive experience in
>>>>>>> building database related systems, including distributed systems. Six
>>>>>>> PhD students and research assistants (Jinyang Gao, Kaiping Zheng,
>>>>>>> Sheng Wang, Wei Wang, Zhaojing Luo and Zhongle Xie), a research
>>>>>>> fellow (Anh Dinh) and three professors (Beng Chin Ooi, Gang Chen, Kian
>>>>>>> Lee Tan) have been working for a year on this project. We are open to
>>>>>>> recruiting more developers from diverse backgrounds.
>>>>>>>
>>>>>>> Core Developers
>>>>>>>
>>>>>>> Beng Chin Ooi, Gang Chen and Kian Lee Tan are professors who have
>>>>>>> worked on distributed systems for more than 20 years. They have
>>>>>>> collaborated with the industry and have built various large scale
>>>>>>> systems. Anh Dinh's research is also on distributed systems, albeit
>>>>>>> with more focus on security aspects. Wei Wang's research is on deep
>>>>>>> learning problems including deep learning applications and large scale
>>>>>>> training. Sheng Wang and Jinyang are working on efficient indexing,
>>>>>>> querying of large scale data and machine learning. Kaiping, Zhaojing
>>>>>>> and Zhongle are new PhD students who joined SINGA recently. They will
>>>>>>> work on this project for a longer time (next 4-5 years). While we
>>>>>>> share common research interests, each member also brings diverse
>>>>>>> expertise to the team.
>>>>>>>
>>>>>>> Alignment
>>>>>>>
>>>>>>> ASF is already the home of many distributed platforms, e.g., Hadoop,
>>>>>>> Spark and Mahout, each of which targets a different application
>>>>>>> domain. SINGA, being a distributed platform for large-scale deep
>>>>>>> learning, focuses on another important domain which still
>>>>>>> lacks a robust and scalable open-source platform. The recent success
>>>>>>> of deep learning models especially for vision and speech recognition
>>>>>>> tasks has generated interest in both applying existing deep learning
>>>>>>> models and in developing new ones. Thus, an open-source platform for
>>>>>>> deep learning will be able to attract a large community of users and
>>>>>>> developers. SINGA is a complex system needing many iterations of
>>>>>>> design, implementation and testing. Apache's collaboration framework
>>>>>>> which encourages active contribution from developers will inevitably
>>>>>>> help improve the quality of the system, as shown in the success of
>>>>>>> Hadoop, Spark, etc. Equally important is the community of users which
>>>>>>> helps identify real-life applications of deep learning, and helps to
>>>>>>> evaluate the system's performance and ease-of-use. We hope to leverage
>>>>>>> ASF for coordinating and promoting both communities, and in return
>>>>>>> benefit the communities with another useful tool.
>>>>>>>
>>>>>>> Known Risks
>>>>>>>
>>>>>>> Orphaned products
>>>>>>>
>>>>>>> Four core developers (Anh, Wei Wang, Jinyang and Sheng Wang) may leave
>>>>>>> the lab in two to four years' time. It is possible that some of them
>>>>>>> may not have enough time to focus on this project after that. But
>>>>>>> SINGA is part of our other bigger research projects on building an
>>>>>>> infrastructure for data intensive applications, which include
>>>>>>> health-care analytics and brain-inspired computing. Beng Chin and Kian
>>>>>>> Lee would continue working on it and getting more people involved. For
>>>>>>> example, three new developers (Kaiping, Zhaojing and Zhongle) joined
>>>>>>> us recently. Individual developers are welcome to make SINGA a diverse
>>>>>>> community that is robust and independent from any single developer.
>>>>>>>
>>>>>>> Inexperience with Open Source
>>>>>>>
>>>>>>> All the developers are active users and followers of open source
>>>>>>> projects. Our research lab has a strong commitment to open source, and
>>>>>>> has released the source code of several systems under open source
>>>>>>> licenses as a way of contributing back to the open source community.
>>>>>>> But we do not have much real experience in open source projects with
>>>>>>> large and well organized communities like those in Apache. This is one
>>>>>>> reason we chose Apache, which is experienced in open source project
>>>>>>> incubation. We hope to get help from Apache (e.g., champion and
>>>>>>> mentors) to establish a healthy path for SINGA.
>>>>>>>
>>>>>>> Homogenous Developers
>>>>>>>
>>>>>>> Although the current developers are researchers in the universities,
>>>>>>> they have different research interests and project experiences, as
>>>>>>> mentioned in the section that introduces the core developers. We know
>>>>>>> that a diverse community is helpful. Hence we are open to the idea of
>>>>>>> recruiting developers from other regions and organizations.
>>>>>>>
>>>>>>> Reliance on Salaried Developers
>>>>>>>
>>>>>>> As a research project in the university, SINGA's current developing
>>>>>>> community consists of professors, PhD students, research assistants
>>>>>>> and postdoctoral fellows. They are driven by their interests to work
>>>>>>> on this project and have contributed actively since the start of the
>>>>>>> project. The research assistants and fellows are expected to leave
>>>>>>> when their contracts expire. However, they are keen to continue to
>>>>>>> work on the project voluntarily. Moreover, as a long term research
>>>>>>> project, new research assistants and fellows are likely to join the
>>>>>>> project.
>>>>>>>
>>>>>>> An Excessive Fascination with the Apache Brand
>>>>>>>
>>>>>>> We did not choose Apache for publicity. We have two purposes. First, we
>>>>>>> want to leverage Apache's reputation to recruit more developers to
>>>>>>> make a diverse community. Second, we hope that Apache can help us to
>>>>>>> establish a healthy path in developing SINGA. Beng Chin and Kian-Lee
>>>>>>> are established database and distributed system researchers, and
>>>>>>> together with the other contributors, they sincerely believe that
>>>>>>> there is a need for a widely accepted open source distributed deep
>>>>>>> learning platform. The field of deep learning is still in its infancy,
>>>>>>> and an open source platform will fuel the research in the area.
>>>>>>> Moreover, such a platform will enable researchers to develop new
>>>>>>> models and algorithms, rather than spending time implementing a deep
>>>>>>> learning system from scratch. Furthermore, the need for scalability
>>>>>>> for such a platform is obvious.
>>>>>>>
>>>>>>> Relationship with Other Apache Products
>>>>>>>
>>>>>>> Apache H2O implemented two simple deep learning models, namely the
>>>>>>> Multi-Layer Perceptron and Deep Auto-encoders. There are two
>>>>>>> significant differences between H2O and SINGA. First, H2O adopts the
>>>>>>> Map-Reduce framework which runs a set of computing nodes in parallel
>>>>>>> against the training set. Model parameters trained by all
>>>>>>> computing nodes are averaged as the final model parameters. This
>>>>>>> training algorithm is different from the distributed training
>>>>>>> algorithm used by DistBelief, Adam and SINGA, which frequently
>>>>>>> synchronizes the parameters trained from different nodes. SINGA adopts
>>>>>>> the parameter server framework to support a wide range of distributed
>>>>>>> training algorithms and parallelization methods (e.g., data
>>>>>>> parallelism, model parallelism and hybrid parallelism; H2O only
>>>>>>> supports data parallelism). Second, in H2O, users are restricted to
>>>>>>> using the two built-in models. In SINGA, we provide a simple programming
>>>>>>> model to let users implement their own deep learning models. A new
>>>>>>> deep learning model can be implemented by customizing the base Layer
>>>>>>> class for each layer involved in the model. It is similar to writing
>>>>>>> Hadoop programs where users only need to override the base Mapper and
>>>>>>> Reducer. We also provide built-in models for users to use directly.
>>>>>>>
>>>>>>> Documentation
>>>>>>>
>>>>>>> The project is hosted at
>>>>>>> http://www.comp.nus.edu.sg/~dbsystem/project/singa.html.
>>>>>>> Documentation can be found on the GitHub wiki page:
>>>>>>> https://github.com/nusinga/singa/wiki. We continue to refine and
>>>>>>> improve the documentation.
>>>>>>>
>>>>>>> Initial Source
>>>>>>>
>>>>>>> We use GitHub to maintain our source code:
>>>>>>> https://github.com/nusinga/singa
>>>>>>>
>>>>>>> Source and Intellectual Property Submission Plan
>>>>>>>
>>>>>>> We plan to release our code base under the Apache License, Version 2.0.
>>>>>>>
>>>>>>> External Dependencies
>>>>>>>
>>>>>>> required by the core code base: glog, gflags, google protobuf,
>>>>>>> open-blas, mpich, armci-mpi.
>>>>>>> required by data preparation and preprocessing: opencv, hdfs, python.
>>>>>>>
>>>>>>> Cryptography
>>>>>>>
>>>>>>> Not Applicable
>>>>>>>
>>>>>>> Required Resources
>>>>>>>
>>>>>>> Mailing Lists
>>>>>>>
>>>>>>> Currently, we use a Google group for internal discussion. The mailing
>>>>>>> address is nusinga@googlegroup.com. We will migrate the content to the
>>>>>>> Apache mailing lists in the future.
>>>>>>>
>>>>>>> singa-dev
>>>>>>> singa-user
>>>>>>> singa-commits
>>>>>>> singa-private (for private discussion within PMC)
>>>>>>>
>>>>>>> Git Repository
>>>>>>>
>>>>>>> We want to continue using git for version control. Hence, a git repo
>>>>>>> is required.
>>>>>>>
>>>>>>> Issue Tracking
>>>>>>>
>>>>>>> JIRA Singa (SINGA)
>>>>>>>
>>>>>>> Initial Committers
>>>>>>>
>>>>>>> Beng Chin Ooi (ooibc @comp.nus.edu.sg)
>>>>>>> Kian Lee Tan (tankl @comp.nus.edu.sg)
>>>>>>> Gang Chen (cg @zju.edu.cn)
>>>>>>> Wei Wang (wangwei @comp.nus.edu.sg)
>>>>>>> Dinh Tien Tuan Anh (dinhtta @comp.nus.edu.sg)
>>>>>>> Jinyang Gao (jinyang.gao @comp.nus.edu.sg)
>>>>>>> Sheng Wang (wangsh @comp.nus.edu.sg)
>>>>>>> Kaiping Zheng (kaiping @comp.nus.edu.sg)
>>>>>>> Zhaojing Luo (zhaojing @comp.nus.edu.sg)
>>>>>>> Zhongle Xie (zhongle @comp.nus.edu.sg)
>>>>>>>
>>>>>>> Affiliations
>>>>>>>
>>>>>>> Beng Chin Ooi, National University of Singapore
>>>>>>> Kian Lee Tan, National University of Singapore
>>>>>>> Gang Chen, Zhejiang University
>>>>>>> Wei Wang, National University of Singapore
>>>>>>> Dinh Tien Tuan Anh, National University of Singapore
>>>>>>> Jinyang Gao, National University of Singapore
>>>>>>> Sheng Wang, National University of Singapore
>>>>>>> Kaiping Zheng, National University of Singapore
>>>>>>> Zhaojing Luo, National University of Singapore
>>>>>>> Zhongle Xie, National University of Singapore
>>>>>>>
>>>>>>> Sponsors
>>>>>>>
>>>>>>> Champion
>>>>>>>
>>>>>>> Thejas Nair (thejas at apache.org) - Hortonworks
>>>>>>>
>>>>>>> Nominated Mentors
>>>>>>>
>>>>>>> Thejas Nair (thejas at apache.org) - Hortonworks
>>>>>>> Alan Gates (gates at apache dot org) - Hortonworks
>>>>>>> (Seeking more volunteers!)
>>>>>>>
>>>>>>> Sponsoring Entity
>>>>>>>
>>>>>>> We are requesting the Incubator to sponsor this project.
>>>>>>>

---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
For additional commands, e-mail: general-help@incubator.apache.org


Re: [Fwd: Re: [DISCUSS] [PROPOSAL] Singa for Apache Incubator]

Posted by Thejas Nair <th...@gmail.com>.
I have added Ted as a mentor, so we now have some diversity in mentor
affiliations. (Thanks Ted!)
I also reached out to a few other people in the Mahout community who I
thought might be interested, but I didn't hear from them.

I am planning to put this to a vote in 2 days. Meanwhile, please let
me know if anybody else would be willing to join as a mentor.

Thanks,
Thejas


On Fri, Feb 27, 2015 at 3:44 PM, Thejas Nair <th...@gmail.com> wrote:
> Thanks Ted. That helps a lot!
> I have also reached out to a few other folks in the Mahout community to see
> if they might also be interested.
>
>
> On Fri, Feb 27, 2015 at 8:06 AM, Ted Dunning <te...@gmail.com> wrote:
>> Thejas,
>>
>> Please add me as a mentor if it helps to have diversity.  I have enormous
>> trust based on previous experience with him that Alan Gates would act as a
>> highly impartial and effective mentor, but would be happy to help if there
>> is a concern that could be addressed by having another mentor from a
>> different company.
>>
>>
>>
>> On Thu, Feb 26, 2015 at 6:12 PM, Thejas Nair <th...@gmail.com> wrote:
>>
>>> The incubator proposal has been updated with the feedback so far.
>>> We have 3 mentors now, but I think it would be good to have additional
>>> mentors. Please let me know if anyone is able to help mentor this
>>> project.
>>>
>>> I am planning to start a vote on the proposal in a day or two.
>>>
>>>
>>> On Fri, Feb 6, 2015 at 5:21 PM,  <oo...@comp.nus.edu.sg> wrote:
>>> >
>>> > Regarding the number of users using this project -- at this moment, the
>>> > community is not big.  A few local start-ups have been trying to use it
>>> > (mainly due to an announcement on our seminar list), e.g. one is using it for
>>> > image recognition (given a photo snapped by a user, it aims to return
>>> > the same product and a list of similar products, such as a luxury bag
>>> > on a passerby).  Researchers from outside of NUS may have been using it
>>> > since we published an application paper on cross domain/modal retrieval in
>>> > VLDB 2014.
>>> >
>>> > We have not announced the project to the outside community yet -- we will
>>> > announce it on dbworld etc. in due course.
>>> >
>>> > Thanks and have a good weekend.
>>> >
>>> > regards
>>> > beng chin
>>> >
>>> >>
>>> >> Thanks for the comments and suggestions.
>>> >> With permission from Thejas, I would like to respond to point 2.
>>> >>
>>> >> We have a huge team down at NUS (National University of Singapore) --
>>> >> we have about seven database/data mining professors (not including
>>> >> those in systems, networking, and machine learning).
>>> >> I myself have nine PhD students in a steady state, and I have a few large
>>> >> grants, with a total budget of about 15 million S$ (~12 million USD), that
>>> >> allow me to hire a number of research fellows and research assistants for
>>> >> the next few years.  In a steady state, I have about 20 people (PhD
>>> >> students/RA/RF) working with me alone.  Other professors have their own
>>> >> grants (unlike in other countries, it is relatively easy to get large grants
>>> >> in Singapore; many overseas universities, including UIUC, MIT, ETH etc.,
>>> >> have research labs funded by Singapore Research Foundation [equivalent of
>>> >> NSF]).
>>> >>
>>> >> SINGA is a long term project for us -- while it is a platform as it is, we
>>> >> are using it for healthcare predictive analytics (by working with a
>>> >> hospital associated with the University).  Therefore, we will be working
>>> >> on SINGA, not solely as a distributed DL platform, but as a tool that will
>>> >> enable us to do data analytics on some business domains (e.g. healthcare,
>>> >> consumer etc.).
>>> >>
>>> >> For the initial set of committers, three are tenured professors, five are
>>> >> students, with 2-5 years to go before they complete their PhD.  Quite
>>> >> often, some would stay back as a research fellow for a couple of years
>>> >> before they start looking for a job outside.  We will work with mentors
>>> >> and new developers (from outside of NUS or Zhejiang University) in
>>> >> enhancing the system.
>>> >>
>>> >> The project should survive in that sense.
>>> >>
>>> >> (I have an on-going project CIIDAA that has been around since 2008; it was
>>> >> started as another project, epiC, with a different grant, and then we
>>> >> continued the development with a new grant for CIIDAA --
>>> >> http://www.comp.nus.edu.sg/~ciidaa/
>>> >> )
>>> >>
>>> >> Thanks.
>>> >>
>>> >> regards
>>> >> beng chin
>>> >> ps: i am not sure if my email will get through to the group.
>>> >>
>>> >>
>>> >> ---------------------------- Original Message ----------------------------
>>> >> Subject: Re: [DISCUSS] [PROPOSAL] Singa for Apache Incubator
>>> >> From:    "Henry Saputra" <he...@gmail.com>
>>> >> Date:    Thu, February 5, 2015 2:57 pm
>>> >> To:      "general@incubator.apache.org" <ge...@incubator.apache.org>
>>> >> Cc:      ooibc@comp.nus.edu.sg
>>> >> --------------------------------------------------------------------------
>>> >>
>>> >> Several comments:
>>> >> -) How many users are already using this project? I would recommend
>>> >> dropping the request for a singa-user list at the beginning.
>>> >> -) All the initial committers come from a university, and it seems
>>> >> some of them are already about to leave it. I am not too sure
>>> >> this project will survive if all of the initial committers are
>>> >> university students.
>>> >> -) Need to solicit more mentors if this project ever gets to the Apache
>>> >> Incubator.
>>> >>
>>> >> - Henry
>>> >>
>>> >> On Tue, Feb 3, 2015 at 3:58 PM, Thejas Nair <th...@gmail.com> wrote:
>>> >>> The "Relationship with Other Apache Products" section has been
>>> >>> updated. The reference to H2O in that section has been removed, and
>>> >>> other projects have been added.
>>> >>>  Thanks for the feedback!
>>> >>>
>>> >>>
>>> >>> On Wed, Jan 28, 2015 at 10:27 AM, Thejas Nair <th...@gmail.com> wrote:
>>> >>>> Thanks for pointing that out Henry! Yes, looks like H2O is not an
>>> >>>> Apache project, I should have verified that.
>>> >>>> I will edit that, and revisit that section along with the folks in
>>> >>>> Singa community.
>>> >>>>
>>> >>>>
>>> >>>> On Tue, Jan 27, 2015 at 6:55 PM, Henry Saputra <he...@gmail.com> wrote:
>>> >>>>> Quick immediate comment that "Apache H2O" is not really an Apache
>>> >>>>> project.
>>> >>>>>
>>> >>>>> I assume you are referring to https://github.com/h2oai/h2o (or
>>> >>>>> https://github.com/h2oai/h2o-dev) ?
>>> >>>>>
>>> >>>>> - Henry
>>> >>>>>
>>> >>>>> On Tue, Jan 27, 2015 at 5:29 PM, Thejas Nair <th...@gmail.com> wrote:
>>> >>>>>> Hello everyone,
>>> >>>>>>
>>> >>>>>> I would like to propose the inclusion of Singa as an Apache Incubator
>>> >>>>>> project.
>>> >>>>>>
>>> >>>>>> Here is the proposal -
>>> >>>>>> https://wiki.apache.org/incubator/SingaProposal
>>> >>>>>>
>>> >>>>>> Please review the proposal and give feedback. I am planning to start
>>> >>>>>> a vote after 7 days if the proposal looks good.
>>> >>>>>> We are also seeking additional Apache mentors for the project.
>>> >>>>>>
>>> >>>>>> Thanks,
>>> >>>>>> Thejas
>>> >>>>>> ==========================================================
>>> >>>>>> Singa Incubator Proposal
>>> >>>>>>
>>> >>>>>> Abstract
>>> >>>>>>
>>> >>>>>> SINGA is a distributed deep learning platform.
>>> >>>>>>
>>> >>>>>> Proposal
>>> >>>>>>
>>> >>>>>> SINGA is an efficient, scalable and easy-to-use distributed platform
>>> >>>>>> for training deep learning models, e.g., Deep Convolutional Neural
>>> >>>>>> Network and Deep Belief Network. It parallelizes the computation
>>> >>>>>> (i.e., training) onto a cluster of nodes by distributing the training
>>> >>>>>> data and model automatically to speed up the training. Built-in
>>> >>>>>> training algorithms like Back-Propagation and Contrastive Divergence
>>> >>>>>> are implemented based on common abstractions of deep learning models.
>>> >>>>>> Users can train their own deep learning models by simply customizing
>>> >>>>>> these abstractions, much like implementing the Mapper and Reducer in
>>> >>>>>> Hadoop.
>>> >>>>>>
>>> >>>>>> Background
>>> >>>>>>
>>> >>>>>> Deep learning refers to a set of feature (or representation) learning
>>> >>>>>> models that consist of multiple (non-linear) layers, where different
>>> >>>>>> layers learn different levels of abstractions (representations) of the
>>> >>>>>> raw input data. Larger (in terms of model parameters) and deeper (in
>>> >>>>>> terms of number of layers) models have shown better performance, e.g.,
>>> >>>>>> lower image classification error in the Large Scale Visual Recognition
>>> >>>>>> Challenge. However, a larger model requires more memory and larger
>>> >>>>>> training data to reduce over-fitting. Complex numeric operations make
>>> >>>>>> the training computation-intensive. In practice, training large deep
>>> >>>>>> learning models takes weeks or months on a single node (even with a
>>> >>>>>> GPU).
>>> >>>>>>
>>> >>>>>> Rationale
>>> >>>>>>
>>> >>>>>> Deep learning has gained a lot of traction in both academia and
>>> >>>>>> industry due to its success in a wide range of areas such as computer
>>> >>>>>> vision and speech recognition. However, training of such models is
>>> >>>>>> computationally expensive, especially for large and deep models (e.g.,
>>> >>>>>> with billions of parameters and more than 10 layers). Both Google and
>>> >>>>>> Microsoft have developed distributed deep learning systems to make the
>>> >>>>>> training more efficient by distributing the computations within a
>>> >>>>>> cluster of nodes. However, these systems are closed-source software.
>>> >>>>>> Our goal is to leverage the community of open source developers to
>>> >>>>>> make SINGA efficient, scalable and easy to use. SINGA is a
>>> >>>>>> full-fledged distributed platform that could benefit the community and
>>> >>>>>> also benefit from the community's involvement in contributing
>>> >>>>>> to further work in this area. We believe the nature of SINGA and
>>> >>>>>> our vision for the system fit naturally with Apache's philosophy and
>>> >>>>>> development framework.
>>> >>>>>>
>>> >>>>>> Initial Goals
>>> >>>>>>
>>> >>>>>> We have developed a system for SINGA running on a commodity computer
>>> >>>>>> cluster. The initial goals include:
>>> >>>>>>
>>> >>>>>> * improving the system in terms of scalability and efficiency, e.g.,
>>> >>>>>>   using Infiniband for network communication and multi-threading for
>>> >>>>>>   one node computation. We would consider extending SINGA to GPU
>>> >>>>>>   clusters later.
>>> >>>>>> * benchmarking with larger datasets (hundreds of millions of training
>>> >>>>>>   instances) and models (billions of parameters).
>>> >>>>>> * adding more built-in deep learning models. Users can train the
>>> >>>>>>   built-in models on their datasets directly.
>>> >>>>>>
>>> >>>>>> Current Status
>>> >>>>>>
>>> >>>>>> Meritocracy
>>> >>>>>>
>>> >>>>>> We would like to follow ASF meritocratic principles to encourage
>>> more
>>> >>>>>> developers to contribute in this project. We know that only active
>>> >>>>>> and
>>> >>>>>> excellent developers can make SINGA a successful project. The
>>> >>>>>> committer list and PMC will be updated based on developers'
>>> >>>>>> performance and commitment. We are also improving the documentation
>>> >>>>>> and code to help new developers get started quickly.
>>> >>>>>>
>>> >>>>>> Community
>>> >>>>>>
>>> >>>>>> SINGA is currently being developed in the Database System Research
>>> >>>>>> Lab
>>> >>>>>> at the National University of Singapore (NUS) in collaboration with
>>> >>>>>> Zhejiang University in China. Our lab has extensive experience in
>>> >>>>>> building database related systems, including distributed systems.
>>> Six
>>> >>>>>> PhD students and research assistants (Jinyang Gao, Kaiping Zheng,
>>> >>>>>> Sheng Wang, Wei Wang, Zhaojing Luo and Zhongle Xie), a research
>>> >>>>>> fellow (Anh Dinh) and three professors (Beng Chin Ooi, Gang Chen,
>>> >>>>>> Kian
>>> >>>>>> Lee Tan) have been working for a year on this project. We are open
>>> to
>>> >>>>>> recruiting more developers from diverse backgrounds.
>>> >>>>>>
>>> >>>>>> Core Developers
>>> >>>>>>
>>> >>>>>> Beng Chin Ooi, Gang Chen and Kian Lee Tan are professors who have
>>> >>>>>> worked on distributed systems for more than 20 years. They have
>>> >>>>>> collaborated with the industry and have built various large scale
>>> >>>>>> systems. Anh Dinh's research is also on distributed systems, albeit
>>> >>>>>> with more focus on security aspects. Wei Wang's research is on deep
>>> >>>>>> learning problems including deep learning applications and large
>>> >>>>>> scale
>>> >>>>>> training. Sheng Wang and Jinyang are working on efficient indexing,
>>> >>>>>> querying of large scale data and machine learning. Kaiping, Zhaojing
>>> >>>>>> and Zhongle are new PhD students who joined SINGA recently. They
>>> >>>>>> will
>>> >>>>>> work on this project for a longer time (next 4-5 years). While we
>>> >>>>>> share common research interests, each member also brings diverse
>>> >>>>>> expertise to the team.
>>> >>>>>>
>>> >>>>>> Alignment
>>> >>>>>>
>>> >>>>>> ASF is already the home of many distributed platforms, e.g., Hadoop,
>>> >>>>>> Spark and Mahout, each of which targets a different application
>>> >>>>>> domain. SINGA, being a distributed platform for large-scale deep
>>> >>>>>> learning, focuses on another important domain that still
>>> >>>>>> lacks a robust and scalable open-source platform. The recent success
>>> >>>>>> of deep learning models especially for vision and speech recognition
>>> >>>>>> tasks has generated interest in both applying existing deep
>>> learning
>>> >>>>>> models and in developing new ones. Thus, an open-source platform for
>>> >>>>>> deep learning will be able to attract a large community of users and
>>> >>>>>> developers. SINGA is a complex system needing many iterations of
>>> >>>>>> design, implementation and testing. Apache's collaboration framework
>>> >>>>>> which encourages active contribution from developers will inevitably
>>> >>>>>> help improve the quality of the system, as shown in the success of
>>> >>>>>> Hadoop, Spark, etc. Equally important is the community of users
>>> >>>>>> which
>>> >>>>>> helps identify real-life applications of deep learning, and helps to
>>> >>>>>> evaluate the system's performance and ease-of-use. We hope to
>>> >>>>>> leverage
>>> >>>>>> ASF for coordinating and promoting both communities, and in return
>>> >>>>>> benefit the communities with another useful tool.
>>> >>>>>>
>>> >>>>>> Known Risks
>>> >>>>>>
>>> >>>>>> Orphaned products
>>> >>>>>>
>>> >>>>>> Four core developers (Anh, Wei Wang, Jinyang and Sheng Wang) may
>>> >>>>>> leave
>>> >>>>>> the lab in two to four years time. It is possible that some of them
>>> >>>>>> may not have enough time to focus on this project after that. But,
>>> >>>>>> SINGA is part of our other bigger research projects on building an
>>> >>>>>> infrastructure for data intensive applications, which include
>>> >>>>>> health-care analytics and brain-inspired computing. Beng Chin and
>>> >>>>>> Kian
>>> >>>>>> Lee would continue working on it and getting more people involved.
>>> >>>>>> For
>>> >>>>>> example, three new developers (Kaiping, Zhaojing and Zhongle) joined
>>> >>>>>> us recently. Individual developers are welcome to make SINGA a
>>> >>>>>> diverse
>>> >>>>>> community that is robust and independent from any single developer.
>>> >>>>>>
>>> >>>>>> Inexperience with Open Source
>>> >>>>>>
>>> >>>>>> All the developers are active users and followers of open source
>>> >>>>>> projects. Our research lab has a strong commitment to open source,
>>> >>>>>> and
>>> >>>>>> has released the source code of several systems under open source
>>> >>>>>> license as a way of contributing back to the open source community.
>>> >>>>>> But we do not have much real experience in open source projects with
>>> >>>>>> large and well organized communities like those in Apache. This is
>>> >>>>>> one
>>> >>>>>> reason we choose Apache which is experienced in open source project
>>> >>>>>> incubation. We hope to get the help from Apache (e.g., champion and
>>> >>>>>> mentors) to establish a healthy path for SINGA.
>>> >>>>>>
>>> >>>>>> Homogenous Developers
>>> >>>>>>
>>> >>>>>> Although the current developers are researchers in the universities,
>>> >>>>>> they have different research interests and project experiences, as
>>> >>>>>> mentioned in the section that introduces the core developers. We
>>> know
>>> >>>>>> that a diverse community is helpful. Hence we are open to the idea
>>> of
>>> >>>>>> recruiting developers from other regions and organizations.
>>> >>>>>>
>>> >>>>>> Reliance on Salaried Developers
>>> >>>>>>
>>> >>>>>> As a research project in the university, SINGA's current developing
>>> >>>>>> community consists of professors, PhD students, research assistants
>>> >>>>>> and postdoctoral fellows. They are driven by their interests to work
>>> >>>>>> on this project and have contributed actively since the start of the
>>> >>>>>> project. The research assistants and fellows are expected to leave
>>> >>>>>> when their contracts expire. However, they are keen to continue to
>>> >>>>>> work on the project voluntarily. Moreover, as a long term research
>>> >>>>>> project, new research assistants and fellows are likely to join the
>>> >>>>>> project.
>>> >>>>>>
>>> >>>>>> An Excessive Fascination with the Apache Brand
>>> >>>>>>
>>> >>>>>> We choose Apache not for publicity. We have two purposes. First, we
>>> >>>>>> want to leverage Apache's reputation to recruit more developers to
>>> >>>>>> make a diverse community. Second, we hope that Apache can help us to
>>> >>>>>> establish a healthy path in developing SINGA. Beng Chin and Kian-Lee
>>> >>>>>> are established database and distributed system researchers, and
>>> >>>>>> together with the other contributors, they sincerely believe that
>>> >>>>>> there is a need for a widely accepted open source distributed deep
>>> >>>>>> learning platform. The field of deep learning is still in its
>>> >>>>>> infancy,
>>> >>>>>> and an open source platform will fuel the research in the area.
>>> >>>>>> Moreover, such a platform will enable researchers to develop new
>>> >>>>>> models and algorithms, rather than spending time implementing a deep
>>> >>>>>> learning system from scratch. Furthermore, the need for scalability
>>> >>>>>> for such a platform is obvious.
>>> >>>>>>
>>> >>>>>> Relationship with Other Apache Products
>>> >>>>>>
>>> >>>>>> Apache H2O implemented two simple deep learning models, namely the
>>> >>>>>> Multi-Layer Perceptron and Deep Auto-encoders. There are two
>>> >>>>>> significant differences between H2O and SINGA. First, H2O adopts the
>>> >>>>>> Map-Reduce framework which runs a set of computing nodes in parallel
>>> >>>>>> against parts of the training set. Model parameters trained by all
>>> >>>>>> computing nodes are averaged as the final model parameters. This
>>> >>>>>> training algorithm is different from the distributed training
>>> >>>>>> algorithm used by DistBelief, Adam and SINGA, which frequently
>>> >>>>>> synchronizes the parameters trained from different nodes. SINGA
>>> >>>>>> adopts
>>> >>>>>> the parameter server framework to support a wide range of
>>> distributed
>>> >>>>>> training algorithms and parallelization methods (e.g., data
>>> >>>>>> parallelism, model parallelism and hybrid parallelism; H2O only
>>> >>>>>> supports data parallelism). Second, in H2O, users are restricted to
>>> >>>>>> using the two built-in models. In SINGA, we provide a simple programming
>>> >>>>>> model to let users implement their own deep learning models. A new
>>> >>>>>> deep learning model can be implemented by customizing the base Layer
>>> >>>>>> class for each layer involved in the model. It is similar to writing
>>> >>>>>> Hadoop programs where users only need to override the base Mapper
>>> and
>>> >>>>>> Reducer. We also provide built-in models for users to use directly.
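
The contrast drawn above between one-shot parameter averaging and frequent
synchronization through a parameter server can be sketched in a few lines of
Python. This is a toy illustration only, not SINGA or H2O code: the model
(fitting y = w * x), the ParameterServer class, and every function name below
are hypothetical and chosen purely for the example.

```python
# Toy comparison (not SINGA or H2O code): fit y = w * x with gradient descent.

def gradient(w, shard):
    """Mean gradient of the squared error 0.5 * (w*x - y)**2 over one shard."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)


def train_by_averaging(shards, steps=200, lr=0.05):
    """Each worker trains alone on its shard; results are averaged once at the end."""
    finals = []
    for shard in shards:
        w = 0.0
        for _ in range(steps):
            w -= lr * gradient(w, shard)
        finals.append(w)
    return sum(finals) / len(finals)


class ParameterServer:
    """Hypothetical server holding the shared parameter for all workers."""

    def __init__(self, w=0.0, lr=0.05):
        self.w = w
        self.lr = lr

    def pull(self):
        """Return the latest shared parameter."""
        return self.w

    def push(self, grads):
        """Apply the average of the gradients pushed by the workers."""
        self.w -= self.lr * sum(grads) / len(grads)


def train_with_server(shards, steps=200, lr=0.05):
    """Workers pull the latest parameter every step and push gradients back."""
    server = ParameterServer(lr=lr)
    for _ in range(steps):
        w = server.pull()
        server.push([gradient(w, shard) for shard in shards])
    return server.pull()


# Two shards of noiseless data generated with w = 3.
data = [(x, 3.0 * x) for x in (1.0, 2.0, 3.0, 4.0)]
shards = [data[:2], data[2:]]
w_avg = train_by_averaging(shards)
w_ps = train_with_server(shards)
```

On this convex toy problem both schemes recover the same weight; the
difference the proposal points to matters for deep, non-convex models, where
workers that never exchange parameters during training can drift apart before
their results are averaged.
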
>>> >>>>>>
>>> >>>>>> Documentation
>>> >>>>>>
>>> >>>>>> The project is hosted at
>>> >>>>>> http://www.comp.nus.edu.sg/~dbsystem/project/singa.html.
>>> >>>>>> Documentation can be found at the GitHub wiki page:
>>> >>>>>> https://github.com/nusinga/singa/wiki. We continue to refine and
>>> >>>>>> improve the documentation.
>>> >>>>>>
>>> >>>>>> Initial Source
>>> >>>>>>
>>> >>>>>> We use Github to maintain our source code,
>>> >> https://github.com/nusinga/singa
>>> >>>>>>
>>> >>>>>> Source and Intellectual Property Submission Plan
>>> >>>>>>
>>> >>>>>> We plan to release our code base under the Apache License, Version 2.0.
>>> >>>>>>
>>> >>>>>> External Dependencies
>>> >>>>>>
>>> >>>>>> required by the core code base: glog, gflags, google protobuf,
>>> >>>>>> open-blas, mpich, armci-mpi.
>>> >>>>>> required by data preparation and preprocessing: opencv, hdfs,
>>> python.
>>> >>>>>>
>>> >>>>>> Cryptography
>>> >>>>>>
>>> >>>>>> Not Applicable
>>> >>>>>>
>>> >>>>>> Required Resources
>>> >>>>>>
>>> >>>>>> Mailing Lists
>>> >>>>>>
>>> >>>>>> Currently, we use a Google group for internal discussion. The mailing
>>> >>>>>> address is nusinga@googlegroup.com. We will migrate the content to
>>> >>>>>> the
>>> >>>>>> Apache mailing lists in the future.
>>> >>>>>>
>>> >>>>>> singa-dev
>>> >>>>>> singa-user
>>> >>>>>> singa-commits
>>> >>>>>> singa-private (for private discussion within PMC)
>>> >>>>>>
>>> >>>>>> Git Repository
>>> >>>>>>
>>> >>>>>> We want to continue using git for version control. Hence, a git repo
>>> >>>>>> is required.
>>> >>>>>>
>>> >>>>>> Issue Tracking
>>> >>>>>>
>>> >>>>>> JIRA Singa (SINGA)
>>> >>>>>>
>>> >>>>>> Initial Committers
>>> >>>>>>
>>> >>>>>> Beng Chin Ooi (ooibc @comp.nus.edu.sg)
>>> >>>>>> Kian Lee Tan (tankl @comp.nus.edu.sg)
>>> >>>>>> Gang Chen (cg @zju.edu.cn)
>>> >>>>>> Wei Wang (wangwei @comp.nus.edu.sg)
>>> >>>>>> Dinh Tien Tuan Anh (dinhtta @comp.nus.edu.sg)
>>> >>>>>> Jinyang Gao (jinyang.gao @comp.nus.edu.sg)
>>> >>>>>> Sheng Wang (wangsh @comp.nus.edu.sg)
>>> >>>>>> Kaiping Zheng (kaiping @comp.nus.edu.sg)
>>> >>>>>> Zhaojing Luo (zhaojing @comp.nus.edu.sg)
>>> >>>>>> Zhongle Xie (zhongle @comp.nus.edu.sg)
>>> >>>>>>
>>> >>>>>> Affiliations
>>> >>>>>>
>>> >>>>>> Beng Chin Ooi, National University of Singapore
>>> >>>>>> Kian Lee Tan, National University of Singapore
>>> >>>>>> Gang Chen, Zhejiang University
>>> >>>>>> Wei Wang, National University of Singapore
>>> >>>>>> Dinh Tien Tuan Anh, National University of Singapore
>>> >>>>>> Jinyang Gao, National University of Singapore
>>> >>>>>> Sheng Wang, National University of Singapore
>>> >>>>>> Kaiping Zheng, National University of Singapore
>>> >>>>>> Zhaojing Luo, National University of Singapore
>>> >>>>>> Zhongle Xie, National University of Singapore
>>> >>>>>>
>>> >>>>>> Sponsors
>>> >>>>>>
>>> >>>>>> Champion
>>> >>>>>>
>>> >>>>>> Thejas Nair (thejas at apache.org) - Hortonworks
>>> >>>>>>
>>> >>>>>> Nominated Mentors
>>> >>>>>>
>>> >>>>>> Thejas Nair (thejas at apache.org) - Hortonworks
>>> >>>>>> Alan Gates (gates at apache dot org) - Hortonworks
>>> >>>>>> (Seeking more volunteers!)
>>> >>>>>>
>>> >>>>>> Sponsoring Entity
>>> >>>>>>
>>> >>>>>> We are requesting the Incubator to sponsor this project.
>>> >>>>>>
>>> >>>>>>
>>> ---------------------------------------------------------------------
>>> >>>>>> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>>> >>>>>> For additional commands, e-mail: general-help@incubator.apache.org
>>> >>>>>>

---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
For additional commands, e-mail: general-help@incubator.apache.org


Re: [Fwd: Re: [DISCUSS] [PROPOSAL] Singa for Apache Incubator]

Posted by Thejas Nair <th...@gmail.com>.
Thanks Ted. That helps a lot!
I have also reached out to a few other folks in the Mahout community to see
if they might also be interested.


On Fri, Feb 27, 2015 at 8:06 AM, Ted Dunning <te...@gmail.com> wrote:
> Thejas,
>
> Please add me as a mentor if it helps to have diversity.  I have enormous
> trust based on previous experience with him that Alan Gates would act as a
> highly impartial and effective mentor, but would be happy to help if there
> is a concern that could be addressed by having another mentor from a
> different company.
>
>
>
> On Thu, Feb 26, 2015 at 6:12 PM, Thejas Nair <th...@gmail.com> wrote:
>
>> The incubator proposal has been updated with the feedback so far.
>> We have 3 mentors now, but I think it would be good to have additional
>> mentors. Please let me know if anyone is able to help mentor this
>> project.
>>
>> I am planning to start a vote on the proposal in a day or two.
>>
>>
>> On Fri, Feb 6, 2015 at 5:21 PM,  <oo...@comp.nus.edu.sg> wrote:
>> >
>> > Regarding the number of users using this project -- at this moment, the
>> > community is not big.  A few local start-ups have been trying to use it
>> > (mainly due to announcement in our seminar list), eg. one is using it for
>> > image recognition (given a photo snapped by a user, it wants to return
>> > the same product, and a list of similar products, such as a luxury
>> bag
>> > on a passerby).  Researchers from outside of NUS may have been using it
>> > since we published an application paper on cross domain/modal retrieval
>> in
>> > VLDB 2014.
>> >
>> > We have not announced the project to the outside community yet -- we
>> would
>> > announce it in dbworld etc in due course.
>> >
>> > Thanks and have a good weekend.
>> >
>> > regards
>> > beng chin
>> >
>> >>
>> >> Thanks for the comments and suggestions.
>> >> With permission from Thejas, I would like to respond to point 2.
>> >>
>> >> We have a huge team down at NUS (National University of Singapore) --
>> >> we have about seven database/data mining professors (not including
>> >> those in systems, networking, and machine learning).
>> >> I myself have nine PhD students in a steady state, and I have a few
>> large
>> >> grants, with a total budget of about 15 million S$ (~12 million USD),
>> that
>> >> allows me to hire a number of research fellows and research assistants
>> for
>> >> the next few years.  In a constant state, I have about 20 people (PhD
>> >> students/RA/RF) working with me alone.  Other professors have their own
>> >> grants (unlike other countries, it is relatively easy to get large
>> grants
>> >> in Singapore; many overseas Universities, including UIUC, MIT, ETH etc
>> >> have research labs funded by Singapore Research Foundation [equivalent
>> of
>> >> NSF]).
>> >>
>> >> SINGA is a long term project for us -- while it is a platform as it is,
>> we
>> >> are using it for healthcare predictive analytics (by working with a
>> >> hospital associated with the University).  Therefore, we will be working
>> >> on SINGA, not solely as a distributed DL platform, but as a tool that
>> will
>> >> enable us to do data analytics on some business domains (eg. healthcare,
>> >> consumer etc)
>> >>
>> >> For the initial set of committers, three are tenured professors, five
>> are
>> >> students, with 2-5 years to go before they complete their PhD.  Quite
>> >> often, some would stay back as a research fellow for a couple of years
>> >> before they start looking for a job outside.  We will work with mentors
>> >> and new developers (from outside of NUS or Zhejiang University) in
>> >> enhancing the system.
>> >>
>> >> The project should survive in that sense.
>> >>
>> >> (I have an on-going project CIIDAA that has been around since 2008; it
>> was
>> >> started as another project, epiC,  with a different grant, and then we
>> >> continue the development with a new grant for CIIDAA --
>> >> http://www.comp.nus.edu.sg/~ciidaa/
>> >> )
>> >>
>> >> Thanks.
>> >>
>> >> regards
>> >> beng chin
>> >> ps: i am not sure if my email will get through to the group.
>> >>
>> >>
>> >> ---------------------------- Original Message
>> ----------------------------
>> >> Subject: Re: [DISCUSS] [PROPOSAL] Singa for Apache Incubator
>> >> From:    "Henry Saputra" <he...@gmail.com>
>> >> Date:    Thu, February 5, 2015 2:57 pm
>> >> To:      "general@incubator.apache.org" <ge...@incubator.apache.org>
>> >> Cc:      ooibc@comp.nus.edu.sg
>> >>
>> --------------------------------------------------------------------------
>> >>
>> >> Several comments:
>> >> -) How many users are already using this project? I would recommend
>> >> dropping the request for the singa-user list at the beginning.
>> >> -) All the initial committers come from universities, and it seems
>> >> some of them are already about to leave. I am not too sure
>> >> this project will survive if all of the initial committers are from
>> >> the university as students.
>> >> -) Need to solicit more mentors if this project ever gets to the Apache
>> >> Incubator.
>> >>
>> >> - Henry
>> >>
>> >> On Tue, Feb 3, 2015 at 3:58 PM, Thejas Nair <th...@gmail.com>
>> wrote:
>> >>> The "Relationship with Other Apache Products" section has been
>> >>> updated. The reference to H2O in that section has been removed, and
>> >>> other projects have been added.
>> >>>  Thanks for the feedback!
>> >>>
>> >>>
>> >>> On Wed, Jan 28, 2015 at 10:27 AM, Thejas Nair <th...@gmail.com>
>> >> wrote:
>> >>>> Thanks for pointing that out Henry! Yes, looks like H2O is not an
>> >>>> apache project, I should have verified that.
>> >>>> I will edit that, and revisit that section along with the folks in
>> >>>> Singa community.
>> >>>>
>> >>>>
>> >>>> On Tue, Jan 27, 2015 at 6:55 PM, Henry Saputra
>> >> <he...@gmail.com> wrote:
>> >>>>> Quick immediate comment that "Apache H2O" is not really Apache
>> >>>>> project.
>> >>>>>
>> >>>>> I assume you are referring to https://github.com/h2oai/h2o (or
>> >>>>> https://github.com/h2oai/h2o-dev) ?
>> >>>>>
>> >>>>> - Henry
>> >>>>>
>> >>>>> On Tue, Jan 27, 2015 at 5:29 PM, Thejas Nair <th...@gmail.com>
>> >> wrote:
>> >>>>>> Hello everyone,
>> >>>>>>
>> >>>>>> I would like to propose the inclusion of Singa as an Apache
>> Incubator
>> >> project.
>> >>>>>>
>> >>>>>> Here is the proposal -
>> >>>>>> https://wiki.apache.org/incubator/SingaProposal
>> >>>>>>
>> >>>>>> Please review the proposal and give feedback. I am planning to start
>> >>>>>> a
>> >>>>>> vote after 7 days if the proposal looks good.
>> >>>>>> We are also seeking additional Apache mentors for the project.
>> >>>>>>
>> >>>>>> Thanks,
>> >>>>>> Thejas
>> >>>>>> ==========================================================
>> >>>>>> Singa Incubator Proposal
>> >>>>>>
>> >>>>>> Abstract
>> >>>>>>
>> >>>>>> SINGA is a distributed deep learning platform.
>> >>>>>>
>> >>>>>> Proposal
>> >>>>>>
>> >>>>>> SINGA is an efficient, scalable and easy-to-use distributed platform
>> >>>>>> for training deep learning models, e.g., Deep Convolutional Neural
>> >>>>>> Network and Deep Belief Network. It parallelizes the computation
>> >>>>>> (i.e., training) onto a cluster of nodes by distributing the
>> training
>> >>>>>> data and model automatically to speed up the training. Built-in
>> >>>>>> training algorithms like Back-Propagation and Contrastive Divergence
>> >>>>>> are implemented based on common abstractions of deep learning
>> models.
>> >>>>>> Users can train their own deep learning models by simply customizing
>> >>>>>> these abstractions like implementing the Mapper and Reducer in
>> >>>>>> Hadoop.
>> >>>>>>
>> >>>>>> Background
>> >>>>>>
>> >>>>>> Deep learning refers to a set of feature (or representation)
>> learning
>> >>>>>> models that consist of multiple (non-linear) layers, where different
>> >>>>>> layers learn different levels of abstractions (representations) of
>> >>>>>> the
>> >>>>>> raw input data. Larger (in terms of model parameters) and deeper (in
>> >>>>>> terms of number of layers) models have shown better performance,
>> >>>>>> e.g.,
>> >>>>>> lower image classification error in Large Scale Visual Recognition
>> >>>>>> Challenge. However, a larger model requires more memory and larger
>> >>>>>> training data to reduce over-fitting. Complex numeric operations
>> make
>> >>>>>> the training computation intensive. In practice, training large deep
>> >>>>>> learning models takes weeks or months on a single node (even with
>> >>>>>> GPU).
>> >>>>>>
>> >>>>>> Rationale
>> >>>>>>
>> >>>>>> Deep learning has gained a lot of attention in both academia and
>> >>>>>> industry due to its success in a wide range of areas such as
>> computer
>> >>>>>> vision and speech recognition. However, training of such models is
>> >>>>>> computationally expensive, especially for large and deep models
>> >>>>>> (e.g.,
>> >>>>>> with billions of parameters and more than 10 layers). Both Google
>> and
>> >>>>>> Microsoft have developed distributed deep learning systems to make
>> >>>>>> the
>> >>>>>> training more efficient by distributing the computations within a
>> >>>>>> cluster of nodes. However, these systems are closed source
>> software.
>> >>>>>> Our goal is to leverage the community of open source developers to
>> >>>>>> make SINGA efficient, scalable and easy to use. SINGA is a full
>> >>>>>> fledged distributed platform, that could benefit the community and
>> >>>>>> also benefit from the community in their involvement in contributing
>> >>>>>> to the further work in this area. We believe the nature of SINGA and
>> >>>>>> our visions for the system fit naturally to Apache's philosophy and
>> >>>>>> development framework.
>> >>>>>>
>> >>>>>> Initial Goals
>> >>>>>>
>> >>>>>> We have developed a system for SINGA running on a commodity computer
>> >>>>>> cluster. The initial goals include:
>> >>>>>>
>> >>>>>> * improving the system in terms of scalability and efficiency, e.g.,
>> >>>>>>   using Infiniband for network communication and multi-threading for
>> >>>>>>   one node computation. We would consider extending SINGA to GPU
>> >>>>>>   clusters later.
>> >>>>>> * benchmarking with larger datasets (hundreds of millions of training
>> >>>>>>   instances) and models (billions of parameters).
>> >>>>>> * adding more built-in deep learning models. Users can train the
>> >>>>>>   built-in models on their datasets directly.
>> >>>>>>
>> >>>>>> Current Status
>> >>>>>>
>> >>>>>> Meritocracy
>> >>>>>>
>> >>>>>> We would like to follow ASF meritocratic principles to encourage
>> more
>> >>>>>> developers to contribute in this project. We know that only active
>> >>>>>> and
>> >>>>>> excellent developers can make SINGA a successful project. The
>> >>>>>> committer list and PMC will be updated based on developers'
>> >>>>>> performance and commitment. We are also improving the documentation
>> >>>>>> and code to help new developers get started quickly.
>> >>>>>>
>> >>>>>> Community
>> >>>>>>
>> >>>>>> SINGA is currently being developed in the Database System Research
>> >>>>>> Lab
>> >>>>>> at the National University of Singapore (NUS) in collaboration with
>> >>>>>> Zhejiang University in China. Our lab has extensive experience in
>> >>>>>> building database related systems, including distributed systems.
>> Six
>> >>>>>> PhD students and research assistants (Jinyang Gao, Kaiping Zheng,
>> >>>>>> Sheng Wang, Wei Wang, Zhaojing Luo and Zhongle Xie), a research
>> >>>>>> fellow (Anh Dinh) and three professors (Beng Chin Ooi, Gang Chen,
>> >>>>>> Kian
>> >>>>>> Lee Tan) have been working for a year on this project. We are open
>> to
>> >>>>>> recruiting more developers from diverse backgrounds.
>> >>>>>>
>> >>>>>> Core Developers
>> >>>>>>
>> >>>>>> Beng Chin Ooi, Gang Chen and Kian Lee Tan are professors who have
>> >>>>>> worked on distributed systems for more than 20 years. They have
>> >>>>>> collaborated with the industry and have built various large scale
>> >>>>>> systems. Anh Dinh's research is also on distributed systems, albeit
>> >>>>>> with more focus on security aspects. Wei Wang's research is on deep
>> >>>>>> learning problems including deep learning applications and large
>> >>>>>> scale
>> >>>>>> training. Sheng Wang and Jinyang are working on efficient indexing,
>> >>>>>> querying of large scale data and machine learning. Kaiping, Zhaojing
>> >>>>>> and Zhongle are new PhD students who joined SINGA recently. They
>> >>>>>> will
>> >>>>>> work on this project for a longer time (next 4-5 years). While we
>> >>>>>> share common research interests, each member also brings diverse
>> >>>>>> expertise to the team.
>> >>>>>>
>> >>>>>> Alignment
>> >>>>>>
>> >>>>>> ASF is already the home of many distributed platforms, e.g., Hadoop,
>> >>>>>> Spark and Mahout, each of which targets a different application
>> >>>>>> domain. SINGA, being a distributed platform for large-scale deep
>> >>>>>> learning, focuses on another important domain that still
>> >>>>>> lacks a robust and scalable open-source platform. The recent success
>> >>>>>> of deep learning models especially for vision and speech recognition
>> >>>>>> tasks has generated interest in both applying existing deep
>> learning
>> >>>>>> models and in developing new ones. Thus, an open-source platform for
>> >>>>>> deep learning will be able to attract a large community of users and
>> >>>>>> developers. SINGA is a complex system needing many iterations of
>> >>>>>> design, implementation and testing. Apache's collaboration framework
>> >>>>>> which encourages active contribution from developers will inevitably
>> >>>>>> help improve the quality of the system, as shown in the success of
>> >>>>>> Hadoop, Spark, etc. Equally important is the community of users
>> >>>>>> which
>> >>>>>> helps identify real-life applications of deep learning, and helps to
>> >>>>>> evaluate the system's performance and ease-of-use. We hope to
>> >>>>>> leverage
>> >>>>>> ASF for coordinating and promoting both communities, and in return
>> >>>>>> benefit the communities with another useful tool.
>> >>>>>>
>> >>>>>> Known Risks
>> >>>>>>
>> >>>>>> Orphaned products
>> >>>>>>
>> >>>>>> Four core developers (Anh, Wei Wang, Jinyang and Sheng Wang) may
>> >>>>>> leave
>> >>>>>> the lab in two to four years time. It is possible that some of them
>> >>>>>> may not have enough time to focus on this project after that. But,
>> >>>>>> SINGA is part of our other bigger research projects on building an
>> >>>>>> infrastructure for data intensive applications, which include
>> >>>>>> health-care analytics and brain-inspired computing. Beng Chin and
>> >>>>>> Kian
>> >>>>>> Lee would continue working on it and getting more people involved.
>> >>>>>> For
>> >>>>>> example, three new developers (Kaiping, Zhaojing and Zhongle) joined
>> >>>>>> us recently. Individual developers are welcome to make SINGA a
>> >>>>>> diverse
>> >>>>>> community that is robust and independent from any single developer.
>> >>>>>>
>> >>>>>> Inexperience with Open Source
>> >>>>>>
>> >>>>>> All the developers are active users and followers of open source
>> >>>>>> projects. Our research lab has a strong commitment to open source,
>> >>>>>> and
>> >>>>>> has released the source code of several systems under open source
>> >>>>>> license as a way of contributing back to the open source community.
>> >>>>>> But we do not have much real experience in open source projects with
>> >>>>>> large and well organized communities like those in Apache. This is
>> >>>>>> one
>> >>>>>> reason we choose Apache which is experienced in open source project
>> >>>>>> incubation. We hope to get the help from Apache (e.g., champion and
>> >>>>>> mentors) to establish a healthy path for SINGA.
>> >>>>>>
>> >>>>>> Homogenous Developers
>> >>>>>>
>> >>>>>> Although the current developers are researchers in the universities,
>> >>>>>> they have different research interests and project experiences, as
>> >>>>>> mentioned in the section that introduces the core developers. We
>> know
>> >>>>>> that a diverse community is helpful. Hence we are open to the idea
>> of
>> >>>>>> recruiting developers from other regions and organizations.
>> >>>>>>
>> >>>>>> Reliance on Salaried Developers
>> >>>>>>
>> >>>>>> As a research project in the university, SINGA's current developing
>> >>>>>> community consists of professors, PhD students, research assistants
>> >>>>>> and postdoctoral fellows. They are driven by their interests to work
>> >>>>>> on this project and have contributed actively since the start of the
>> >>>>>> project. The research assistants and fellows are expected to leave
>> >>>>>> when their contracts expire. However, they are keen to continue to
>> >>>>>> work on the project voluntarily. Moreover, as a long term research
>> >>>>>> project, new research assistants and fellows are likely to join the
>> >>>>>> project.
>> >>>>>>
>> >>>>>> An Excessive Fascination with the Apache Brand
>> >>>>>>
>> >>>>>> We choose Apache not for publicity. We have two purposes. First, we
>> >>>>>> want to leverage Apache's reputation to recruit more developers to
>> >>>>>> make a diverse community. Second, we hope that Apache can help us to
>> >>>>>> establish a healthy path in developing SINGA. Beng Chin and Kian-Lee
>> >>>>>> are established database and distributed system researchers, and
>> >>>>>> together with the other contributors, they sincerely believe that
>> >>>>>> there is a need for a widely accepted open source distributed deep
>> >>>>>> learning platform. The field of deep learning is still in its infancy,
>> >>>>>> and an open source platform will fuel the research in the area.
>> >>>>>> Moreover, such a platform will enable researchers to develop new
>> >>>>>> models and algorithms, rather than spending time implementing a deep
>> >>>>>> learning system from scratch. Furthermore, the need for scalability
>> >>>>>> for such a platform is obvious.
>> >>>>>>
>> >>>>>> Relationship with Other Apache Products
>> >>>>>>
>> >>>>>> Apache H2O implemented two simple deep learning models, namely the
>> >>>>>> Multi-Layer Perceptron and Deep Auto-encoders. There are two
>> >>>>>> significant differences between H2O and SINGA. First, H2O adopts the
>> >>>>>> Map-Reduce framework which runs a set of computing nodes in parallel
>> >>>>>> against subsets of the training set. Model parameters trained by all
>> >>>>>> computing nodes are averaged as the final model parameters. This
>> >>>>>> training algorithm is different from the distributed training
>> >>>>>> algorithm used by DistBelief, Adam and SINGA, which frequently
>> >>>>>> synchronizes the parameters trained from different nodes. SINGA adopts
>> >>>>>> the parameter server framework to support a wide range of distributed
>> >>>>>> training algorithms and parallelization methods (e.g., data
>> >>>>>> parallelism, model parallelism and hybrid parallelism; H2O only
>> >>>>>> supports data parallelism). Second, in H2O, users are restricted to
>> >>>>>> the two built-in models. In SINGA, we provide a simple programming
>> >>>>>> model to let users implement their own deep learning models. A new
>> >>>>>> deep learning model can be implemented by customizing the base Layer
>> >>>>>> class for each layer involved in the model. It is similar to writing
>> >>>>>> Hadoop programs where users only need to override the base Mapper and
>> >>>>>> Reducer. We also provide built-in models for users to use directly.
>> >>>>>>
>> >>>>>> Documentation
>> >>>>>>
>> >>>>>> The project is hosted at
>> >>>>>> http://www.comp.nus.edu.sg/~dbsystem/project/singa.html.
>> >>>>>> Documentation can be found on the GitHub wiki page:
>> >>>>>> https://github.com/nusinga/singa/wiki. We continue to refine and
>> >>>>>> improve the documentation.
>> >>>>>>
>> >>>>>> Initial Source
>> >>>>>>
>> >>>>>> We use GitHub to maintain our source code:
>> >>>>>> https://github.com/nusinga/singa
>> >>>>>>
>> >>>>>> Source and Intellectual Property Submission Plan
>> >>>>>>
>> >>>>>> We plan to release our code base under the Apache License, Version 2.0.
>> >>>>>>
>> >>>>>> External Dependencies
>> >>>>>>
>> >>>>>> required by the core code base: glog, gflags, google protobuf,
>> >>>>>> open-blas, mpich, armci-mpi.
>> >>>>>> required by data preparation and preprocessing: opencv, hdfs, python.
>> >>>>>>
>> >>>>>> Cryptography
>> >>>>>>
>> >>>>>> Not Applicable
>> >>>>>>
>> >>>>>> Required Resources
>> >>>>>>
>> >>>>>> Mailing Lists
>> >>>>>>
>> >>>>>> Currently, we use a Google group for internal discussion. The mailing
>> >>>>>> address is nusinga@googlegroup.com. We will migrate the content to the
>> >>>>>> Apache mailing lists in the future.
>> >>>>>>
>> >>>>>> singa-dev
>> >>>>>> singa-user
>> >>>>>> singa-commits
>> >>>>>> singa-private (for private discussion within PMC)
>> >>>>>>
>> >>>>>> Git Repository
>> >>>>>>
>> >>>>>> We want to continue using git for version control. Hence, a git repo
>> >>>>>> is required.
>> >>>>>>
>> >>>>>> Issue Tracking
>> >>>>>>
>> >>>>>> JIRA Singa (SINGA)
>> >>>>>>
>> >>>>>> Initial Committers
>> >>>>>>
>> >>>>>> Beng Chin Ooi (ooibc @comp.nus.edu.sg)
>> >>>>>> Kian Lee Tan (tankl @comp.nus.edu.sg)
>> >>>>>> Gang Chen (cg @zju.edu.cn)
>> >>>>>> Wei Wang (wangwei @comp.nus.edu.sg)
>> >>>>>> Dinh Tien Tuan Anh (dinhtta @comp.nus.edu.sg)
>> >>>>>> Jinyang Gao (jinyang.gao @comp.nus.edu.sg)
>> >>>>>> Sheng Wang (wangsh @comp.nus.edu.sg)
>> >>>>>> Kaiping Zheng (kaiping @comp.nus.edu.sg)
>> >>>>>> Zhaojing Luo (zhaojing @comp.nus.edu.sg)
>> >>>>>> Zhongle Xie (zhongle @comp.nus.edu.sg)
>> >>>>>>
>> >>>>>> Affiliations
>> >>>>>>
>> >>>>>> Beng Chin Ooi, National University of Singapore
>> >>>>>> Kian Lee Tan, National University of Singapore
>> >>>>>> Gang Chen, Zhejiang University
>> >>>>>> Wei Wang, National University of Singapore
>> >>>>>> Dinh Tien Tuan Anh, National University of Singapore
>> >>>>>> Jinyang Gao, National University of Singapore
>> >>>>>> Sheng Wang, National University of Singapore
>> >>>>>> Kaiping Zheng, National University of Singapore
>> >>>>>> Zhaojing Luo, National University of Singapore
>> >>>>>> Zhongle Xie, National University of Singapore
>> >>>>>>
>> >>>>>> Sponsors
>> >>>>>>
>> >>>>>> Champion
>> >>>>>>
>> >>>>>> Thejas Nair (thejas at apache.org) - Hortonworks
>> >>>>>>
>> >>>>>> Nominated Mentors
>> >>>>>>
>> >>>>>> Thejas Nair (thejas at apache.org) - Hortonworks
>> >>>>>> Alan Gates (gates at apache dot org) - Hortonworks
>> >>>>>> (Seeking more volunteers!)
>> >>>>>>
>> >>>>>> Sponsoring Entity
>> >>>>>>
>> >>>>>> We are requesting the Incubator to sponsor this project.
>> >>>>>>
>> >>>>>>
>> ---------------------------------------------------------------------
>> >>>>>> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>> >>>>>> For additional commands, e-mail: general-help@incubator.apache.org
>> >>>>>>



Re: [Fwd: Re: [DISCUSS] [PROPOSAL] Singa for Apache Incubator]

Posted by Ted Dunning <te...@gmail.com>.
Thejas,

Please add me as a mentor if it helps to have diversity.  Based on previous
experience with him, I have enormous trust that Alan Gates would act as a
highly impartial and effective mentor, but I would be happy to help if there
is a concern that could be addressed by having another mentor from a
different company.



On Thu, Feb 26, 2015 at 6:12 PM, Thejas Nair <th...@gmail.com> wrote:

> The incubator proposal has been updated with the feedback so far.
> We have 3 mentors now, but I think it would be good to have additional
> mentors. Please let me know if anyone is able to help mentor this
> project.
>
> I am planning to start a vote on the proposal in a day or two.
>
>
> On Fri, Feb 6, 2015 at 5:21 PM,  <oo...@comp.nus.edu.sg> wrote:
> >
> > Regarding the number of users using this project -- at this moment, the
> > community is not big.  A few local start-ups have been trying to use it
> > (mainly due to announcement in our seminar list), e.g. one is using it for
> > image recognition (given a photo snapped by a user, it wants to return
> > the same product, and a list of similar products, such as a luxury bag
> > on a passerby).  Researchers from outside of NUS may have been using it
> > since we published an application paper on cross domain/modal retrieval in
> > VLDB 2014.
> >
> > We have not announced the project to the outside community yet -- we would
> > announce it in dbworld etc in due course.
> >
> > Thanks and have a good weekend.
> >
> > regards
> > beng chin
> >
> >>
> >> Thanks for the comments and suggestions.
> >> With permission from Thejas, I would like to respond to point 2.
> >>
> >> We have a huge team down at NUS (National University of Singapore) --
> >> we have about seven database/data mining professors (not including
> >> those in systems, networking, and machine learning).
> >> I myself have nine PhD students in a steady state, and I have a few large
> >> grants, with a total budget of about 15 million S$ (~12 million USD), that
> >> allow me to hire a number of research fellows and research assistants for
> >> the next few years.  In a constant state, I have about 20 people (PhD
> >> students/RA/RF) working with me alone.  Other professors have their own
> >> grants (unlike in other countries, it is relatively easy to get large grants
> >> in Singapore; many overseas universities, including UIUC, MIT, ETH etc
> >> have research labs funded by Singapore Research Foundation [equivalent of
> >> NSF]).
> >>
> >> SINGA is a long term project for us -- while it is a platform as it is, we
> >> are using it for healthcare predictive analytics (by working with a
> >> hospital associated with the University).  Therefore, we will be working
> >> on SINGA, not solely as a distributed DL platform, but as a tool that will
> >> enable us to do data analytics on some business domains (e.g. healthcare,
> >> consumer etc)
> >>
> >> For the initial set of committers, three are tenured professors, five are
> >> students, with 2-5 years to go before they complete their PhD.  Quite
> >> often, some would stay back as a research fellow for a couple of years
> >> before they start looking for a job outside.  We will work with mentors
> >> and new developers (from outside of NUS or Zhejiang University) in
> >> enhancing the system.
> >>
> >> The project should survive in that sense.
> >>
> >> (I have an on-going project CIIDAA that has been around since 2008; it was
> >> started as another project, epiC,  with a different grant, and then we
> >> continued the development with a new grant for CIIDAA --
> >> http://www.comp.nus.edu.sg/~ciidaa/
> >> )
> >>
> >> Thanks.
> >>
> >> regards
> >> beng chin
> >> ps: i am not sure if my email will get through to the group.
> >>
> >>
> >> ---------------------------- Original Message ----------------------------
> >> Subject: Re: [DISCUSS] [PROPOSAL] Singa for Apache Incubator
> >> From:    "Henry Saputra" <he...@gmail.com>
> >> Date:    Thu, February 5, 2015 2:57 pm
> >> To:      "general@incubator.apache.org" <ge...@incubator.apache.org>
> >> Cc:      ooibc@comp.nus.edu.sg
> >> --------------------------------------------------------------------------
> >>
> >> Several comments:
> >> -) How many users are already using this project? I would recommend
> >> dropping the request for the singa-user list at the beginning.
> >> -) All the initial committers come from university and it seems like
> >> some of them are already ready to leave university. I am not too sure if
> >> this project will survive if all of the initial committers are from
> >> university as students.
> >> -) Need to solicit more mentors if this project ever gets to Apache
> >> incubator.
> >>
> >> - Henry
> >>
> >> On Tue, Feb 3, 2015 at 3:58 PM, Thejas Nair <th...@gmail.com> wrote:
> >>> The "Relationship with Other Apache Products" section has been
> >>> updated. The reference to H2O in that section has been removed, and
> >>> other projects have been added.
> >>>  Thanks for the feedback!
> >>>
> >>>
> >>> On Wed, Jan 28, 2015 at 10:27 AM, Thejas Nair <th...@gmail.com> wrote:
> >>>> Thanks for pointing that out Henry! Yes, looks like H2O is not an
> >>>> Apache project, I should have verified that.
> >>>> I will edit that, and revisit that section along with the folks in
> >>>> Singa community.
> >>>>
> >>>>
> >>>> On Tue, Jan 27, 2015 at 6:55 PM, Henry Saputra <he...@gmail.com> wrote:
> >>>>> Quick immediate comment that "Apache H2O" is not really an Apache
> >>>>> project.
> >>>>>
> >>>>> I assume you are referring to https://github.com/h2oai/h2o (or
> >>>>> https://github.com/h2oai/h2o-dev) ?
> >>>>>
> >>>>> - Henry
> >>>>>
> >>>>> On Tue, Jan 27, 2015 at 5:29 PM, Thejas Nair <th...@gmail.com> wrote:
> >>>>>> Hello everyone,
> >>>>>>
> >>>>>> I would like to propose the inclusion of Singa as an Apache Incubator
> >>>>>> project.
> >>>>>>
> >>>>>> Here is the proposal -
> >>>>>> https://wiki.apache.org/incubator/SingaProposal
> >>>>>>
> >>>>>> Please review the proposal and give feedback. I am planning to start a
> >>>>>> vote after 7 days if the proposal looks good.
> >>>>>> We are also seeking additional Apache mentors for the project.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Thejas
> >>>>>> ==========================================================
> >>>>>> Singa Incubator Proposal
> >>>>>>
> >>>>>> Abstract
> >>>>>>
> >>>>>> SINGA is a distributed deep learning platform.
> >>>>>>
> >>>>>> Proposal
> >>>>>>
> >>>>>> SINGA is an efficient, scalable and easy-to-use distributed platform
> >>>>>> for training deep learning models, e.g., Deep Convolutional Neural
> >>>>>> Network and Deep Belief Network. It parallelizes the computation
> >>>>>> (i.e., training) onto a cluster of nodes by distributing the training
> >>>>>> data and model automatically to speed up the training. Built-in
> >>>>>> training algorithms like Back-Propagation and Contrastive Divergence
> >>>>>> are implemented based on common abstractions of deep learning models.
> >>>>>> Users can train their own deep learning models by simply customizing
> >>>>>> these abstractions like implementing the Mapper and Reducer in Hadoop.
> >>>>>>
> >>>>>> Background
> >>>>>>
> >>>>>> Deep learning refers to a set of feature (or representation) learning
> >>>>>> models that consist of multiple (non-linear) layers, where different
> >>>>>> layers learn different levels of abstractions (representations) of the
> >>>>>> raw input data. Larger (in terms of model parameters) and deeper (in
> >>>>>> terms of number of layers) models have shown better performance, e.g.,
> >>>>>> lower image classification error in Large Scale Visual Recognition
> >>>>>> Challenge. However, a larger model requires more memory and larger
> >>>>>> training data to reduce over-fitting. Complex numeric operations make
> >>>>>> the training computation intensive. In practice, training large deep
> >>>>>> learning models takes weeks or months on a single node (even with GPU).
> >>>>>>
> >>>>>> Rationale
> >>>>>>
> >>>>>> Deep learning has gained a lot of traction in both academia and
> >>>>>> industry due to its success in a wide range of areas such as computer
> >>>>>> vision and speech recognition. However, training of such models is
> >>>>>> computationally expensive, especially for large and deep models (e.g.,
> >>>>>> with billions of parameters and more than 10 layers). Both Google and
> >>>>>> Microsoft have developed distributed deep learning systems to make the
> >>>>>> training more efficient by distributing the computations within a
> >>>>>> cluster of nodes. However, these systems are closed source software.
> >>>>>> Our goal is to leverage the community of open source developers to
> >>>>>> make SINGA efficient, scalable and easy to use. SINGA is a full
> >>>>>> fledged distributed platform that could benefit the community and
> >>>>>> also benefit from the community's involvement in contributing
> >>>>>> to further work in this area. We believe the nature of SINGA and
> >>>>>> our visions for the system fit naturally with Apache's philosophy and
> >>>>>> development framework.
> >>>>>>
> >>>>>> Initial Goals
> >>>>>>
> >>>>>> We have developed a system for SINGA running on a commodity computer
> >>>>>> cluster. The initial goals include:
> >>>>>> * improving the system in terms of scalability and efficiency, e.g.,
> >>>>>>   using Infiniband for network communication and multi-threading for
> >>>>>>   one node computation. We would consider extending SINGA to GPU
> >>>>>>   clusters later.
> >>>>>> * benchmarking with larger datasets (hundreds of millions of training
> >>>>>>   instances) and models (billions of parameters).
> >>>>>> * adding more built-in deep learning models. Users can train the
> >>>>>>   built-in models on their datasets directly.
> >>>>>>
> >>>>>> Current Status
> >>>>>>
> >>>>>> Meritocracy
> >>>>>>
> >>>>>> We would like to follow ASF meritocratic principles to encourage more
> >>>>>> developers to contribute to this project. We know that only active and
> >>>>>> excellent developers can make SINGA a successful project. The
> >>>>>> committer list and PMC will be updated based on developers'
> >>>>>> performance and commitment. We are also improving the documentation
> >>>>>> and code to help new developers get started quickly.
> >>>>>>
> >>>>>> Community
> >>>>>>
> >>>>>> SINGA is currently being developed in the Database System Research Lab
> >>>>>> at the National University of Singapore (NUS) in collaboration with
> >>>>>> Zhejiang University in China. Our lab has extensive experience in
> >>>>>> building database related systems, including distributed systems. Six
> >>>>>> PhD students and research assistants (Jinyang Gao, Kaiping Zheng,
> >>>>>> Sheng Wang, Wei Wang, Zhaojing Luo and Zhongle Xie), a research
> >>>>>> fellow (Anh Dinh) and three professors (Beng Chin Ooi, Gang Chen, Kian
> >>>>>> Lee Tan) have been working for a year on this project. We are open to
> >>>>>> recruiting more developers from diverse backgrounds.
> >>>>>>
> >>>>>> Core Developers
> >>>>>>
> >>>>>> Beng Chin Ooi, Gang Chen and Kian Lee Tan are professors who have
> >>>>>> worked on distributed systems for more than 20 years. They have
> >>>>>> collaborated with the industry and have built various large scale
> >>>>>> systems. Anh Dinh's research is also on distributed systems, albeit
> >>>>>> with more focus on security aspects. Wei Wang's research is on deep
> >>>>>> learning problems including deep learning applications and large scale
> >>>>>> training. Sheng Wang and Jinyang are working on efficient indexing,
> >>>>>> querying of large scale data and machine learning. Kaiping, Zhaojing
> >>>>>> and Zhongle are new PhD students who joined SINGA recently. They will
> >>>>>> work on this project for a longer time (next 4-5 years). While we
> >>>>>> share common research interests, each member also brings diverse
> >>>>>> expertise to the team.
> >>>>>>
> >>>>>> Alignment
> >>>>>>
> >>>>>> ASF is already the home of many distributed platforms, e.g., Hadoop,
> >>>>>> Spark and Mahout, each of which targets a different application
> >>>>>> domain. SINGA, being a distributed platform for large-scale deep
> >>>>>> learning, focuses on another important domain which still
> >>>>>> lacks a robust and scalable open-source platform. The recent success
> >>>>>> of deep learning models especially for vision and speech recognition
> >>>>>> tasks has generated interest in both applying existing deep learning
> >>>>>> models and in developing new ones. Thus, an open-source platform for
> >>>>>> deep learning will be able to attract a large community of users and
> >>>>>> developers. SINGA is a complex system needing many iterations of
> >>>>>> design, implementation and testing. Apache's collaboration framework
> >>>>>> which encourages active contribution from developers will inevitably
> >>>>>> help improve the quality of the system, as shown in the success of
> >>>>>> Hadoop, Spark, etc. Equally important is the community of users which
> >>>>>> helps identify real-life applications of deep learning, and helps to
> >>>>>> evaluate the system's performance and ease-of-use. We hope to leverage
> >>>>>> ASF for coordinating and promoting both communities, and in return
> >>>>>> benefit the communities with another useful tool.
> >>>>>>
> >>>>>> Known Risks
> >>>>>>
> >>>>>> Orphaned products
> >>>>>>
> >>>>>> Four core developers (Anh, Wei Wang, Jinyang and Sheng Wang) may leave
> >>>>>> the lab in two to four years' time. It is possible that some of them
> >>>>>> may not have enough time to focus on this project after that. But,
> >>>>>> SINGA is part of our other bigger research projects on building an
> >>>>>> infrastructure for data intensive applications, which include
> >>>>>> health-care analytics and brain-inspired computing. Beng Chin and Kian
> >>>>>> Lee would continue working on it and getting more people involved. For
> >>>>>> example, three new developers (Kaiping, Zhaojing and Zhongle) joined
> >>>>>> us recently. Individual developers are welcome to make SINGA a diverse
> >>>>>> community that is robust and independent from any single developer.
> >>>>>>
> >>>>>> Inexperience with Open Source
> >>>>>>
> >>>>>> All the developers are active users and followers of open source
> >>>>>> projects. Our research lab has a strong commitment to open source, and
> >>>>>> has released the source code of several systems under open source
> >>>>>> license as a way of contributing back to the open source community.
> >>>>>> But we do not have much real experience in open source projects with
> >>>>>> large and well organized communities like those in Apache. This is one
> >>>>>> reason we chose Apache, which is experienced in open source project
> >>>>>> incubation. We hope to get help from Apache (e.g., champion and
> >>>>>> mentors) to establish a healthy path for SINGA.
> >>>>>>
> >>>>>> Homogenous Developers
> >>>>>>
> >>>>>> Although the current developers are researchers in the universities,
> >>>>>> they have different research interests and project experiences, as
> >>>>>> mentioned in the section that introduces the core developers. We know
> >>>>>> that a diverse community is helpful. Hence we are open to the idea of
> >>>>>> recruiting developers from other regions and organizations.
> >>>>>>
> >>>>>> Reliance on Salaried Developers
> >>>>>>
> >>>>>> As a research project in the university, SINGA's current developer
> >>>>>> community consists of professors, PhD students, research assistants
> >>>>>> and postdoctoral fellows. They are driven by their interests to work
> >>>>>> on this project and have contributed actively since the start of the
> >>>>>> project. The research assistants and fellows are expected to leave
> >>>>>> when their contracts expire. However, they are keen to continue to
> >>>>>> work on the project voluntarily. Moreover, as a long term research
> >>>>>> project, new research assistants and fellows are likely to join the
> >>>>>> project.
> >>>>>>
> >>>>>> An Excessive Fascination with the Apache Brand
> >>>>>>
> >>>>>> We did not choose Apache for publicity. We have two purposes. First, we
> >>>>>> want to leverage Apache's reputation to recruit more developers to
> >>>>>> make a diverse community. Second, we hope that Apache can help us to
> >>>>>> establish a healthy path in developing SINGA. Beng Chin and Kian-Lee
> >>>>>> are established database and distributed system researchers, and
> >>>>>> together with the other contributors, they sincerely believe that
> >>>>>> there is a need for a widely accepted open source distributed deep
> >>>>>> learning platform. The field of deep learning is still in its infancy,
> >>>>>> and an open source platform will fuel the research in the area.
> >>>>>> Moreover, such a platform will enable researchers to develop new
> >>>>>> models and algorithms, rather than spending time implementing a deep
> >>>>>> learning system from scratch. Furthermore, the need for scalability
> >>>>>> for such a platform is obvious.
> >>>>>>
> >>>>>> Relationship with Other Apache Products
> >>>>>>
> >>>>>> Apache H2O implemented two simple deep learning models, namely the
> >>>>>> Multi-Layer Perceptron and Deep Auto-encoders. There are two
> >>>>>> significant differences between H2O and SINGA. First, H2O adopts the
> >>>>>> Map-Reduce framework which runs a set of computing nodes in parallel
> >>>>>> against subsets of the training set. Model parameters trained by all
> >>>>>> computing nodes are averaged as the final model parameters. This
> >>>>>> training algorithm is different from the distributed training
> >>>>>> algorithm used by DistBelief, Adam and SINGA, which frequently
> >>>>>> synchronizes the parameters trained from different nodes. SINGA adopts
> >>>>>> the parameter server framework to support a wide range of distributed
> >>>>>> training algorithms and parallelization methods (e.g., data
> >>>>>> parallelism, model parallelism and hybrid parallelism; H2O only
> >>>>>> supports data parallelism). Second, in H2O, users are restricted to
> >>>>>> the two built-in models. In SINGA, we provide a simple programming
> >>>>>> model to let users implement their own deep learning models. A new
> >>>>>> deep learning model can be implemented by customizing the base Layer
> >>>>>> class for each layer involved in the model. It is similar to writing
> >>>>>> Hadoop programs where users only need to override the base Mapper and
> >>>>>> Reducer. We also provide built-in models for users to use directly.
> >>>>>>
> >>>>>> Documentation
> >>>>>>
> >>>>>> The project is hosted at
> >>>>>> http://www.comp.nus.edu.sg/~dbsystem/project/singa.html.
> >>>>>> Documentation can be found on the GitHub wiki page:
> >>>>>> https://github.com/nusinga/singa/wiki. We continue to refine and
> >>>>>> improve the documentation.
> >>>>>>
> >>>>>> Initial Source
> >>>>>>
> >>>>>> We use GitHub to maintain our source code:
> >>>>>> https://github.com/nusinga/singa
> >>>>>>
> >>>>>> Source and Intellectual Property Submission Plan
> >>>>>>
> >>>>>> We plan to release our code base under the Apache License, Version 2.0.
> >>>>>>
> >>>>>> External Dependencies
> >>>>>>
> >>>>>> required by the core code base: glog, gflags, google protobuf,
> >>>>>> open-blas, mpich, armci-mpi.
> >>>>>> required by data preparation and preprocessing: opencv, hdfs, python.
> >>>>>>
> >>>>>> Cryptography
> >>>>>>
> >>>>>> Not Applicable
> >>>>>>
> >>>>>> Required Resources
> >>>>>>
> >>>>>> Mailing Lists
> >>>>>>
> >>>>>> Currently, we use a Google group for internal discussion. The mailing
> >>>>>> address is nusinga@googlegroup.com. We will migrate the content to the
> >>>>>> Apache mailing lists in the future.
> >>>>>>
> >>>>>> singa-dev
> >>>>>> singa-user
> >>>>>> singa-commits
> >>>>>> singa-private (for private discussion within PMC)
> >>>>>>
> >>>>>> Git Repository
> >>>>>>
> >>>>>> We want to continue using git for version control. Hence, a git repo
> >>>>>> is required.
> >>>>>>
> >>>>>> Issue Tracking
> >>>>>>
> >>>>>> JIRA Singa (SINGA)
> >>>>>>
> >>>>>> Initial Committers
> >>>>>>
> >>>>>> Beng Chin Ooi (ooibc @comp.nus.edu.sg)
> >>>>>> Kian Lee Tan (tankl @comp.nus.edu.sg)
> >>>>>> Gang Chen (cg @zju.edu.cn)
> >>>>>> Wei Wang (wangwei @comp.nus.edu.sg)
> >>>>>> Dinh Tien Tuan Anh (dinhtta @comp.nus.edu.sg)
> >>>>>> Jinyang Gao (jinyang.gao @comp.nus.edu.sg)
> >>>>>> Sheng Wang (wangsh @comp.nus.edu.sg)
> >>>>>> Kaiping Zheng (kaiping @comp.nus.edu.sg)
> >>>>>> Zhaojing Luo (zhaojing @comp.nus.edu.sg)
> >>>>>> Zhongle Xie (zhongle @comp.nus.edu.sg)
> >>>>>>
> >>>>>> Affiliations
> >>>>>>
> >>>>>> Beng Chin Ooi, National University of Singapore
> >>>>>> Kian Lee Tan, National University of Singapore
> >>>>>> Gang Chen, Zhejiang University
> >>>>>> Wei Wang, National University of Singapore
> >>>>>> Dinh Tien Tuan Anh, National University of Singapore
> >>>>>> Jinyang Gao, National University of Singapore
> >>>>>> Sheng Wang, National University of Singapore
> >>>>>> Kaiping Zheng, National University of Singapore
> >>>>>> Zhaojing Luo, National University of Singapore
> >>>>>> Zhongle Xie, National University of Singapore
> >>>>>>
> >>>>>> Sponsors
> >>>>>>
> >>>>>> Champion
> >>>>>>
> >>>>>> Thejas Nair (thejas at apache.org) - Hortonworks
> >>>>>>
> >>>>>> Nominated Mentors
> >>>>>>
> >>>>>> Thejas Nair (thejas at apache.org) - Hortonworks
> >>>>>> Alan Gates (gates at apache dot org) - Hortonworks
> >>>>>> (Seeking more volunteers!)
> >>>>>>
> >>>>>> Sponsoring Entity
> >>>>>>
> >>>>>> We are requesting the Incubator to sponsor this project.
> >>>>>>
> >>>>>>

Re: [Fwd: Re: [DISCUSS] [PROPOSAL] Singa for Apache Incubator]

Posted by Thejas Nair <th...@gmail.com>.
The incubator proposal has been updated with the feedback so far.
We have 3 mentors now, but I think it would be good to have additional
mentors. Please let me know if anyone is able to help mentor this
project.

I am planning to start a vote on the proposal in a day or two.


On Fri, Feb 6, 2015 at 5:21 PM,  <oo...@comp.nus.edu.sg> wrote:
>
> Regarding the number of users using this project -- at this moment, the
> community is not big.  A few local start-ups have been trying to use it
> (mainly due to announcement in our seminar list), e.g. one is using it for
> image recognition (given a photo snapped by a user, it wants to return
> the same product, and a list of similar products, such as a luxury bag
> on a passerby).  Researchers from outside of NUS may have been using it
> since we published an application paper on cross domain/modal retrieval in
> VLDB 2014.
>
> We have not announced the project to the outside community yet -- we would
> announce it in dbworld etc in due course.
>
> Thanks and have a good weekend.
>
> regards
> beng chin
>
>>
>> Thanks for the comments and suggestions.
>> With permission from Thejas, I would like to respond to point 2.
>>
>> We have a huge team down at NUS (National University of Singapore) --
>> we have about seven database/data mining professors (not including
>> those in systems, networking, and machine learning).
>> I myself have nine PhD students in a steady state, and I have a few large
>> grants, with a total budget of about 15 million S$ (~12 million USD), that
>> allow me to hire a number of research fellows and research assistants for
>> the next few years.  In a constant state, I have about 20 people (PhD
>> students/RA/RF) working with me alone.  Other professors have their own
>> grants (unlike in other countries, it is relatively easy to get large grants
>> in Singapore; many overseas universities, including UIUC, MIT, ETH etc
>> have research labs funded by Singapore Research Foundation [equivalent of
>> NSF]).
>>
>> SINGA is a long term project for us -- while it is a platform as it is, we
>> are using it for healthcare predictive analytics (by working with a
>> hospital associated with the University).  Therefore, we will be working
>> on SINGA, not solely as a distributed DL platform, but as a tool that will
>> enable us to do data analytics on some business domains (e.g. healthcare,
>> consumer etc)
>>
>> For the initial set of committers, three are tenured professors, five are
>> students, with 2-5 years to go before they complete their PhD.  Quite
>> often, some would stay back as a research fellow for a couple of years
>> before they start looking for a job outside.  We will work with mentors
>> and new developers (from outside of NUS or Zhejiang University) in
>> enhancing the system.
>>
>> The project should survive in that sense.
>>
>> (I have an on-going project CIIDAA that has been around since 2008; it was
>> started as another project, epiC,  with a different grant, and then we
>> continue the development with a new grant for CIIDAA --
>> http://www.comp.nus.edu.sg/~ciidaa/
>> )
>>
>> Thanks.
>>
>> regards
>> beng chin
>> ps: i am not sure if my email will get through to the group.
>>
>>
>> ---------------------------- Original Message ----------------------------
>> Subject: Re: [DISCUSS] [PROPOSAL] Singa for Apache Incubator
>> From:    "Henry Saputra" <he...@gmail.com>
>> Date:    Thu, February 5, 2015 2:57 pm
>> To:      "general@incubator.apache.org" <ge...@incubator.apache.org>
>> Cc:      ooibc@comp.nus.edu.sg
>> --------------------------------------------------------------------------
>>
>> Several comments:
>> -) How many users are already using this project? I would recommend
>> dropping the request for the singa-user list at the beginning.
>> -) All the initial committers come from universities, and it seems
>> some of them are already about to leave. I am not too sure
>> this project can survive if all of the initial committers are from
>> a university as students.
>> -) Need to solicit more mentors if this project is ever to get into the
>> Apache Incubator.
>>
>> - Henry
>>
>> On Tue, Feb 3, 2015 at 3:58 PM, Thejas Nair <th...@gmail.com> wrote:
>>> The "Relationship with Other Apache Products" section has been
>>> updated. The reference to H2O in that section has been removed, and
>>> other projects have been added.
>>>  Thanks for the feedback!
>>>
>>>
>>> On Wed, Jan 28, 2015 at 10:27 AM, Thejas Nair <th...@gmail.com> wrote:
>>>> Thanks for pointing that out, Henry! Yes, it looks like H2O is not an
>>>> Apache project; I should have verified that.
>>>> I will edit that, and revisit that section along with the folks in
>>>> the Singa community.
>>>>
>>>>
>>>> On Tue, Jan 27, 2015 at 6:55 PM, Henry Saputra <he...@gmail.com> wrote:
>>>>> Quick immediate comment that "Apache H2O" is not really an Apache
>>>>> project.
>>>>>
>>>>> I assume you are referring to https://github.com/h2oai/h2o (or
>>>>> https://github.com/h2oai/h2o-dev) ?
>>>>>
>>>>> - Henry
>>>>>
>>>>> On Tue, Jan 27, 2015 at 5:29 PM, Thejas Nair <th...@gmail.com> wrote:
>>>>>> Hello everyone,
>>>>>>
>>>>>> I would like to propose the inclusion of Singa as an Apache Incubator
>>>>>> project.
>>>>>>
>>>>>> Here is the proposal -
>>>>>> https://wiki.apache.org/incubator/SingaProposal
>>>>>>
>>>>>> Please review the proposal and give feedback. I am planning to start a
>>>>>> vote after 7 days if the proposal looks good.
>>>>>> We are also seeking additional Apache mentors for the project.
>>>>>>
>>>>>> Thanks,
>>>>>> Thejas
>>>>>> ==========================================================
>>>>>> Singa Incubator Proposal
>>>>>>
>>>>>> Abstract
>>>>>>
>>>>>> SINGA is a distributed deep learning platform.
>>>>>>
>>>>>> Proposal
>>>>>>
>>>>>> SINGA is an efficient, scalable and easy-to-use distributed platform
>>>>>> for training deep learning models, e.g., Deep Convolutional Neural
>>>>>> Network and Deep Belief Network. It parallelizes the computation
>>>>>> (i.e., training) onto a cluster of nodes by distributing the training
>>>>>> data and model automatically to speed up the training. Built-in
>>>>>> training algorithms like Back-Propagation and Contrastive Divergence
>>>>>> are implemented based on common abstractions of deep learning models.
>>>>>> Users can train their own deep learning models by simply customizing
>>>>>> these abstractions like implementing the Mapper and Reducer in
>>>>>> Hadoop.
>>>>>>
>>>>>> Background
>>>>>>
>>>>>> Deep learning refers to a set of feature (or representation) learning
>>>>>> models that consist of multiple (non-linear) layers, where different
>>>>>> layers learn different levels of abstraction (representation) of the
>>>>>> raw input data. Larger (in terms of model parameters) and deeper (in
>>>>>> terms of number of layers) models have shown better performance, e.g.,
>>>>>> lower image classification error in the Large Scale Visual Recognition
>>>>>> Challenge. However, a larger model requires more memory and more
>>>>>> training data to reduce over-fitting. Complex numeric operations make
>>>>>> the training computation-intensive. In practice, training large deep
>>>>>> learning models takes weeks or months on a single node (even with a
>>>>>> GPU).
>>>>>>
>>>>>> Rationale
>>>>>>
>>>>>> Deep learning has gained a lot of traction in both academia and
>>>>>> industry due to its success in a wide range of areas such as computer
>>>>>> vision and speech recognition. However, training of such models is
>>>>>> computationally expensive, especially for large and deep models (e.g.,
>>>>>> with billions of parameters and more than 10 layers). Both Google and
>>>>>> Microsoft have developed distributed deep learning systems to make the
>>>>>> training more efficient by distributing the computations within a
>>>>>> cluster of nodes. However, these systems are closed-source software.
>>>>>> Our goal is to leverage the community of open source developers to
>>>>>> make SINGA efficient, scalable and easy to use. SINGA is a
>>>>>> full-fledged distributed platform that could benefit the community and
>>>>>> also benefit from the community's involvement in contributing to
>>>>>> further work in this area. We believe the nature of SINGA and
>>>>>> our vision for the system fit naturally with Apache's philosophy and
>>>>>> development framework.
>>>>>>
>>>>>> Initial Goals
>>>>>>
>>>>>> We have developed a system for SINGA running on a commodity computer
>>>>>> cluster. The initial goals include:
>>>>>> * improving the system in terms of scalability and efficiency, e.g.,
>>>>>>   using Infiniband for network communication and multi-threading for
>>>>>>   one node computation. We would consider extending SINGA to GPU
>>>>>>   clusters later.
>>>>>> * benchmarking with larger datasets (hundreds of millions of training
>>>>>>   instances) and models (billions of parameters).
>>>>>> * adding more built-in deep learning models. Users can train the
>>>>>>   built-in models on their datasets directly.
>>>>>>
>>>>>> Current Status
>>>>>>
>>>>>> Meritocracy
>>>>>>
>>>>>> We would like to follow ASF meritocratic principles to encourage more
>>>>>> developers to contribute to this project. We know that only active and
>>>>>> excellent developers can make SINGA a successful project. The
>>>>>> committer list and PMC will be updated based on developers'
>>>>>> performance and commitment. We are also improving the documentation
>>>>>> and code to help new developers get started quickly.
>>>>>>
>>>>>> Community
>>>>>>
>>>>>> SINGA is currently being developed in the Database System Research Lab
>>>>>> at the National University of Singapore (NUS) in collaboration with
>>>>>> Zhejiang University in China. Our lab has extensive experience in
>>>>>> building database related systems, including distributed systems. Six
>>>>>> PhD students and research assistants (Jinyang Gao, Kaiping Zheng,
>>>>>> Sheng Wang, Wei Wang, Zhaojing Luo and Zhongle Xie), a research
>>>>>> fellow (Anh Dinh) and three professors (Beng Chin Ooi, Gang Chen,
>>>>>> Kian Lee Tan) have been working for a year on this project. We are
>>>>>> open to recruiting more developers from diverse backgrounds.
>>>>>>
>>>>>> Core Developers
>>>>>>
>>>>>> Beng Chin Ooi, Gang Chen and Kian Lee Tan are professors who have
>>>>>> worked on distributed systems for more than 20 years. They have
>>>>>> collaborated with the industry and have built various large scale
>>>>>> systems. Anh Dinh's research is also on distributed systems, albeit
>>>>>> with more focus on security aspects. Wei Wang's research is on deep
>>>>>> learning problems including deep learning applications and large-scale
>>>>>> training. Sheng Wang and Jinyang are working on efficient indexing,
>>>>>> querying of large scale data and machine learning. Kaiping, Zhaojing
>>>>>> and Zhongle are new PhD students who joined SINGA recently. They will
>>>>>> work on this project for a longer time (next 4-5 years). While we
>>>>>> share common research interests, each member also brings diverse
>>>>>> expertise to the team.
>>>>>>
>>>>>> Alignment
>>>>>>
>>>>>> ASF is already the home of many distributed platforms, e.g., Hadoop,
>>>>>> Spark and Mahout, each of which targets a different application
>>>>>> domain. SINGA, being a distributed platform for large-scale deep
>>>>>> learning, focuses on another important domain for which a robust and
>>>>>> scalable open-source platform is still lacking. The recent success
>>>>>> of deep learning models, especially for vision and speech recognition
>>>>>> tasks, has generated interest in both applying existing deep learning
>>>>>> models and in developing new ones. Thus, an open-source platform for
>>>>>> deep learning will be able to attract a large community of users and
>>>>>> developers. SINGA is a complex system needing many iterations of
>>>>>> design, implementation and testing. Apache's collaboration framework,
>>>>>> which encourages active contribution from developers, will inevitably
>>>>>> help improve the quality of the system, as shown by the success of
>>>>>> Hadoop, Spark, etc. Equally important is the community of users, which
>>>>>> helps identify real-life applications of deep learning, and helps to
>>>>>> evaluate the system's performance and ease of use. We hope to leverage
>>>>>> ASF for coordinating and promoting both communities, and in return
>>>>>> benefit the communities with another useful tool.
>>>>>>
>>>>>> Known Risks
>>>>>>
>>>>>> Orphaned products
>>>>>>
>>>>>> Four core developers (Anh, Wei Wang, Jinyang and Sheng Wang) may leave
>>>>>> the lab in two to four years' time. It is possible that some of them
>>>>>> may not have enough time to focus on this project after that. However,
>>>>>> SINGA is part of our larger research projects on building an
>>>>>> infrastructure for data-intensive applications, which include
>>>>>> health-care analytics and brain-inspired computing. Beng Chin and
>>>>>> Kian Lee would continue working on it and getting more people
>>>>>> involved. For example, three new developers (Kaiping, Zhaojing and
>>>>>> Zhongle) joined us recently. Individual developers are welcome to make
>>>>>> SINGA a diverse community that is robust and independent from any
>>>>>> single developer.
>>>>>>
>>>>>> Inexperience with Open Source
>>>>>>
>>>>>> All the developers are active users and followers of open source
>>>>>> projects. Our research lab has a strong commitment to open source, and
>>>>>> has released the source code of several systems under open source
>>>>>> licenses as a way of contributing back to the open source community.
>>>>>> But we do not have much real experience in open source projects with
>>>>>> large and well organized communities like those in Apache. This is one
>>>>>> reason we chose Apache, which is experienced in open source project
>>>>>> incubation. We hope to get help from Apache (e.g., champion and
>>>>>> mentors) to establish a healthy path for SINGA.
>>>>>>
>>>>>> Homogenous Developers
>>>>>>
>>>>>> Although the current developers are researchers in the universities,
>>>>>> they have different research interests and project experiences, as
>>>>>> mentioned in the section that introduces the core developers. We know
>>>>>> that a diverse community is helpful. Hence we are open to the idea of
>>>>>> recruiting developers from other regions and organizations.
>>>>>>
>>>>>> Reliance on Salaried Developers
>>>>>>
>>>>>> As a research project in the university, SINGA's current developing
>>>>>> community consists of professors, PhD students, research assistants
>>>>>> and postdoctoral fellows. They are driven by their interests to work
>>>>>> on this project and have contributed actively since the start of the
>>>>>> project. The research assistants and fellows are expected to leave
>>>>>> when their contracts expire. However, they are keen to continue to
>>>>>> work on the project voluntarily. Moreover, as a long term research
>>>>>> project, new research assistants and fellows are likely to join the
>>>>>> project.
>>>>>>
>>>>>> An Excessive Fascination with the Apache Brand
>>>>>>
>>>>>> We choose Apache not for publicity. We have two purposes. First, we
>>>>>> want to leverage Apache's reputation to recruit more developers to
>>>>>> make a diverse community. Second, we hope that Apache can help us to
>>>>>> establish a healthy path in developing SINGA. Beng Chin and Kian-Lee
>>>>>> are established database and distributed system researchers, and
>>>>>> together with the other contributors, they sincerely believe that
>>>>>> there is a need for a widely accepted open source distributed deep
>>>>>> learning platform. The field of deep learning is still in its infancy,
>>>>>> and an open source platform will fuel the research in the area.
>>>>>> Moreover, such a platform will enable researchers to develop new
>>>>>> models and algorithms, rather than spending time implementing a deep
>>>>>> learning system from scratch. Furthermore, the need for scalability
>>>>>> for such a platform is obvious.
>>>>>>
>>>>>> Relationship with Other Apache Products
>>>>>>
>>>>>> Apache H2O implemented two simple deep learning models, namely the
>>>>>> Multi-Layer Perceptron and Deep Auto-encoders. There are two
>>>>>> significant differences between H2O and SINGA. First, H2O adopts the
>>>>>> Map-Reduce framework, which runs a set of computing nodes in parallel
>>>>>> against the training set. Model parameters trained by all
>>>>>> computing nodes are averaged as the final model parameters. This
>>>>>> training algorithm is different from the distributed training
>>>>>> algorithm used by DistBelief, Adam and SINGA, which frequently
>>>>>> synchronizes the parameters trained from different nodes. SINGA adopts
>>>>>> the parameter server framework to support a wide range of distributed
>>>>>> training algorithms and parallelization methods (e.g., data
>>>>>> parallelism, model parallelism and hybrid parallelism), whereas H2O
>>>>>> only supports data parallelism. Second, in H2O, users are restricted
>>>>>> to using the two built-in models. In SINGA, we provide a simple
>>>>>> programming model to let users implement their own deep learning
>>>>>> models. A new
>>>>>> deep learning model can be implemented by customizing the base Layer
>>>>>> class for each layer involved in the model. It is similar to writing
>>>>>> Hadoop programs where users only need to override the base Mapper and
>>>>>> Reducer. We also provide built-in models for users to use directly.
>>>>>>
>>>>>> Documentation
>>>>>>
>>>>>> The project is hosted at
>>>>>> http://www.comp.nus.edu.sg/~dbsystem/project/singa.html.
>>>>>> Documentation can be found at the Github Wiki Page:
>>>>>> https://github.com/nusinga/singa/wiki. We continue to refine and
>>>>>> improve the documentation.
>>>>>>
>>>>>> Initial Source
>>>>>>
>>>>>> We use Github to maintain our source code:
>>>>>> https://github.com/nusinga/singa
>>>>>>
>>>>>> Source and Intellectual Property Submission Plan
>>>>>>
>>>>>> We plan to release our code base under the Apache License, Version 2.0.
>>>>>>
>>>>>> External Dependencies
>>>>>>
>>>>>> required by the core code base: glog, gflags, google protobuf,
>>>>>> open-blas, mpich, armci-mpi.
>>>>>> required by data preparation and preprocessing: opencv, hdfs, python.
>>>>>>
>>>>>> Cryptography
>>>>>>
>>>>>> Not Applicable
>>>>>>
>>>>>> Required Resources
>>>>>>
>>>>>> Mailing Lists
>>>>>>
>>>>>> Currently, we use a Google group for internal discussion. The mailing
>>>>>> address is nusinga@googlegroup.com. We will migrate the content to the
>>>>>> Apache mailing lists in the future.
>>>>>>
>>>>>> singa-dev
>>>>>> singa-user
>>>>>> singa-commits
>>>>>> singa-private (for private discussion within the PMC)
>>>>>>
>>>>>> Git Repository
>>>>>>
>>>>>> We want to continue using git for version control. Hence, a git repo
>>>>>> is required.
>>>>>>
>>>>>> Issue Tracking
>>>>>>
>>>>>> JIRA Singa (SINGA)
>>>>>>
>>>>>> Initial Committers
>>>>>>
>>>>>> Beng Chin Ooi (ooibc @comp.nus.edu.sg)
>>>>>> Kian Lee Tan (tankl @comp.nus.edu.sg)
>>>>>> Gang Chen (cg @zju.edu.cn)
>>>>>> Wei Wang (wangwei @comp.nus.edu.sg)
>>>>>> Dinh Tien Tuan Anh (dinhtta @comp.nus.edu.sg)
>>>>>> Jinyang Gao (jinyang.gao @comp.nus.edu.sg)
>>>>>> Sheng Wang (wangsh @comp.nus.edu.sg)
>>>>>> Kaiping Zheng (kaiping @comp.nus.edu.sg)
>>>>>> Zhaojing Luo (zhaojing @comp.nus.edu.sg)
>>>>>> Zhongle Xie (zhongle @comp.nus.edu.sg)
>>>>>>
>>>>>> Affiliations
>>>>>>
>>>>>> Beng Chin Ooi, National University of Singapore
>>>>>> Kian Lee Tan, National University of Singapore
>>>>>> Gang Chen, Zhejiang University
>>>>>> Wei Wang, National University of Singapore
>>>>>> Dinh Tien Tuan Anh, National University of Singapore
>>>>>> Jinyang Gao, National University of Singapore
>>>>>> Sheng Wang, National University of Singapore
>>>>>> Kaiping Zheng, National University of Singapore
>>>>>> Zhaojing Luo, National University of Singapore
>>>>>> Zhongle Xie, National University of Singapore
>>>>>>
>>>>>> Sponsors
>>>>>>
>>>>>> Champion
>>>>>>
>>>>>> Thejas Nair (thejas at apache.org) - Hortonworks
>>>>>>
>>>>>> Nominated Mentors
>>>>>>
>>>>>> Thejas Nair (thejas at apache.org) - Hortonworks
>>>>>> Alan Gates (gates at apache dot org) - Hortonworks
>>>>>> (Seeking more volunteers!)
>>>>>>
>>>>>> Sponsoring Entity
>>>>>>
>>>>>> We are requesting the Incubator to sponsor this project.
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>
>>>
>>
>>
>>
>



Re: [Fwd: Re: [DISCUSS] [PROPOSAL] Singa for Apache Incubator]

Posted by oo...@comp.nus.edu.sg.
Regarding the number of users using this project -- at this moment, the
community is not big.  A few local start-ups have been trying to use it
(mainly due to an announcement on our seminar list), e.g., one is using it
for image recognition (given a photo snapped by a user, it aims to return
the same product and a list of similar products, such as a luxury bag
on a passerby).  Researchers from outside of NUS may have been using it
since we published an application paper on cross domain/modal retrieval in
VLDB 2014.

We have not announced the project to the outside community yet -- we will
announce it on dbworld, etc., in due course.

Thanks and have a good weekend.

regards
beng chin

>
> Thanks for the comments and suggestions.
> With permission from Thejas, I would like to respond to point 2.
>
> We have a huge team down at NUS (National University of Singapore) --
> we have about seven database/data mining professors (not including
> those in systems, networking, and machine learning).
> I myself have nine PhD students in a steady state, and I have a few large
> grants, with a total budget of about 15 million S$ (~12 million USD), that
> allows me to hire a number of research fellows and research assistants for
> the next few years.  In a constant state, I have about 20 people (PhD
> students/RA/RF) working with me alone.  Other professors have their own
> grants (unlike other countries, it is relatively easy to get large grants
> in Singapore; many overseas Universities, including UIUC, MIT, ETH etc
> have research labs funded by Singapore Research Foundation [equivalent of
> NSF]).
>
> SINGA is a long term project for us -- while it is a platform as it is, we
> are using it for healthcare predictive analytics (by working with a
> hospital associated with the University).  Therefore, we will be working
> on SINGA, not solely as a distributed DL platform, but as a tool that will
> enable us to do data analytics on some business domains (e.g., healthcare,
> consumer, etc.).
>
> For the initial set of committers, three are tenured professors, five are
> students, with 2-5 years to go before they complete their PhD.  Quite
> often, some would stay back as a research fellow for a couple of years
> before they start looking for a job outside.  We will work with mentors
> and new developers (from outside of NUS or Zhejiang University) in
> enhancing the system.
>
> The project should survive in that sense.
>
> (I have an on-going project CIIDAA that has been around since 2008; it was
> started as another project, epiC,  with a different grant, and then we
> continue the development with a new grant for CIIDAA --
> http://www.comp.nus.edu.sg/~ciidaa/
> )
>
> Thanks.
>
> regards
> beng chin
> ps: i am not sure if my email will get through to the group.
>
>
> ---------------------------- Original Message ----------------------------
> Subject: Re: [DISCUSS] [PROPOSAL] Singa for Apache Incubator
> From:    "Henry Saputra" <he...@gmail.com>
> Date:    Thu, February 5, 2015 2:57 pm
> To:      "general@incubator.apache.org" <ge...@incubator.apache.org>
> Cc:      ooibc@comp.nus.edu.sg
> --------------------------------------------------------------------------
>
> Several comments:
> -) How many users are already using this project? I would recommend
> dropping the request for the singa-user list at the beginning.
> -) All the initial committers come from universities, and it seems
> some of them are already about to leave. I am not too sure
> this project can survive if all of the initial committers are from
> a university as students.
> -) Need to solicit more mentors if this project is ever to get into the
> Apache Incubator.
>
> - Henry
>
> On Tue, Feb 3, 2015 at 3:58 PM, Thejas Nair <th...@gmail.com> wrote:
>> The "Relationship with Other Apache Products" section has been
>> updated. The reference to H2O in that section has been removed, and
>> other projects have been added.
>>  Thanks for the feedback!
>>
>>
>> On Wed, Jan 28, 2015 at 10:27 AM, Thejas Nair <th...@gmail.com> wrote:
>>> Thanks for pointing that out, Henry! Yes, it looks like H2O is not an
>>> Apache project; I should have verified that.
>>> I will edit that, and revisit that section along with the folks in
>>> the Singa community.
>>>
>>>
>>> On Tue, Jan 27, 2015 at 6:55 PM, Henry Saputra <he...@gmail.com> wrote:
>>>> Quick immediate comment that "Apache H2O" is not really an Apache
>>>> project.
>>>>
>>>> I assume you are referring to https://github.com/h2oai/h2o (or
>>>> https://github.com/h2oai/h2o-dev) ?
>>>>
>>>> - Henry
>>>>
>>>> On Tue, Jan 27, 2015 at 5:29 PM, Thejas Nair <th...@gmail.com> wrote:
>>>>> Hello everyone,
>>>>>
>>>>> I would like to propose the inclusion of Singa as an Apache Incubator
>>>>> project.
>>>>>
>>>>> Here is the proposal -
>>>>> https://wiki.apache.org/incubator/SingaProposal
>>>>>
>>>>> Please review the proposal and give feedback. I am planning to start a
>>>>> vote after 7 days if the proposal looks good.
>>>>> We are also seeking additional Apache mentors for the project.
>>>>>
>>>>> Thanks,
>>>>> Thejas
>>>>> ==========================================================
>>>>> Singa Incubator Proposal
>>>>>
>>>>> Abstract
>>>>>
>>>>> SINGA is a distributed deep learning platform.
>>>>>
>>>>> Proposal
>>>>>
>>>>> SINGA is an efficient, scalable and easy-to-use distributed platform
>>>>> for training deep learning models, e.g., Deep Convolutional Neural
>>>>> Network and Deep Belief Network. It parallelizes the computation
>>>>> (i.e., training) onto a cluster of nodes by distributing the training
>>>>> data and model automatically to speed up the training. Built-in
>>>>> training algorithms like Back-Propagation and Contrastive Divergence
>>>>> are implemented based on common abstractions of deep learning models.
>>>>> Users can train their own deep learning models by simply customizing
>>>>> these abstractions like implementing the Mapper and Reducer in
>>>>> Hadoop.
>>>>>
>>>>> Background
>>>>>
>>>>> Deep learning refers to a set of feature (or representation) learning
>>>>> models that consist of multiple (non-linear) layers, where different
>>>>> layers learn different levels of abstraction (representation) of the
>>>>> raw input data. Larger (in terms of model parameters) and deeper (in
>>>>> terms of number of layers) models have shown better performance, e.g.,
>>>>> lower image classification error in the Large Scale Visual Recognition
>>>>> Challenge. However, a larger model requires more memory and more
>>>>> training data to reduce over-fitting. Complex numeric operations make
>>>>> the training computation-intensive. In practice, training large deep
>>>>> learning models takes weeks or months on a single node (even with a
>>>>> GPU).
>>>>>
>>>>> Rationale
>>>>>
>>>>> Deep learning has gained a lot of traction in both academia and
>>>>> industry due to its success in a wide range of areas such as computer
>>>>> vision and speech recognition. However, training of such models is
>>>>> computationally expensive, especially for large and deep models (e.g.,
>>>>> with billions of parameters and more than 10 layers). Both Google and
>>>>> Microsoft have developed distributed deep learning systems to make the
>>>>> training more efficient by distributing the computations within a
>>>>> cluster of nodes. However, these systems are closed-source software.
>>>>> Our goal is to leverage the community of open source developers to
>>>>> make SINGA efficient, scalable and easy to use. SINGA is a
>>>>> full-fledged distributed platform that could benefit the community and
>>>>> also benefit from the community's involvement in contributing to
>>>>> further work in this area. We believe the nature of SINGA and
>>>>> our vision for the system fit naturally with Apache's philosophy and
>>>>> development framework.
>>>>>
>>>>> Initial Goals
>>>>>
>>>>> We have developed a system for SINGA running on a commodity computer
>>>>> cluster. The initial goals include:
>>>>> * improving the system in terms of scalability and efficiency, e.g.,
>>>>>   using Infiniband for network communication and multi-threading for
>>>>>   one node computation. We would consider extending SINGA to GPU
>>>>>   clusters later.
>>>>> * benchmarking with larger datasets (hundreds of millions of training
>>>>>   instances) and models (billions of parameters).
>>>>> * adding more built-in deep learning models. Users can train the
>>>>>   built-in models on their datasets directly.
>>>>>
>>>>> Current Status
>>>>>
>>>>> Meritocracy
>>>>>
>>>>> We would like to follow ASF meritocratic principles to encourage more
>>>>> developers to contribute to this project. We know that only active and
>>>>> excellent developers can make SINGA a successful project. The
>>>>> committer list and PMC will be updated based on developers'
>>>>> performance and commitment. We are also improving the documentation
>>>>> and code to help new developers get started quickly.
>>>>>
>>>>> Community
>>>>>
>>>>> SINGA is currently being developed in the Database System Research Lab
>>>>> at the National University of Singapore (NUS) in collaboration with
>>>>> Zhejiang University in China. Our lab has extensive experience in
>>>>> building database related systems, including distributed systems. Six
>>>>> PhD students and research assistants (Jinyang Gao, Kaiping Zheng,
>>>>> Sheng Wang, Wei Wang, Zhaojing Luo and Zhongle Xie), a research
>>>>> fellow (Anh Dinh) and three professors (Beng Chin Ooi, Gang Chen,
>>>>> Kian Lee Tan) have been working for a year on this project. We are
>>>>> open to recruiting more developers from diverse backgrounds.
>>>>>
>>>>> Core Developers
>>>>>
>>>>> Beng Chin Ooi, Gang Chen and Kian Lee Tan are professors who have
>>>>> worked on distributed systems for more than 20 years. They have
>>>>> collaborated with the industry and have built various large scale
>>>>> systems. Anh Dinh's research is also on distributed systems, albeit
>>>>> with more focus on security aspects. Wei Wang's research is on deep
>>>>> learning problems including deep learning applications and large-scale
>>>>> training. Sheng Wang and Jinyang are working on efficient indexing,
>>>>> querying of large scale data and machine learning. Kaiping, Zhaojing
>>>>> and Zhongle are new PhD students who joined SINGA recently. They will
>>>>> work on this project for a longer time (next 4-5 years). While we
>>>>> share common research interests, each member also brings diverse
>>>>> expertise to the team.
>>>>>
>>>>> Alignment
>>>>>
>>>>> ASF is already the home of many distributed platforms, e.g., Hadoop,
>>>>> Spark and Mahout, each of which targets a different application
>>>>> domain. SINGA, being a distributed platform for large-scale deep
>>>>> learning, focuses on another important domain for which a robust and
>>>>> scalable open-source platform is still lacking. The recent success
>>>>> of deep learning models, especially for vision and speech recognition
>>>>> tasks, has generated interest in both applying existing deep learning
>>>>> models and in developing new ones. Thus, an open-source platform for
>>>>> deep learning will be able to attract a large community of users and
>>>>> developers. SINGA is a complex system needing many iterations of
>>>>> design, implementation and testing. Apache's collaboration framework,
>>>>> which encourages active contribution from developers, will inevitably
>>>>> help improve the quality of the system, as shown by the success of
>>>>> Hadoop, Spark, etc. Equally important is the community of users, which
>>>>> helps identify real-life applications of deep learning, and helps to
>>>>> evaluate the system's performance and ease of use. We hope to leverage
>>>>> ASF for coordinating and promoting both communities, and in return
>>>>> benefit the communities with another useful tool.
>>>>>
>>>>> Known Risks
>>>>>
>>>>> Orphaned products
>>>>>
>>>>> Four core developers (Anh, Wei Wang, Jinyang and Sheng Wang) may leave
>>>>> the lab in two to four years' time. It is possible that some of them
>>>>> may not have enough time to focus on this project after that. However,
>>>>> SINGA is part of our larger research projects on building an
>>>>> infrastructure for data-intensive applications, which include
>>>>> health-care analytics and brain-inspired computing. Beng Chin and
>>>>> Kian Lee would continue working on it and getting more people
>>>>> involved. For example, three new developers (Kaiping, Zhaojing and
>>>>> Zhongle) joined us recently. Individual developers are welcome to make
>>>>> SINGA a diverse community that is robust and independent from any
>>>>> single developer.
>>>>>
>>>>> Inexperience with Open Source
>>>>>
>>>>> All the developers are active users and followers of open source
>>>>> projects. Our research lab has a strong commitment to open source, and
>>>>> has released the source code of several systems under open source
>>>>> licenses as a way of contributing back to the open source community.
>>>>> But we do not have much real experience in open source projects with
>>>>> large and well organized communities like those in Apache. This is one
>>>>> reason we chose Apache, which is experienced in open source project
>>>>> incubation. We hope to get help from Apache (e.g., champion and
>>>>> mentors) to establish a healthy path for SINGA.
>>>>>
>>>>> Homogenous Developers
>>>>>
>>>>> Although the current developers are all university researchers,
>>>>> they have different research interests and project experiences, as
>>>>> mentioned in the section that introduces the core developers. We know
>>>>> that a diverse community is helpful. Hence we are open to the idea of
>>>>> recruiting developers from other regions and organizations.
>>>>>
>>>>> Reliance on Salaried Developers
>>>>>
>>>>> As a research project in the university, SINGA's current developer
>>>>> community consists of professors, PhD students, research assistants
>>>>> and postdoctoral fellows. They are driven by their interests to work
>>>>> on this project and have contributed actively since the start of the
>>>>> project. The research assistants and fellows are expected to leave
>>>>> when their contracts expire. However, they are keen to continue to
>>>>> work on the project voluntarily. Moreover, as a long term research
>>>>> project, new research assistants and fellows are likely to join the
>>>>> project.
>>>>>
>>>>> An Excessive Fascination with the Apache Brand
>>>>>
>>>>> We did not choose Apache for publicity. We have two purposes. First, we
>>>>> want to leverage Apache's reputation to recruit more developers and
>>>>> build a diverse community. Second, we hope that Apache can help us to
>>>>> establish a healthy path in developing SINGA. Beng Chin and Kian-Lee
>>>>> are established database and distributed system researchers, and
>>>>> together with the other contributors, they sincerely believe that
>>>>> there is a need for a widely accepted open source distributed deep
>>>>> learning platform. The field of deep learning is still in its
>>>>> infancy, and an open source platform will fuel research in the area.
>>>>> Moreover, such a platform will enable researchers to develop new
>>>>> models and algorithms, rather than spending time implementing a deep
>>>>> learning system from scratch. Furthermore, the need for scalability
>>>>> for such a platform is obvious.
>>>>>
>>>>> Relationship with Other Apache Products
>>>>>
>>>>> Apache H2O implemented two simple deep learning models, namely the
>>>>> Multi-Layer Perceptron and Deep Auto-encoders. There are two
>>>>> significant differences between H2O and SINGA. First, H2O adopts the
>>>>> Map-Reduce framework, which runs a set of computing nodes in parallel
>>>>> against subsets of the training set. The model parameters trained by
>>>>> all computing nodes are averaged to give the final model parameters.
>>>>> This training algorithm is different from the distributed training
>>>>> algorithm used by DistBelief, Adam and SINGA, which frequently
>>>>> synchronizes the parameters trained on different nodes. SINGA adopts
>>>>> the parameter server framework to support a wide range of distributed
>>>>> training algorithms and parallelization methods (e.g., data
>>>>> parallelism, model parallelism and hybrid parallelism), whereas H2O
>>>>> only supports data parallelism. Second, in H2O, users are restricted
>>>>> to the two built-in models. In SINGA, we provide a simple programming
>>>>> model that lets users implement their own deep learning models. A new
>>>>> deep learning model can be implemented by customizing the base Layer
>>>>> class for each layer involved in the model. This is similar to writing
>>>>> Hadoop programs, where users only need to override the base Mapper and
>>>>> Reducer. We also provide built-in models for users to use directly.
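
The contrast drawn above between averaging independently trained parameters and synchronizing them frequently can be sketched on a toy problem. This is only an illustration under stated assumptions: the function names, the one-parameter least-squares model, and the synchronous update loop are all hypothetical, and none of it is taken from the SINGA or H2O code. Real parameter-server systems are typically asynchronous and far more involved.

```python
from typing import List, Tuple

Shard = List[Tuple[float, float]]  # (x, y) pairs held by one worker


def gradient(w: float, shard: Shard) -> float:
    """Mean gradient of 0.5 * (w * x - y) ** 2 over one worker's shard."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)


def train_then_average(shards: List[Shard], lr: float, steps: int) -> float:
    """Each worker trains alone; parameters are averaged once at the end."""
    finals = []
    for shard in shards:
        w = 0.0
        for _ in range(steps):
            w -= lr * gradient(w, shard)
        finals.append(w)
    return sum(finals) / len(finals)


def train_with_server(shards: List[Shard], lr: float, steps: int) -> float:
    """Workers share one parameter and synchronize after every step."""
    w = 0.0  # the copy held by the (hypothetical) parameter server
    for _ in range(steps):
        grads = [gradient(w, shard) for shard in shards]  # one per worker
        w -= lr * sum(grads) / len(grads)  # server applies the combined update
    return w


# Toy data generated from y = 3x, split across two workers.
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
w_avg = train_then_average(shards, lr=0.05, steps=100)
w_ps = train_with_server(shards, lr=0.05, steps=100)
print(w_avg, w_ps)
```

On this convex toy problem both schemes recover a slope close to 3, so the sketch only shows where the synchronization happens, not which scheme trains better. The practical difference is expected to matter for non-convex deep models, where independently trained parameters need not average well.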
>>>>>
>>>>> Documentation
>>>>>
>>>>> The project is hosted at
>>>>> http://www.comp.nus.edu.sg/~dbsystem/project/singa.html.
>>>>> Documentation can be found on the GitHub wiki page:
>>>>> https://github.com/nusinga/singa/wiki. We continue to refine and
>>>>> improve the documentation.
>>>>>
>>>>> Initial Source
>>>>>
>>>>> We use GitHub to maintain our source code:
>>>>> https://github.com/nusinga/singa
>>>>>
>>>>> Source and Intellectual Property Submission Plan
>>>>>
>>>>> We plan to release our code base under the Apache License, Version 2.0.
>>>>>
>>>>> External Dependencies
>>>>>
>>>>> Required by the core code base: glog, gflags, Google protobuf,
>>>>> OpenBLAS, MPICH, ARMCI-MPI.
>>>>> Required by data preparation and preprocessing: OpenCV, HDFS, Python.
>>>>>
>>>>> Cryptography
>>>>>
>>>>> Not Applicable
>>>>>
>>>>> Required Resources
>>>>>
>>>>> Mailing Lists
>>>>>
>>>>> Currently, we use a Google group for internal discussion. The mailing
>>>>> address is nusinga@googlegroup.com. We will migrate the content to
>>>>> the Apache mailing lists in the future.
>>>>>
>>>>> singa-dev
>>>>> singa-user
>>>>> singa-commits
>>>>> singa-private (for private discussion within the PMC)
>>>>>
>>>>> Git Repository
>>>>>
>>>>> We want to continue using git for version control. Hence, a git repo
>>>>> is required.
>>>>>
>>>>> Issue Tracking
>>>>>
>>>>> JIRA Singa (SINGA)
>>>>>
>>>>> Initial Committers
>>>>>
>>>>> Beng Chin Ooi (ooibc @comp.nus.edu.sg)
>>>>> Kian Lee Tan (tankl @comp.nus.edu.sg)
>>>>> Gang Chen (cg @zju.edu.cn)
>>>>> Wei Wang (wangwei @comp.nus.edu.sg)
>>>>> Dinh Tien Tuan Anh (dinhtta @comp.nus.edu.sg)
>>>>> Jinyang Gao (jinyang.gao @comp.nus.edu.sg)
>>>>> Sheng Wang (wangsh @comp.nus.edu.sg)
>>>>> Kaiping Zheng (kaiping @comp.nus.edu.sg)
>>>>> Zhaojing Luo (zhaojing @comp.nus.edu.sg)
>>>>> Zhongle Xie (zhongle @comp.nus.edu.sg)
>>>>>
>>>>> Affiliations
>>>>>
>>>>> Beng Chin Ooi, National University of Singapore
>>>>> Kian Lee Tan, National University of Singapore
>>>>> Gang Chen, Zhejiang University
>>>>> Wei Wang, National University of Singapore
>>>>> Dinh Tien Tuan Anh, National University of Singapore
>>>>> Jinyang Gao, National University of Singapore
>>>>> Sheng Wang, National University of Singapore
>>>>> Kaiping Zheng, National University of Singapore
>>>>> Zhaojing Luo, National University of Singapore
>>>>> Zhongle Xie, National University of Singapore
>>>>>
>>>>> Sponsors
>>>>>
>>>>> Champion
>>>>>
>>>>> Thejas Nair (thejas at apache.org) - Hortonworks
>>>>>
>>>>> Nominated Mentors
>>>>>
>>>>> Thejas Nair (thejas at apache.org) - Hortonworks
>>>>> Alan Gates (gates at apache dot org) - Hortonworks
>>>>> (Seeking more volunteers!)
>>>>>
>>>>> Sponsoring Entity
>>>>>
>>>>> We are requesting the Incubator to sponsor this project.
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>>>>> For additional commands, e-mail: general-help@incubator.apache.org
>>>>>