Posted to dev@hama.apache.org by Sachin Ghai <sa...@impetus.co.in> on 2017/02/27 08:16:11 UTC

Proposal for an Apache Hama sub-project

Hama Community,

I would like to propose a sub-project for Apache Hama and initiate a discussion around the proposal. The proposed sub-project, named 'Scalar', is a scalable orchestration, training and serving system for machine learning and deep learning. Scalar would leverage Apache Hama to automate distributed training, model deployment and prediction serving.

More details about the proposal are listed below, following the Apache project proposal template:
Abstract
Scalar is a general-purpose framework for simplifying massive-scale big data analytics and deep learning modelling, deployment and serving with high performance.
Proposal
The goal of Scalar is to provide an abstraction framework that allows users to easily scale the training of a model, the deployment of a model and the serving of predictions from the underlying machine learning or deep learning framework. Its execution framework is also designed to orchestrate heterogeneous workload graphs using Apache Hama, Apache Hadoop, Apache Spark and TensorFlow resources.
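As a rough illustration of what orchestrating a heterogeneous workload graph involves, the sketch below (not taken from the Scalar code base) assumes a workload is a directed acyclic graph whose nodes are tagged with the engine expected to run them. It uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+) so that each node is dispatched only after its dependencies. The `Workload` and `Dependencies` shapes, the engine tags and the `dispatch` callback are hypothetical stand-ins for real Hama, Spark or TensorFlow job submission.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set

# Hypothetical representation: node name -> engine tag, and
# node name -> names of the nodes it depends on.
Workload = Dict[str, str]
Dependencies = Dict[str, Set[str]]


def run_workload_graph(
    nodes: Workload,
    deps: Dependencies,
    dispatch: Callable[[str, str], None],
) -> List[str]:
    """Dispatch every node after all of its dependencies; return the run order.

    `dispatch(engine, node)` stands in for submitting the node to the matching
    engine (for example a Hama or Spark job). Unknown node names are rejected,
    and TopologicalSorter raises graphlib.CycleError if the graph has a cycle.
    """
    known = set(nodes)
    for name, required in deps.items():
        missing = (required | {name}) - known
        if missing:
            raise ValueError(f"unknown nodes in dependencies: {sorted(missing)}")
    sorter = TopologicalSorter({name: deps.get(name, set()) for name in nodes})
    order: List[str] = []
    for name in sorter.static_order():
        dispatch(nodes[name], name)  # hypothetical job submission
        order.append(name)
    return order


if __name__ == "__main__":
    nodes = {
        "load_data": "hadoop",
        "feature_engineering": "spark",
        "tune_parameters": "hama",
        "train_model": "tensorflow",
    }
    deps = {
        "feature_engineering": {"load_data"},
        "tune_parameters": {"feature_engineering"},
        "train_model": {"tune_parameters"},
    }
    run_workload_graph(nodes, deps, lambda engine, node: print(f"{engine}: {node}"))
```

The sketch dispatches nodes one at a time for clarity; a real orchestrator would probably run independent branches concurrently (for example with `TopologicalSorter.get_ready()` and `done()`) and handle job failures.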
Background
The initial Scalar code was developed in 2016 and has been successfully beta-tested for one of the largest insurance organizations in a client-specific proof of concept (PoC). The motivation behind this work is to build a framework that provides an abstraction over heterogeneous data science frameworks and helps users leverage them in the most performant way.
Rationale
There has been a sudden deluge of machine learning and deep learning frameworks in the industry. For an application developer, switching from one framework to another without rewriting the application is difficult. There is also additional plumbing to be done to retrieve the prediction results for each model in each framework. We aim to provide an abstraction framework that can be used to seamlessly train and deploy models at scale on multiple frameworks such as TensorFlow, Apache Horn or Caffe. The abstraction further provides a unified layer for serving predictions in a performant, scalable and efficient way for multi-tenant deployments. The key performance metrics will be reduced training time, lower error rate and lower serving latency.
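Being able to switch frameworks without rewriting the application essentially means coding against a common interface. In the minimal sketch below, `ModelBackend`, `BACKENDS` and `train_and_predict` are hypothetical names, not Scalar's API, and the two toy backends (a mean baseline and a one-feature least-squares fit) merely stand in for adapters that would wrap TensorFlow, Apache Horn or Caffe.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Sequence, Type

Rows = Sequence[Sequence[float]]


class ModelBackend(ABC):
    """Hypothetical framework-neutral interface (not Scalar's actual API)."""

    @abstractmethod
    def train(self, features: Rows, labels: Sequence[float]) -> None:
        """Fit the model on the given rows and labels."""

    @abstractmethod
    def predict(self, features: Rows) -> List[float]:
        """Return one prediction per input row."""


class MeanBackend(ModelBackend):
    """Toy baseline: always predicts the mean training label."""

    def __init__(self) -> None:
        self._mean = 0.0

    def train(self, features: Rows, labels: Sequence[float]) -> None:
        self._mean = sum(labels) / len(labels)

    def predict(self, features: Rows) -> List[float]:
        return [self._mean for _ in features]


class LinearBackend(ModelBackend):
    """Toy one-feature least-squares regression standing in for a real framework."""

    def __init__(self) -> None:
        self._slope = 0.0
        self._intercept = 0.0

    def train(self, features: Rows, labels: Sequence[float]) -> None:
        xs = [row[0] for row in features]
        n = len(xs)
        mean_x = sum(xs) / n
        mean_y = sum(labels) / n
        var_x = sum((x - mean_x) ** 2 for x in xs)
        cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, labels))
        self._slope = cov_xy / var_x if var_x else 0.0
        self._intercept = mean_y - self._slope * mean_x

    def predict(self, features: Rows) -> List[float]:
        return [self._intercept + self._slope * row[0] for row in features]


# Selecting a framework is a configuration value, not application code.
BACKENDS: Dict[str, Type[ModelBackend]] = {"mean": MeanBackend, "linear": LinearBackend}


def train_and_predict(backend_name: str, train_x: Rows, train_y: Sequence[float], test_x: Rows) -> List[float]:
    backend = BACKENDS[backend_name]()
    backend.train(train_x, train_y)
    return backend.predict(test_x)
```

The application only ever calls `train_and_predict` with a backend name, so moving a model to another framework becomes a configuration change, provided an adapter for that framework exists.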
Scalar consists of a core engine that can be used to create flows described in terms of state, sequences and algorithms. The engine invokes the Apache Hama execution context to train and deploy models on the target framework. Apache Hama is used for a variety of functions, including parameter tuning and scheduling computations on a distributed cluster. A data object layer provides access to data from heterogeneous sources such as HDFS, the local file system and S3. A REST API layer serves the prediction functions to client applications, and an intermediate caching layer reduces latency for various functions.
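The caching layer can be pictured as a least-recently-used (LRU) cache in front of a model's prediction function. The proposal does not specify Scalar's caching strategy, so this is only a single-process sketch assuming deterministic models and requests keyed by a tuple of feature values; `PredictionCache` is a hypothetical name.

```python
from collections import OrderedDict
from typing import Callable, Tuple

Features = Tuple[float, ...]


class PredictionCache:
    """Hypothetical LRU cache between the REST layer and a deterministic model."""

    def __init__(self, predict_fn: Callable[[Features], float], capacity: int = 1024) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._predict_fn = predict_fn
        self._capacity = capacity
        self._entries: "OrderedDict[Features, float]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def predict(self, features: Features) -> float:
        if features in self._entries:
            self._entries.move_to_end(features)  # mark as most recently used
            self.hits += 1
            return self._entries[features]
        self.misses += 1
        result = self._predict_fn(features)  # only cache misses reach the model
        self._entries[features] = result
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return result


if __name__ == "__main__":
    cache = PredictionCache(lambda x: sum(x), capacity=2)
    cache.predict((1.0, 2.0))
    cache.predict((1.0, 2.0))
    print(cache.hits, cache.misses)  # prints 1 1
```

Python's `functools.lru_cache` applies the same policy to plain functions; the explicit class simply makes the eviction step and the hit/miss counts visible. A distributed deployment would need a shared cache and an invalidation rule for when a model is redeployed.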
Initial Goals
Some current goals include:

  *   Build community.
  *   Provide a general-purpose API for machine learning and deep learning training, deployment and serving.
  *   Serve the predictions with low latency.
  *   Run massive workloads via Apache Hama on TensorFlow, Apache Spark and Caffe.
  *   Provide CPU and GPU support for running the algorithms on-premises or in the cloud.
Current Status
Meritocracy
The core developers understand what it means to have a process based on meritocracy. We will make continuous efforts to build an environment that supports this and encourages community members to contribute.
Community
A small community has formed within the Apache Hama project community and at companies such as an enterprise services and product company and an artificial intelligence startup. There is a lot of interest in data science serving systems and artificial intelligence simplification systems. We believe that bringing Scalar into Apache will help the community grow even bigger.
Core Developers
Edward J. Yoon, Sachin Ghai, Ishwardeep Singh, Rachna Gogia, Abhishek Soni, Nikunj Limbaseeya, Mayur Choubey
Known Risks
Orphaned Products
Apache Hama is already a core open source component used at Samsung Electronics, and Scalar is already being adopted by major enterprise organizations. There is no direct risk of the Scalar project being orphaned.
Inexperience with Open Source
All contributors have experience using and/or working on Apache open source projects.
Homogeneous Developers
The initial committers are from different organizations such as Impetus, Chalk Digital, and Samsung Electronics.
Reliance on Salaried Developers
A few developers will be working full-time as open source developers. Other developers will also start working on the project in their spare time.
Relationships with Other Apache Products

  *   Scalar is being built on top of Apache Hama.
  *   Apache Spark is being used for machine learning.
  *   Apache Horn is being used for deep learning.
  *   The framework will run natively on Apache Hadoop and Apache Mesos.
An Excessive Fascination with the Apache Brand
We hope Scalar will benefit from Apache in attracting a community and establishing a solid group of developers, and also through its relationship with Apache Hadoop, Spark and Hama. These are the main reasons we are submitting this proposal.
Documentation
The initial design of Scalar can be found at <https://drive.google.com/file/d/0B7mbLUemi6LFVHlFSzhONmZ4aU0/view?usp=sharing>.
Initial Source
Impetus Technologies (Impetus) will contribute the initial orchestration code base to create this project. Impetus plans to contribute the Scalar code base, test cases, build files and documentation to the ASF under the terms specified in the ASF Corporate Contributor License Agreement, and to develop it further with the wider community. Once at Apache, the project will be licensed under the Apache License, Version 2.0.
Cryptography
Not applicable.
Required Resources
Mailing Lists

  *   scalar-dev
  *   scalar-pmc
Subversion Directory

  *   Git is the preferred source control system: git://git.apache.org/scalar
Issue Tracking

  *   A JIRA issue tracker: SCALAR
Initial Committers

  *   Sachin Ghai (sachin.ghai AT impetus DOT co DOT in)
  *   Edward J. Yoon (edwardyoon AT apache DOT org)
  *   Abhishek Soni (abhishek.soni AT impetus DOT co DOT in)
  *   Ishwardeep Singh (ishwardeep AT chalkdigital DOT com)
  *   Nikunj Limbaseeya (nikunj.limbaseeya AT impetus DOT co DOT in)
  *   Rachna Gogia (rachna AT hadoopsphere DOT org)
  *   Mayur Choubey (mayur.choubey AT impetus DOT co DOT in)
Affiliations

  *   Sachin Ghai (Impetus)
  *   Edward J. Yoon (Samsung Electronics)
  *   Abhishek Soni (Impetus)
  *   Ishwardeep Singh (Chalk Digital)
  *   Nikunj Limbaseeya (Impetus)
  *   Rachna Gogia (HadoopSphere)
  *   Mayur Choubey (Impetus)
Sponsors
<proposed>
Champion

  *   Edward J. Yoon <ASF member, Samsung Electronics>
Nominated Mentors

  *   Edward J. Yoon <ASF member, Samsung Electronics>
Sponsoring Entity
The Apache Hama project

-- End of proposal --

Thanks,
Sachin Ghai

________________________________

NOTE: This message may contain information that is confidential, proprietary, privileged or otherwise protected by law. The message is intended solely for the named addressee. If received in error, please destroy and notify the sender. Any use of this email is prohibited when received in error. Impetus does not represent, warrant and/or guarantee, that the integrity of this communication has been maintained nor that the communication is free of errors, virus, interception or interference.

RE: Proposal for an Apache Hama sub-project

Posted by Sachin Ghai <sa...@impetus.co.in>.
Thank you, Edward, for the further feedback. Congratulations on the new job.

REST-based prediction serving would be a key part of this distributed system design, and services would be independently deployable (for instance, a K-means service can be independent of the regression service). I believe it would be good to rely on some of the best patterns from microservices architecture and to make design decisions accordingly as we proceed.
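The idea that each prediction service is independently deployable can be sketched as one handler per model behind its own REST-style route, with the services sharing only a JSON request/response contract. For brevity the sketch keeps both services in one process and leaves out the HTTP server itself; in a microservices deployment each handler would run as its own service. The route layout (`/predict/<service>`), the request field `features` and the factory names are hypothetical, not part of Scalar.

```python
import json
import math
from typing import Callable, Dict, List, Tuple

# A hypothetical service handler: parsed JSON request in, JSON-able response out.
Handler = Callable[[dict], dict]


def make_kmeans_service(centroids: List[List[float]]) -> Handler:
    """Assign a point to its nearest centroid (K-means prediction step)."""
    def handle(request: dict) -> dict:
        point = request["features"]
        distances = [math.dist(point, c) for c in centroids]
        return {"cluster": distances.index(min(distances))}
    return handle


def make_regression_service(weights: List[float], bias: float) -> Handler:
    """Score a linear regression model with fixed weights."""
    def handle(request: dict) -> dict:
        x = request["features"]
        if len(x) != len(weights):
            raise ValueError(f"expected {len(weights)} features")
        return {"prediction": bias + sum(w * v for w, v in zip(weights, x))}
    return handle


def route(services: Dict[str, Handler], path: str, body: str) -> Tuple[int, str]:
    """Minimal REST-style dispatch: POST /predict/<service> with a JSON body."""
    prefix = "/predict/"
    name = path[len(prefix):] if path.startswith(prefix) else None
    if name not in services:
        return 404, json.dumps({"error": f"unknown service: {path}"})
    try:
        request = json.loads(body)
        return 200, json.dumps(services[name](request))
    except (ValueError, KeyError, TypeError) as exc:
        return 400, json.dumps({"error": str(exc)})
```

Because the two routes share nothing but the request format, the K-means model could be retrained or redeployed without touching the regression service, which is the property described above.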

I would also welcome feedback from other members of the Hama community in this discussion, and I look forward to co-creating Scalar.

Thanks,
Sachin Ghai


-----Original Message-----
From: Edward J. Yoon [mailto:edward.yoon@samsung.com]
Sent: 28 February 2017 05:43 AM
To: dev@hama.apache.org
Cc: general@incubator.apache.org
Subject: RE: Proposal for an Apache Hama sub-project

Thanks for your proposal.

I of course think Apache Hama can be used for scheduling sync and async communication/computation networks with various topologies and resource allocation. However, I'm not sure whether this approach also fits a modern microservice architecture. In my opinion, this can be discussed and cooked in the Hama community as a sub-project until it's mature enough (CC'ing general@i.a.o. I'll be happy to read more feedback from the ASF incubator community).

P.S. It seems you referred to the incubation proposal template. There's no need to add me as an initial committer (I don't have much time to actively contribute to your project). Also, I recently quit Samsung Electronics and joined a $200 billion O2O e-commerce company as CTO.

RE: Proposal for an Apache Hama sub-project

Posted by Sachin Ghai <sa...@impetus.co.in>.
Thank you, Edward Capriolo, for your willingness to be a mentor and contributor.

We look forward to building a wider community and revving up activity. More contributors and mentors are welcome.
The 'Scalar' project proposal is listed in the mail chain.

Thanks,
Sachin Ghai

https://drive.google.com/file/d/0B7mbLUemi6LFbzFQLXB1Z1p2dm8/view?usp=sharing

-----Original Message-----
From: Edward Capriolo [mailto:edlinuxguru@gmail.com]
Sent: 02 March 2017 09:05 PM
To: general@incubator.apache.org
Subject: Re: Proposal for an Apache Hama sub-project

On Mon, Feb 27, 2017 at 7:13 PM, Edward J. Yoon <ed...@samsung.com> wrote:

> Thanks for your proposal.
>
> I of course think Apache Hama can be used for scheduling sync and
> async communication/computation networks with various topologies and
> resource allocation. However, I'm not sure whether this approach is
> also fit for modern microservice architecture? In my opinion, this can
> be discussed and cooked in Hama community as a sub-project until it's
> mature enough (CC'ing general@i.a.o. I'll be happy to read more
> feedbacks from ASF incubator community).
>
> P.S., It seems you referred to incubation proposal template. There's
> no need to add me as initial committer (I don't have much time to
> actively contribute to your project). And, I recently quit Samsung
> Electronics and joined to $200 billion sized O2O e-commerce company as
> a CTO.
>
I do not believe the Hama project has had any activity for a long time (1+ year). For example, I have attempted to broach this discussion and got no official reply: https://issues.apache.org/jira/browse/HAMA-998.

I am interested in Scalar and would like to take the time to familiarize myself with it. I do not believe I am the right champion, but I could possibly be a mentor/contributor.

Thanks,
Edward


---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
For additional commands, e-mail: general-help@incubator.apache.org

Re: Proposal for an Apache Hama sub-project

Posted by Ted Dunning <te...@gmail.com>.
Subprojects are frowned upon. It is possible for a project to graduate from the Incubator as part of another project, but that is very unusual. Graduating into a very quiet project like Hama would be even more unusual.

A better course would be to simply create a new Incubator project. Worry about building a viable project first. Worry about graduation details later.



On Mar 2, 2017 7:35 AM, "Edward Capriolo" <ed...@gmail.com> wrote:

> On Mon, Feb 27, 2017 at 7:13 PM, Edward J. Yoon <ed...@samsung.com> wrote:
>
> > Thanks for your proposal.
> >
> > I of course think Apache Hama can be used for scheduling sync and async
> > communication/computation networks with various topologies and resource
> > allocation. However, I'm not sure whether this approach is also fit for
> > modern microservice architecture? In my opinion, this can be discussed and
> > cooked in Hama community as a sub-project until it's mature enough (CC'ing
> > general@i.a.o. I'll be happy to read more feedbacks from ASF incubator
> > community).
> >
> > P.S., It seems you referred to incubation proposal template. There's no
> > need to add me as initial committer (I don't have much time to actively
> > contribute to your project). And, I recently quit Samsung Electronics and
> > joined to $200 billion sized O2O e-commerce company as a CTO.
> >
> > -----Original Message-----
> > From: Sachin Ghai [mailto:sachin.ghai@impetus.co.in]
> > Sent: Monday, February 27, 2017 5:16 PM
> > To: dev@hama.apache.org
> > Subject: Proposal for an Apache Hama sub-project
> >
> > Hama Community,
> >
> > I would like to propose a sub-project for Apache Hama and initiate
> > discussion around the proposal. The proposed sub-project named 'Scalar' is
> > a scalable orchestration, training and serving system for machine learning
> > and deep learning. Scalar would leverage Apache Hama to automate the
> > distributed training, model deployment and prediction serving.
> >
> > More details about the proposal are listed below as per Apache project
> > proposal template:
> > Abstract
> > Scalar is a general purpose framework for simplifying massive scale big
> > data analytics and deep learning modelling, deployment, serving with high
> > performance.
> > Proposal
> > It is a goal of Scalar to provide an abstraction framework which allows
> > user to easily scale the functions of training a model, deploying a model
> > and serving the prediction from underlying machine learning or deep
> > learning framework. It is also the characteristic of its execution
> > framework to orchestrate heterogeneous workload graphs utilizing Apache
> > Hama, Apache Hadoop, Apache Spark and TensorFlow resources.
> > Background
> > The initial Scalar code was developed in 2016 and has been successfully
> > beta tested for one of the largest insurance organizations in a client
> > specific PoC. The motivation behind this work is to build a framework
> > that provides abstraction on heterogeneous data science frameworks and
> > helps users leverage them in the most performant way.
> > Rationale
> > There is a sudden deluge of machine learning and deep learning frameworks
> > in the industry. For an application developer, it is hard to switch from
> > one framework to another without rewriting the application. There is also
> > additional plumbing needed to retrieve the prediction results for each
> > model in each framework. We aim to provide an abstraction framework that
> > can be used to seamlessly train and deploy models at scale on multiple
> > frameworks such as TensorFlow, Apache Horn or Caffe. The abstraction
> > further provides a unified layer for serving predictions in the most
> > performant, scalable and efficient way for a multi-tenant deployment. The
> > key performance metrics will be reduced training time, lower error rate
> > and lower latency when serving models.
> > Scalar consists of a core engine that can be used to create flows
> > described in terms of states, sequences and algorithms. The engine invokes
> > the execution context of Apache Hama to train and deploy models on the
> > target framework. Apache Hama is used for a variety of functions,
> > including parameter tuning and scheduling computations on a distributed
> > cluster. A data object layer provides access to data from heterogeneous
> > sources such as HDFS, local storage and S3. A REST API layer serves the
> > prediction functions to client applications, and a caching layer in the
> > middle reduces latency for various functions.
> > Initial Goals
> > Some current goals include:
> >
> >   *   Build the community.
> >   *   Provide a general-purpose API for machine learning and deep learning
> > training, deployment and serving.
> >   *   Serve predictions with low latency.
> >   *   Run massive workloads via Apache Hama on TensorFlow, Apache Spark and
> > Caffe.
> >   *   Provide CPU and GPU support, on-premise or in the cloud, for running
> > the algorithms.
> > Current Status
> > Meritocracy
> > The core developers understand what it means to have a process based on
> > meritocracy. We will make continuous efforts to build an environment that
> > supports this, encouraging community members to contribute.
> > Community
> > A small community has formed within the Apache Hama project community and
> > at companies such as an enterprise services and product company and an
> > artificial intelligence startup. There is a lot of interest in data
> > science serving systems and artificial intelligence simplification
> > systems. By bringing Scalar into Apache, we believe that the community
> > will grow even bigger.
> > Core Developers
> > Edward J. Yoon, Sachin Ghai, Ishwardeep Singh, Rachna Gogia, Abhishek Soni,
> > Nikunj Limbaseeya, Mayur Choubey
> > Known Risks
> > Orphaned Products
> > Apache Hama is already a core open source component used at Samsung
> > Electronics, and Scalar is already being adopted by major enterprise
> > organizations. There is no direct risk of the Scalar project being
> > orphaned.
> > Inexperience with Open Source
> > All contributors have experience using and/or working on Apache open
> > source projects.
> > Homogeneous Developers
> > The initial committers are from different organizations such as Impetus,
> > Chalk Digital, and Samsung Electronics.
> > Reliance on Salaried Developers
> > A few will be working as full-time open source developers. Other
> > developers will also start working on the project in their spare time.
> > Relationships with Other Apache Products
> >
> >   *   Scalar is being built on top of Apache Hama
> >   *   Apache Spark is being used for machine learning.
> >   *   Apache Horn is being used for deep learning.
> >   *   The framework will run natively on Apache Hadoop and Apache Mesos.
> > An Excessive Fascination with the Apache Brand
> > We hope Scalar will benefit from Apache in terms of attracting a community
> > and establishing a solid group of developers, and also from its relation
> > with Apache Hadoop, Spark and Hama. These are the main reasons for us to
> > send this proposal.
> > Documentation
> > Initial design of Scalar can be found at this
> > link <https://drive.google.com/file/d/0B7mbLUemi6LFVHlFSzhONmZ4aU0/view?usp=sharing>.
> > Initial Source
> > Impetus Technologies (Impetus) will contribute the initial orchestration
> > code base to create this project. Impetus plans to contribute the Scalar
> > code base, test cases, build files, and documentation to the ASF under the
> > terms specified in the ASF Corporate Contributor License Agreement and
> > further develop it with the wider community. Once at Apache, the project
> > will be licensed under the Apache License.
> > Cryptography
> > Not applicable.
> > Required Resources
> > Mailing Lists
> >
> >   *   scalar-dev
> >   *   scalar-pmc
> > Subversion Directory
> >
> >   *   Git is the preferred source control system:
> > git://git.apache.org/scalar
> > Issue Tracking
> >
> >   *   a JIRA issue tracker, SCALAR
> > Initial Committers
> >
> >   *   Sachin Ghai (sachin.ghai AT impetus DOT co DOT in)
> >   *   Edward J. Yoon (edwardyoon AT apache DOT org)
> >   *   Abhishek Soni (abhishek.soni AT impetus DOT co DOT in)
> >   *   Ishwardeep Singh (ishwardeep AT chalkdigital DOT com)
> >   *   Nikunj Limbaseeya (nikunj.limbaseeya AT impetus DOT co DOT in)
> >   *   Rachna Gogia (rachna AT hadoopsphere DOT org)
> >   *   Mayur Choubey (mayur.choubey AT impetus DOT co DOT in)
> > Affiliations
> >
> >   *   Sachin Ghai (Impetus)
> >   *   Edward J. Yoon (Samsung Electronics)
> >   *   Abhishek Soni (Impetus)
> >   *   Ishwardeep Singh (Chalk Digital)
> >   *   Nikunj Limbaseeya (Impetus)
> >   *   Rachna Gogia (HadoopSphere)
> >   *   Mayur Choubey (Impetus)
> > Sponsors
> > <proposed>
> > Champion
> >
> >   *   Edward J. Yoon <ASF member, Samsung Electronics >
> > Nominated Mentors
> >
> >   *   Edward J. Yoon <ASF member, Samsung Electronics >
> > Sponsoring Entity
> > The Apache Hama project
> >
> > -- End of proposal --
> >
> > Thanks,
> > Sachin Ghai
> >
> > ________________________________
> >
> >
> >
> >
> >
> >
> > NOTE: This message may contain information that is confidential,
> > proprietary, privileged or otherwise protected by law. The message is
> > intended solely for the named addressee. If received in error, please
> > destroy and notify the sender. Any use of this email is prohibited when
> > received in error. Impetus does not represent, warrant and/or guarantee,
> > that the integrity of this communication has been maintained nor that the
> > communication is free of errors, virus, interception or interference.
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> > For additional commands, e-mail: general-help@incubator.apache.org
> >
> >
> I do not believe the Hama project has had activity for a long time (1+
> year). For example, I have attempted to broach this discussion and got no
> official reply: https://issues.apache.org/jira/browse/HAMA-998.
>
> I am interested in Scalar and I would like to take time and familiarize
> myself with it.  I do not believe I am the right champion but I can
> possibly be a mentor/contributor.
>
> Thanks,
> Edward
>

Re: Proposal for an Apache Hama sub-project

Posted by Edward Capriolo <ed...@gmail.com>.
On Mon, Feb 27, 2017 at 7:13 PM, Edward J. Yoon <ed...@samsung.com>
wrote:

> Thanks for your proposal.
>
> Of course, I think Apache Hama can be used for scheduling sync and async
> communication/computation networks with various topologies and resource
> allocation. However, I'm not sure whether this approach is also a fit for
> modern microservice architecture. In my opinion, this can be discussed and
> cooked in the Hama community as a sub-project until it's mature enough
> (CC'ing general@i.a.o. I'll be happy to read more feedback from the ASF
> incubator community).
>
> P.S. It seems you referred to the incubation proposal template. There's no
> need to add me as an initial committer (I don't have much time to actively
> contribute to your project). Also, I recently quit Samsung Electronics and
> joined a $200 billion O2O e-commerce company as CTO.
>
I do not believe the Hama project has had activity for a long time (1+
year). For example, I have attempted to broach this discussion and got no
official reply: https://issues.apache.org/jira/browse/HAMA-998.

I am interested in Scalar and I would like to take time and familiarize
myself with it.  I do not believe I am the right champion but I can
possibly be a mentor/contributor.

Thanks,
Edward

RE: Proposal for an Apache Hama sub-project

Posted by "Edward J. Yoon" <ed...@samsung.com>.
Thanks for your proposal.

Of course, I think Apache Hama can be used for scheduling sync and async
communication/computation networks with various topologies and resource
allocation. However, I'm not sure whether this approach is also a fit for
modern microservice architecture. In my opinion, this can be discussed and
cooked in the Hama community as a sub-project until it's mature enough
(CC'ing general@i.a.o. I'll be happy to read more feedback from the ASF
incubator community).

P.S. It seems you referred to the incubation proposal template. There's no
need to add me as an initial committer (I don't have much time to actively
contribute to your project). Also, I recently quit Samsung Electronics and
joined a $200 billion O2O e-commerce company as CTO.

