Posted to general@incubator.apache.org by da...@fallside.com on 2015/11/13 01:17:42 UTC

[DISCUSS] Spark-Kernel Incubator Proposal

Hello, we would like to start a discussion on accepting the Spark-Kernel,
a mechanism for applications to interactively and remotely access Apache
Spark, into the Apache Incubator.

The proposal is available online at
https://wiki.apache.org/incubator/SparkKernelProposal, and it is appended
to this email.

We are looking for additional mentors to help with this project, and we
would much appreciate your guidance and advice.

Thank you in advance,
David Fallside



= Spark-Kernel Proposal =

== Abstract ==
Spark-Kernel provides applications with a mechanism to interactively and
remotely access Apache Spark.

== Proposal ==
The Spark-Kernel enables interactive applications to access Apache Spark
clusters. More specifically:
 * Applications can send code snippets and libraries for execution by Spark
(see the client sketch after this list)
 * Applications can be deployed separately from Spark clusters and
communicate with the Spark-Kernel using the provided Spark-Kernel client
 * Execution results and streaming data can be sent back to calling
applications
 * Applications no longer have to be network-connected to the workers of a
Spark cluster because the Spark-Kernel acts as each application’s proxy
 * Work has started on enabling Spark-Kernel to support languages in
addition to Scala, namely Python (with PySpark), R (with SparkR), and SQL
(with SparkSQL)
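
To make the interaction model concrete, here is a minimal sketch of what a
client application might look like. This is illustrative only: the names
KernelClient, connect, execute, and onResult are hypothetical placeholders,
not the actual Spark-Kernel client API (see the project wiki for the real
interfaces).

{{{
// Hypothetical sketch: KernelClient, connect, execute, and onResult are
// illustrative placeholders, not the real Spark-Kernel client API.
object KernelClientSketch extends App {
  // Connect to a kernel deployed alongside a Spark cluster; the application
  // itself needs no network path to the Spark workers.
  val client = KernelClient.connect(host = "kernel.example.com", port = 8888)

  // Ship a code snippet to the kernel for execution by Spark.
  val execution = client.execute("sc.parallelize(1 to 100).sum()")

  // Execution results stream back to the calling application.
  execution.onResult(result => println(s"result: ${result.data}"))
}
}}}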

== Background & Rationale ==
Apache Spark provides applications with a fast and general-purpose
distributed computing engine that supports static and streaming data,
tabular and graph representations of data, and an extensive set of
machine learning libraries. Consequently, a wide variety of applications
will be written for Spark: interactive applications that require
relatively frequent function evaluations, and batch-oriented applications
that require one-shot or only occasional evaluation.

Apache Spark provides two mechanisms for applications to connect with
Spark. The primary mechanism launches applications on Spark clusters using
spark-submit
(http://spark.apache.org/docs/latest/submitting-applications.html); this
requires developers to bundle their application code plus any dependencies
into JAR files, and then submit them to Spark. A second mechanism is an
ODBC/JDBC API
(http://spark.apache.org/docs/latest/sql-programming-guide.html#distributed-sql-engine)
which enables applications to issue SQL queries against SparkSQL.
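
As an illustration of the second mechanism, the sketch below issues a SQL
query over JDBC against a running Spark distributed SQL engine (Thrift
server). The JDBC URL scheme is the standard HiveServer2 one the Thrift
server accepts; the host, port, credentials, and table name are assumptions
made for the example.

{{{
import java.sql.DriverManager

// Sketch: assumes a Spark Thrift server is listening on thrift-host:10000,
// the Hive JDBC driver is on the classpath, and a table named "logs" exists.
object SqlOverJdbcSketch extends App {
  val conn = DriverManager.getConnection(
    "jdbc:hive2://thrift-host:10000/default", "user", "")
  val stmt = conn.createStatement()
  // Only SQL is reachable through this interface; Spark Streaming and other
  // non-SQL components are not.
  val rs = stmt.executeQuery("SELECT COUNT(*) FROM logs")
  while (rs.next()) println(rs.getLong(1))
  conn.close()
}
}}}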

Our experience developing interactive applications, such as analytic
applications and Jupyter Notebooks, to run against Spark was that the
spark-submit mechanism was overly cumbersome and slow (requiring JAR
creation and forked processes to run spark-submit), and that the SQL
interface was too limiting, offering no easy access to components other
than SparkSQL, such as streaming. The most promising mechanism provided by
Apache Spark was the command-line shell
(http://spark.apache.org/docs/latest/programming-guide.html#using-the-shell),
which enabled us to execute code snippets and dynamically control the
tasks submitted to a Spark cluster. Spark does not provide the
command-line shell as a consumable service, but it gave us the starting
point from which we developed the Spark-Kernel.
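
For example, a snippet like the following can be typed into spark-shell,
where sc is the SparkContext the shell provides (the HDFS path is
illustrative); the Spark-Kernel accepts the same kind of snippet over its
remote protocol instead of stdin:

{{{
// Typed interactively into spark-shell; `sc` is provided by the shell.
val lines = sc.textFile("hdfs:///data/events.log")  // illustrative path
val errorCount = lines.filter(_.contains("ERROR")).count()
println(s"error lines: $errorCount")
}}}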

== Current Status ==
Spark-Kernel was first developed by a small team working on an
internal IBM Spark-related project in July 2014. In recognition of its
likely general utility to Spark users and developers, in November 2014 the
Spark-Kernel project was moved to GitHub and made available under the
Apache License V2.

== Meritocracy ==
The current developers are familiar with the meritocratic open source
development process at Apache. As the project has gathered interest on
GitHub, the developers have started a process to invite additional
developers into the project, and at least one new developer is ready to
contribute code to the project.

== Community ==
We started building a community around the Spark-Kernel project when we
moved it to GitHub about one year ago. Since then the community has grown
to about 70 people, and there are regular requests and suggestions from
the community. We believe that providing Apache Spark application
developers with a general-purpose and interactive API holds a lot of
community potential, especially considering possible tie-ins with the
Jupyter and data science communities.

== Core Developers ==
The core developers of the project are currently all from IBM, from the
IBM Emerging Technology team and from IBM’s recently formed Spark
Technology Center.

== Alignment ==
Apache, as the home of Apache Spark, is the most natural home for the
Spark-Kernel project because it was designed to work with Apache Spark and
to provide capabilities for interactive applications and data science
tools not provided by Spark itself.

The Spark-Kernel also has an affinity with Jupyter (jupyter.org) because
it uses the Jupyter protocol for communications, and so Jupyter Notebooks
can directly use the Spark-Kernel as a kernel for communicating with
Apache Spark. However, we believe that the Spark-Kernel provides a
general-purpose mechanism enabling a wider variety of applications than
just Notebooks to access Spark, and so the Spark-Kernel’s greatest
affinity is with Apache and Apache Spark.
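
As a concrete illustration of the Jupyter affinity, a kernelspec along the
lines of the following registers the Spark-Kernel with a Jupyter notebook
server. The argv, display_name, and language fields and the
{connection_file} substitution are standard kernel.json elements; the
install path and launcher script name are assumptions made for the example.

{{{
{
  "display_name": "Spark-Kernel (Scala)",
  "language": "scala",
  "argv": [
    "/opt/spark-kernel/bin/sparkkernel",
    "--profile",
    "{connection_file}"
  ]
}
}}}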

== Known Risks ==
=== Orphaned products ===
We believe the Spark-Kernel project has a low risk of abandonment because
several parties have an interest in its continued existence. More
specifically, the Spark-Kernel provides a capability that is not provided
by Apache Spark today, one that enables a wider range of applications to
leverage Spark. For example, IBM uses, or is considering using, the
Spark-Kernel in several offerings, including its IBM Analytics for Apache
Spark product in the Bluemix cloud. A couple of other commercial users are
also using it, or considering its use, in their offerings. Furthermore,
Jupyter Notebooks are used by data scientists, and Spark is gaining
popularity as an analytic engine for them. Jupyter Notebooks are very
easily enabled with the Spark-Kernel, which provides another constituency
for it.

=== Inexperience with Open Source ===
The Spark-Kernel project has been running as an open-source project
(albeit with only IBM committers) for the past several months. The project
has an active issue tracker, and in response to the interest indicated by
the nature and volume of requests and comments, the team has publicly
stated that it is beginning to build a process for accepting third-party
contributions to the project.

=== Relationships with Other Apache Products ===
The Spark-Kernel has a clear affinity with the Apache Spark project
because it is designed to provide capabilities for interactive
applications and data science tools not provided by Spark itself. The
Spark-Kernel can be a back-end for the Zeppelin project currently
incubating at Apache. There is interest from the Spark-Kernel community to
develop this capability and an experimental branch has been started.

=== Homogeneous Developers ===
The current group of developers working on the Spark-Kernel are all from
IBM, although the group is in the process of expanding its membership to
include members of the GitHub community who are not from IBM and who have
been active in the Spark-Kernel community on GitHub.

=== Reliance on Salaried Developers ===
The initial committers are full-time employees at IBM, although not all
work on the project full-time.

=== Excessive Fascination with the Apache Brand ===
We believe the Spark-Kernel benefits Apache Spark application developers,
and we are interested in an Apache Spark-Kernel project to benefit these
developers by engaging a larger community, facilitating closer ties with
the existing Spark project, and yes, gaining more visibility for the
Spark-Kernel as a solution.

We have recently become aware that the project name “Spark-Kernel” may be
interpreted as having an association with an Apache project. If the
project is accepted by Apache, we suggest the project name remains the
same, but otherwise we will change it to one that does not imply any
Apache association.

=== Documentation ===
Comprehensive documentation, including a “Getting Started” guide, API
specifications, and a roadmap, is available from the GitHub project; see
https://github.com/ibm-et/spark-kernel/wiki.

=== Initial Source ===
The source code resides at https://github.com/ibm-et/spark-kernel.

=== External Dependencies ===
The Spark-Kernel depends upon a number of Apache projects:
 * Spark
 * Hadoop
 * Ivy
 * Commons

The Spark-Kernel also depends upon a number of other open source projects:
 * JeroMQ (LGPL with Static Linking Exception,
http://zeromq.org/area:licensing)
 * Akka (MIT)
 * JOpt Simple (MIT)
 * Spring Framework Core (Apache v2)
 * Play (Apache v2)
 * SLF4J (MIT)
 * Scala
 * Scalatest (Apache v2)
 * Scalactic (Apache v2)
 * Mockito (MIT)

== Required Resources ==
Developer and user mailing lists:
 * private@spark-kernel.incubator.apache.org (with moderated subscriptions)
 * commits@spark-kernel.incubator.apache.org
 * dev@spark-kernel.incubator.apache.org
 * users@spark-kernel.incubator.apache.org

A git repository:
https://git-wip-us.apache.org/repos/asf/incubator-spark-kernel.git

A JIRA issue tracker: https://issues.apache.org/jira/browse/SPARK-KERNEL

== Initial Committers ==
The initial list of committers is:
 * Leugim Bustelo (gino@bustelos.com)
 * Jakob Odersky (jodersky@gmail.com)
 * Luciano Resende (lresende@apache.org)
 * Robert Senkbeil (chip.senkbeil@gmail.com)
 * Corey Stubbs (cas5542@gmail.com)
 * Miao Wang (wm624@hotmail.com)
 * Sean Welleck (wellecks@gmail.com)

=== Affiliations ===
All of the initial committers are employed by IBM.

== Sponsors ==
=== Champion ===
 * Sam Ruby (IBM)

=== Nominated Mentors ===
 * Luciano Resende

We wish to recruit additional mentors during incubation.

=== Sponsoring Entity ===
The Apache Incubator.





Re: [DISCUSS] Spark-Kernel Incubator Proposal

Posted by Hitesh Shah <hi...@apache.org>.
Hi David, 

I would be happy to help out as a mentor.

thanks
— Hitesh

On Nov 12, 2015, at 4:17 PM, david@fallside.com wrote:

> Hello, we would like to start a discussion on accepting the Spark-Kernel,
> a mechanism for applications to interactively and remotely access Apache
> Spark, into the Apache Incubator.
> [snip]


Re: [DISCUSS] Spark-Kernel Incubator Proposal

Posted by Julien Le Dem <ju...@dremio.com>.
I'd be happy to help as a mentor if you need more.

On Thu, Nov 12, 2015 at 4:17 PM, <da...@fallside.com> wrote:

> Hello, we would like to start a discussion on accepting the Spark-Kernel,
> a mechanism for applications to interactively and remotely access Apache
> Spark, into the Apache Incubator.
> [snip]


-- 
Julien

Re: [DISCUSS] Spark-Kernel Incubator Proposal

Posted by Steve Loughran <st...@hortonworks.com>.
> On 13 Nov 2015, at 22:19, Matei Zaharia <ma...@gmail.com> wrote:
> 
> One question about this from the Spark side: have you considered giving the project a different name so that it doesn't sound like a Spark component? Right now "Spark Kernel" may be confused with "Spark Core" and things like that. I don't see a lot of Apache TLPs with related names, though maybe there's nothing wrong with that.
> 

ASF projects are allowed to; it just creates confusion about where support calls go.

Certainly, if it is outside the ASF, then it's an infringement of the ASF trademarks on Apache(tm) Spark(r). So from a trademark perspective alone, submitting to the ASF Incubator may avoid having to rename the project. Given how Java source usually ends up including product names in the packaging hierarchy, this can only be welcomed by the team.

> In terms of whether to put this in Apache Spark proper, we can have a discussion about it later, but my feeling is that it's not necessary. One reason is that this only uses public APIs, and another is that there are also other notebook interfaces over Spark (e.g. Zeppelin).
> 
> Matei
> 


+1 for keeping it separate, because it can have its own release schedule, and it is designed to be loosely coupled through those APIs.


Regarding the proposal, I do think the Kernel is architecturally interesting, especially the ability to register new event handlers running in-cluster.

However, its requirement to be 100% compatible with Jupyter means that it must use ZeroMQ as a transport, and zeromq.jar is LGPL.

And the ASF, for better or worse, has a policy of no mandatory dependencies on LGPL artifacts:

 http://www.apache.org/legal/resolved.html

with the most recent discussion on the topic being: https://issues.apache.org/jira/browse/LEGAL-192

I see that 0MQ is talking about adopting the MPL license (http://zeromq.org/area:licensing); I think getting zeromq.jar licensed as MPL is going to have to be a checklist item for being able to release ASF-approved artifacts, and hence for getting out of incubation.

If the project does get into incubation, one option that the Spark team has is becoming the sponsoring project, rather than the Incubator PMC. This gives the Spark PMC the responsibility for supervising the project, and should help foster a closer relationship between the groups.

-Steve

Re: [DISCUSS] Spark-Kernel Incubator Proposal

Posted by "P. Taylor Goetz" <pt...@gmail.com>.

> On Nov 13, 2015, at 5:19 PM, Matei Zaharia <ma...@gmail.com> wrote:
> 
> In terms of whether to put this in Apache Spark proper, we can have a discussion about it later, but my feeling is that it's not necessary. One reason is that this only uses public APIs, and another is that there are also other notebook interfaces over Spark (e.g. Zeppelin).
> 

That seems to echo the sentiment in the JIRA that Alex pointed out, and from a bird's-eye view it looks like that was the impetus for the contributors to take the Incubator route.

Would that sentiment change if spark-kernel relied on private APIs, or did so in the future?

The fact that at least one Spark PMC member stepped up and offered to mentor the project seems like an indicator that there's at least some support for collaboration from the Spark community.

I'm not in any way trying to affect the outcome of this proposal. I'm mostly thinking out loud that there might be a missed collaboration opportunity here.

> Matei

-Taylor


Re: [DISCUSS] Spark-Kernel Incubator Proposal

Posted by Matei Zaharia <ma...@gmail.com>.
One question about this from the Spark side: have you considered giving the project a different name so that it doesn't sound like a Spark component? Right now "Spark Kernel" may be confused with "Spark Core" and things like that. I don't see a lot of Apache TLPs with related names, though maybe there's nothing wrong with that.

In terms of whether to put this in Apache Spark proper, we can have a discussion about it later, but my feeling is that it's not necessary. One reason is that this only uses public APIs, and another is that there are also other notebook interfaces over Spark (e.g. Zeppelin).

Matei

> On Nov 12, 2015, at 7:17 PM, david@fallside.com wrote:
> 
> Hello, we would like to start a discussion on accepting the Spark-Kernel,
> a mechanism for applications to interactively and remotely access Apache
> Spark, into the Apache Incubator.
> [snip]


Re: [DISCUSS] Spark-Kernel Incubator Proposal

Posted by David Fallside <da...@fallside.com>.
Hi Taylor, I don't know the Spark community's opinion on the "outright vs
subproject" issue, although I have told a couple of people in that community
about the proposal and have posted an FYI to the spark-dev list. From a
technical perspective, the Spark-Kernel mainly uses public Spark APIs (except
for some SparkR usage, see
https://github.com/ibm-et/spark-kernel/blob/master/sparkr-interpreter/src/main/resources/README.md),
and so I guess the answer could go either way depending on the Spark community.
Thanks,
David

> On November 12, 2015 at 8:05 PM "P. Taylor Goetz" <pt...@gmail.com> wrote:
>
>
> Just a quick (or maybe not :) ) question...
>
> Given the tight coupling to the Apache Spark project, were there any
> considerations or discussions with the Spark community regarding including the
> Spark-Kernel functionality outright in Spark, or the possibility of becoming a
> subproject?
>
> I'm just curious. I don't think an answer one way or another would necessarily
> block incubation.
>
> -Taylor
>
> > On Nov 12, 2015, at 7:17 PM, david@fallside.com wrote:
> >
> > Hello, we would like to start a discussion on accepting the Spark-Kernel,
> > a mechanism for applications to interactively and remotely access Apache
> > Spark, into the Apache Incubator.
> >
> > The proposal is available online at
> > https://wiki.apache.org/incubator/SparkKernelProposal, and it is appended
> > to this email.
> >
> > We are looking for additional mentors to help with this project, and we
> > would much appreciate your guidance and advice.
> >
> > Thank-you in advance,
> > David Fallside
> >
> >
> >
> > = Spark-Kernel Proposal =
> >
> > == Abstract ==
> > Spark-Kernel provides applications with a mechanism to interactively and
> > remotely access Apache Spark.
> >
> > == Proposal ==
> > The Spark-Kernel enables interactive applications to access Apache Spark
> > clusters. More specifically:
> > * Applications can send code-snippets and libraries for execution by Spark
> > * Applications can be deployed separately from Spark clusters and
> > communicate with the Spark-Kernel using the provided Spark-Kernel client
> > * Execution results and streaming data can be sent back to calling
> > applications
> > * Applications no longer have to be network connected to the workers on a
> > Spark cluster because the Spark-Kernel acts as each application’s proxy
> > * Work has started on enabling Spark-Kernel to support languages in
> > addition to Scala, namely Python (with PySpark), R (with SparkR), and SQL
> > (with SparkSQL)
> >
> > == Background & Rationale ==
> > Apache Spark provides applications with a fast and general purpose
> > distributed computing engine that supports static and streaming data,
> > tabular and graph representations of data, and an extensive library of
> > machine learning libraries. Consequently, a wide variety of applications
> > will be written for Spark and there will be interactive applications that
> > require relatively frequent function evaluations, and batch-oriented
> > applications that require one-shot or only occasional evaluation.
> >
> > Apache Spark provides two mechanisms for applications to connect with
> > Spark. The primary mechanism launches applications on Spark clusters using
> > spark-submit
> > (http://spark.apache.org/docs/latest/submitting-applications.html); this
> > requires developers to bundle their application code plus any dependencies
> > into JAR files, and then submit them to Spark. A second mechanism is an
> > ODBC/JDBC API
> > (http://spark.apache.org/docs/latest/sql-programming-guide.html#distributed-sql-engine)
> > which enables applications to issue SQL queries against SparkSQL.
> >
> > Our experience when developing interactive applications, such as analytic
> > applications and Jupyter Notebooks, to run against Spark was that the
> > spark-submit mechanism was overly cumbersome and slow (requiring JAR
> > creation and forking processes to run spark-submit), and the SQL interface
> > was too limiting and did not offer easy access to components other than
> > SparkSQL, such as streaming. The most promising mechanism provided by
> > Apache Spark was the command-line shell
> > (http://spark.apache.org/docs/latest/programming-guide.html#using-the-shell)
> > which enabled us to execute code snippets and dynamically control the
> > tasks submitted to a Spark cluster. Spark does not provide the
> > command-line shell as a consumable service but it provided us with the
> > starting point from which we developed the Spark-Kernel.
> >
> > == Current Status ==
> > Spark-Kernel was first developed by a small team working on an
> > internal-IBM Spark-related project in July 2014. In recognition of its
> > likely general utility to Spark users and developers, in November 2014 the
> > Spark-Kernel project was moved to GitHub and made available under the
> > Apache License V2.
> >
> > == Meritocracy ==
> > The current developers are familiar with the meritocratic open source
> > development process at Apache. As the project has gathered interest at
> > GitHub the developers have actively started a process to invite additional
> > developers into the project, and we have at least one new developer who is
> > ready to contribute code to the project.
> >
> > == Community ==
> > We started building a community around the Spark-Kernel project when we
> > moved it to GitHub about one year ago. Since then we have grown to about
> > 70 people, and there are regular requests and suggestions from the
> > community. We believe that providing Apache Spark application developers
> > with a general-purpose and interactive API holds a lot of community
> > potential, especially considering possible tie-in’s with the Jupyter and
> > data science community.
> >
> > == Core Developers ==
> > The core developers of the project are currently all from IBM, from the
> > IBM Emerging Technology team and from IBM’s recently formed Spark
> > Technology Center.
> >
> > == Alignment ==
> > Apache, as the home of Apache Spark, is the most natural home for the
> > Spark-Kernel project because it was designed to work with Apache Spark and
> > to provide capabilities for interactive applications and data science
> > tools not provided by Spark itself.
> >
> > The Spark-Kernel also has an affinity with Jupyter (jupyter.org) because
> > it uses the Jupyter protocol for communications, and so Jupyter Notebooks
> > can directly use the Spark-Kernel as a kernel for communicating with
> > Apache Spark. However, we believe that the Spark-Kernel provides a
> > general-purpose mechanism enabling a wider variety of applications than
> > just Notebooks to access Spark, and so the Spark-Kernel’s greatest
> > affinity is with Apache and Apache Spark.
> >
> > == Known Risks ==
> > === Orphaned products ===
> > We believe the Spark-Kernel project has a low-risk of abandonment due to
> > interest in its continuing existence from several parties. More
> > specifically, the Spark-Kernel provides a capability that is not provided
> > by Apache Spark today but it enables a wider range of applications to
> > leverage Spark. For example, IBM uses (and is considering) the
> > Spark-Kernel in several offerings including its IBM Analytics for Apache
> > Spark product in the Bluemix Cloud. There are also a couple of other
> > commercial users who are using or considering its use in their offerings.
> > Furthermore, Jupyter Notebooks are used by data scientists and Spark is
> > gaining popularity as an analytic engine for them. Jupyter Notebooks are
> > very easily enabled with the Spark-Kernel and so there is another
> > constituency for it.
> >
> > === Inexperience with Open Source ===
> > The Spark-Kernel project has been running as an open-source project
> > (albeit with only IBM committers) for the past several months. The project
> > has an active issue tracker and due to the interest indicated by the
> > nature and volume of requests and comments, the team has publicly stated
> > it is beginning to build a process so they can accept third-party
> > contributions to the project.
> >
> > === Relationships with Other Apache Products ===
> > The Spark-Kernel has a clear affinity with the Apache Spark project
> > because it is designed to provide capabilities for interactive
> > applications and data science tools not provided by Spark itself. The
> > Spark-Kernel can be a back-end for the Zeppelin project currently
> > incubating at Apache. There is interest from the Spark-Kernel community to
> > develop this capability and an experimental branch has been started.
> >
> > === Homogeneous Developers ===
> > The current group of developers working on Spark-Kernel are all from IBM
> > although the group is in the process of expanding its membership to
> > include members of the GitHub community who are not from IBM and who have
> > been active in the Spark-Kernel community in GutHub.
> >
> > === Reliance on Salaried Developers ===
> > The initial committers are full-time employees at IBM although not all
> > work on the project full-time.
> >
> > === Excessive Fascination with the Apache Brand ===
> > We believe the Spark-Kernel benefits Apache Spark application developers,
> > and we are interested in an Apache Spark-Kernel project to benefit these
> > developers by engaging a larger community, facilitating closer ties with
> > the existing Spark project, and yes, gaining more visibility for the
> > Spark-Kernel as a solution.
> >
> > We have recently become aware that the project name “Spark-Kernel” may be
> > interpreted as having an association with an Apache project. If the
> > project is accepted by Apache, we suggest the project name remain the
> > same; otherwise we will change it to one that does not imply any
> > Apache association.
> >
> > === Documentation ===
> > Comprehensive documentation, including “Getting Started”, API
> > specifications, and a Roadmap, is available from the GitHub project; see
> > https://github.com/ibm-et/spark-kernel/wiki.
> >
> > === Initial Source ===
> > The source code resides at https://github.com/ibm-et/spark-kernel.
> >
> > === External Dependencies ===
> > The Spark-Kernel depends upon a number of Apache projects:
> > * Spark
> > * Hadoop
> > * Ivy
> > * Commons
> >
> > The Spark-Kernel also depends upon a number of other open source projects:
> > * JeroMQ (LGPL with Static Linking Exception,
> > http://zeromq.org/area:licensing)
> > * Akka (Apache v2)
> > * JOpt Simple (MIT)
> > * Spring Framework Core (Apache v2)
> > * Play (Apache v2)
> > * SLF4J (MIT)
> > * Scala
> > * Scalatest (Apache v2)
> > * Scalactic (Apache v2)
> > * Mockito (MIT)
> >
> > == Required Resources ==
> > Developer and user mailing lists
> > * private@spark-kernel.incubator.apache.org (with moderated subscriptions)
> > * commits@spark-kernel.incubator.apache.org
> > * dev@spark-kernel.incubator.apache.org
> > * users@spark-kernel.incubator.apache.org
> >
> > A git repository:
> > https://git-wip-us.apache.org/repos/asf/incubator-spark-kernel.git
> >
> > A JIRA issue tracker: https://issues.apache.org/jira/browse/SPARK-KERNEL
> >
> > == Initial Committers ==
> > The initial list of committers is:
> > * Leugim Bustelo (gino@bustelos.com)
> > * Jakob Odersky (jodersky@gmail.com)
> > * Luciano Resende (lresende@apache.org)
> > * Robert Senkbeil (chip.senkbeil@gmail.com)
> > * Corey Stubbs (cas5542@gmail.com)
> > * Miao Wang (wm624@hotmail.com)
> > * Sean Welleck (wellecks@gmail.com)
> >
> > === Affiliations ===
> > All of the initial committers are employed by IBM.
> >
> > == Sponsors ==
> > === Champion ===
> > * Sam Ruby (IBM)
> >
> > === Nominated Mentors ===
> > * Luciano Resende
> >
> > We wish to recruit additional mentors during incubation.
> >
> > === Sponsoring Entity ===
> > The Apache Incubator.
> >
> >
> >

Re: [DISCUSS] Spark-Kernel Incubator Proposal

Posted by Luciano Resende <lu...@gmail.com>.
On Fri, Nov 13, 2015 at 2:13 AM, Alexander Bezzubov <ab...@nflabs.com>
wrote:

> Hi,
>
> it looks pretty interesting, especially the part about integration with
> Zeppelin as another Scala interpreter implementation.
>
> AFAIK there was a discussion on including Spark-Kernel in Spark core
> (https://issues.apache.org/jira/browse/SPARK-4605), but I'm not sure about
> the possibility of it becoming a sub-project.
>
> It would be interesting to know, as it indeed looks very aligned with
> Apache Spark.
>
> --
> Alex
>
>

Thanks for the pointer Alex.

This discussion can continue during incubation as the project starts to
grow, and we can revisit it at graduation, which is when it would really
make a difference.


-- 
Luciano Resende
http://people.apache.org/~lresende
http://twitter.com/lresende1975
http://lresende.blogspot.com/

Re: [DISCUSS] Spark-Kernel Incubator Proposal

Posted by Henry Saputra <he...@gmail.com>.
A recent example was the MySos proposal (MySQL on Mesos), which was
renamed to Cotton before officially entering the incubator.

I think it would be easier for infra if the name were decided fairly
early, choosing something that has a better chance of passing incubation.

- Henry

On Fri, Nov 13, 2015 at 3:57 PM, Sam Ruby <ru...@intertwingly.net> wrote:
> On Fri, Nov 13, 2015 at 6:49 PM, Reynold Xin <rx...@apache.org> wrote:
>>
>> I'd also like to second Matei that spark-kernel is a fairly confusing
>> name. Referring to these things as kernels only makes sense from the
>> IPython notebook's point of view. Outside of that context, it sounds like
>> the spark-core module, which this obviously isn't.
>
> That name is indeed unlikely to survive incubation, particularly if
> the result of graduation is a separate PMC (as opposed to, say,
> becoming a subproject of Apache Spark).
>
> - Sam Ruby
>



Re: [DISCUSS] Spark-Kernel Incubator Proposal

Posted by Sam Ruby <ru...@intertwingly.net>.
On Fri, Nov 13, 2015 at 6:49 PM, Reynold Xin <rx...@apache.org> wrote:
>
> I'd also like to second Matei that spark-kernel is a fairly confusing
> name. Referring to these things as kernels only makes sense from the
> IPython notebook's point of view. Outside of that context, it sounds like
> the spark-core module, which this obviously isn't.

That name is indeed unlikely to survive incubation, particularly if
the result of graduation is a separate PMC (as opposed to, say,
becoming a subproject of Apache Spark).

- Sam Ruby



Re: [DISCUSS] Spark-Kernel Incubator Proposal

Posted by Reynold Xin <rx...@apache.org>.
I'm happy to mentor the incubation if you are still looking for mentors.

I'd also like to second Matei that spark-kernel is a fairly confusing
name. Referring to these things as kernels only makes sense from the
IPython notebook's point of view. Outside of that context, it sounds like
the spark-core module, which this obviously isn't.



On Fri, Nov 13, 2015 at 2:28 PM, P. Taylor Goetz <pt...@gmail.com> wrote:

> Thanks for the reference Alex. It answers my question regarding the path
> you chose.
>
> -Taylor
>
> > On Nov 13, 2015, at 12:13 AM, Alexander Bezzubov <ab...@nflabs.com>
> wrote:
> >
> > Hi,
> >
> > it looks pretty interesting, especially the part about integration with
> > Zeppelin as another Scala interpreter implementation.
> >
> > AFAIK there was a discussion on including Spark-Kernel in Spark core
> > (https://issues.apache.org/jira/browse/SPARK-4605), but I'm not sure about
> > the possibility of it becoming a sub-project.
> >
> > It would be interesting to know, as it indeed looks very aligned with
> > Apache Spark.
> >
> > --
> > Alex
> >
> >> On Fri, Nov 13, 2015 at 10:05 AM, P. Taylor Goetz <pt...@gmail.com>
> wrote:
> >>
> >> Just a quick (or maybe not :) ) question...
> >>
> >> Given the tight coupling to the Apache Spark project, were there any
> >> considerations or discussions with the Spark community regarding
> including
> >> the Spark-Kernel functionality outright in Spark, or the possibility of
> >> becoming a subproject?
> >>
> >> I'm just curious. I don't think an answer one way or another would
> >> necessarily block incubation.
> >>
> >> -Taylor
> >>
> >>> On Nov 12, 2015, at 7:17 PM, david@fallside.com wrote:
> >>>
> >>> [full Spark-Kernel proposal text elided; identical to the proposal
> >>> quoted at the top of this thread]
> >
> > --
> > Kind regards,
> > Alexander.
>

Re: [DISCUSS] Spark-Kernel Incubator Proposal

Posted by "P. Taylor Goetz" <pt...@gmail.com>.
Thanks for the reference Alex. It answers my question regarding the path you chose.

-Taylor

> On Nov 13, 2015, at 12:13 AM, Alexander Bezzubov <ab...@nflabs.com> wrote:
> 
> Hi,
> 
> it looks pretty interesting, especially the part about integration with
> Zeppelin as another Scala interpreter implementation.
> 
> AFAIK there was a discussion on including Spark-Kernel in Spark core
> (https://issues.apache.org/jira/browse/SPARK-4605), but I'm not sure about
> the possibility of it becoming a sub-project.
> 
> It would be interesting to know, as it indeed looks very aligned with
> Apache Spark.
> 
> --
> Alex
> 
>> On Fri, Nov 13, 2015 at 10:05 AM, P. Taylor Goetz <pt...@gmail.com> wrote:
>> 
>> Just a quick (or maybe not :) ) question...
>> 
>> Given the tight coupling to the Apache Spark project, were there any
>> considerations or discussions with the Spark community regarding including
>> the Spark-Kernel functionality outright in Spark, or the possibility of
>> becoming a subproject?
>> 
>> I'm just curious. I don't think an answer one way or another would
>> necessarily block incubation.
>> 
>> -Taylor
>> 
>>> On Nov 12, 2015, at 7:17 PM, david@fallside.com wrote:
>>> 
>>> [full Spark-Kernel proposal text elided; identical to the proposal
>>> quoted at the top of this thread]
> 
> -- 
> Kind regards,
> Alexander.


Re: [DISCUSS] Spark-Kernel Incubator Proposal

Posted by Alexander Bezzubov <ab...@nflabs.com>.
Hi,

it looks pretty interesting, especially the part about integration with
Zeppelin as another Scala interpreter implementation.

AFAIK there was a discussion on including Spark-Kernel in Spark core
(https://issues.apache.org/jira/browse/SPARK-4605), but I'm not sure about
the possibility of it becoming a sub-project.

It would be interesting to know, as it indeed looks very aligned with
Apache Spark.

--
Alex

On Fri, Nov 13, 2015 at 10:05 AM, P. Taylor Goetz <pt...@gmail.com> wrote:

> Just a quick (or maybe not :) ) question...
>
> Given the tight coupling to the Apache Spark project, were there any
> considerations or discussions with the Spark community regarding including
> the Spark-Kernel functionality outright in Spark, or the possibility of
> becoming a subproject?
>
> I'm just curious. I don't think an answer one way or another would
> necessarily block incubation.
>
> -Taylor
>
> > On Nov 12, 2015, at 7:17 PM, david@fallside.com wrote:
> >
> > [full Spark-Kernel proposal text elided; identical to the proposal
> > quoted at the top of this thread]
>


-- 
--
Kind regards,
Alexander.

Re: [DISCUSS] Spark-Kernel Incubator Proposal

Posted by "P. Taylor Goetz" <pt...@gmail.com>.
Just a quick (or maybe not :) ) question...

Given the tight coupling to the Apache Spark project, were there any considerations or discussions with the Spark community regarding including the Spark-Kernel functionality outright in Spark, or the possibility of becoming a subproject?

I'm just curious. I don't think an answer one way or another would necessarily block incubation.

-Taylor

> On Nov 12, 2015, at 7:17 PM, david@fallside.com wrote:
> 
> [full Spark-Kernel proposal text elided; identical to the proposal
> quoted at the top of this thread]


Re: [DISCUSS] Spark-Kernel Incubator Proposal

Posted by Sree V <sr...@yahoo.com.INVALID>.
Hi David & All,
The 'spark-kernel/torii' is a "good to have" tool.
Pardon me, I am not the judge in any way. After going through this thread
and the referred links, it seems that giving it decent publicity in Apache
Spark (maybe providing a link, etc.) would be sufficient for its survival
and evolution, instead of going through the entire Apache incubation.
I am not undermining the incubation in any way. But there is the prep work
needed (license/trademark, project rename, package rename) to make
'spark-kernel' eligible for incubation, and once incubating, it needs to
keep up with the progress of Apache Zeppelin (which is already incubating).

Oh! That also makes me ask: can Apache Zeppelin & spark-kernel/torii be
combined into one?!


Either way, count me in for any help required with 'spark-kernel/torii'.
Thanking you. With regards,
Sree


On Monday, November 30, 2015 4:13 PM, Julien Le Dem <ju...@dremio.com> wrote:
 

Sorry for the late reply.
FYI, there is an open source project called torii already:
https://vestorly.github.io/torii/
Whether there is a trademark or not, I'd recommend a name that does not
collide with another project.

On Wed, Nov 25, 2015 at 9:00 PM, Luciano Resende <lu...@gmail.com>
wrote:

> Thanks for all your feedback; we have updated the proposal with the
> following:
>
> - Renamed the project to Torii
> - Added new mentors that volunteered during the discussion
>
> Below is an updated proposal, which I will be calling for a vote shortly.
>
> [...]



-- 
Julien

  

Re: [DISCUSS] Spark-Kernel Incubator Proposal

Posted by Bertrand Delacretaz <bd...@apache.org>.
Hi,

On Tue, Dec 15, 2015 at 7:26 PM, Luciano Resende <lu...@gmail.com> wrote:
> ...We used Sam's approach of replacing vowels in our previous choice and
> came up with Toree; if nobody has any issues with the name, we will use it....

Note that we have TomEE which is fairly similar.

But we also have Flink and Sling with the same issue. I suppose it's
unavoidable as our number of projects grows; just wanted to mention it.

-Bertrand


Re: [DISCUSS] Spark-Kernel Incubator Proposal

Posted by Luciano Resende <lu...@gmail.com>.
OK, I have done some research; here are some proposed names:

Toree
Zawn
Smirr
Inlata


We used Sam's approach of replacing vowels in our previous choice and came
up with Toree; if nobody has any issues with the name, we will use it.
I will leave this open for a couple of days before I update the vote thread
with the new name.

Thank you to all who identified the naming issue early on and helped find
a new name.



On Wed, Dec 2, 2015 at 7:15 PM, Sam Ruby <ru...@intertwingly.net> wrote:

> On Wed, Dec 2, 2015 at 5:52 PM, Luciano Resende <lu...@gmail.com>
> wrote:
> > On Mon, Nov 30, 2015 at 4:13 PM, Julien Le Dem <ju...@dremio.com>
> wrote:
> >
> >> Sorry for the late reply.
> >> FYI, there is an open source project called torii already:
> >> https://vestorly.github.io/torii/
> >> Whether there is a trademark or not, I'd recommend a name that does not
> >> collide with another project.
> >>
> >>
> > We missed that, and I guess we are also running out of names...
> >
> > Any name suggestions from the community?
>
> Random thought: take a name from this list and replace a random vowel
> with a 'y':
>
> https://en.wikipedia.org/wiki/Moons_of_Jupiter#List
>
> Example: Eyropa.
>
> - Sam Ruby
>
>


-- 
Luciano Resende
http://people.apache.org/~lresende
http://twitter.com/lresende1975
http://lresende.blogspot.com/

Re: [DISCUSS] Spark-Kernel Incubator Proposal

Posted by Sam Ruby <ru...@intertwingly.net>.
On Wed, Dec 2, 2015 at 5:52 PM, Luciano Resende <lu...@gmail.com> wrote:
> On Mon, Nov 30, 2015 at 4:13 PM, Julien Le Dem <ju...@dremio.com> wrote:
>
>> Sorry for the late reply.
>> FYI, there is an open source project called torii already:
>> https://vestorly.github.io/torii/
>> Whether there is a trademark or not, I'd recommend a name that does not
>> collide with another project.
>>
>>
> We missed that, and I guess we are also running out of names...
>
> Any name suggestions from the community?

Random thought: take a name from this list and replace a random vowel
with a 'y':

https://en.wikipedia.org/wiki/Moons_of_Jupiter#List

Example: Eyropa.

- Sam Ruby


Re: [DISCUSS] Spark-Kernel Incubator Proposal

Posted by Luciano Resende <lu...@gmail.com>.
On Mon, Nov 30, 2015 at 4:13 PM, Julien Le Dem <ju...@dremio.com> wrote:

> Sorry for the late reply.
> FYI, there is an open source project called torii already:
> https://vestorly.github.io/torii/
> Whether there is a trademark or not, I'd recommend a name that does not
> collide with another project.
>
>
We missed that, and I guess we are also running out of names...

Any name suggestions from the community?




-- 
Luciano Resende
http://people.apache.org/~lresende
http://twitter.com/lresende1975
http://lresende.blogspot.com/

Re: [DISCUSS] Spark-Kernel Incubator Proposal

Posted by Julien Le Dem <ju...@dremio.com>.
Sorry for the late reply.
FYI, there is an open source project called torii already:
https://vestorly.github.io/torii/
Whether there is a trademark or not, I'd recommend a name that does not
collide with another project.

On Wed, Nov 25, 2015 at 9:00 PM, Luciano Resende <lu...@gmail.com>
wrote:

> Thanks for all your feedback; we have updated the proposal with the
> following:
>
> - Renamed the project to Torii
> - Added new mentors that volunteered during the discussion
>
> Below is an updated proposal, which I will be calling for a vote shortly.
>
> [...]



-- 
Julien

Re: [DISCUSS] Spark-Kernel Incubator Proposal

Posted by Luciano Resende <lu...@gmail.com>.
Thanks for all your feedback; we have updated the proposal with the
following:

- Renamed the project to Torii
- Added new mentors that volunteered during the discussion

Below is an updated proposal, which I will be calling for a vote shortly.

= Torii =

== Abstract ==
Torii provides applications with a mechanism to interactively and remotely
access Apache Spark.

== Proposal ==
Torii enables interactive applications to access Apache Spark clusters.
More specifically:
 * Applications can send code snippets and libraries for execution by
Spark (see the sketch after this list)
 * Applications can be deployed separately from Spark clusters and
communicate with Torii using the provided Torii client
 * Execution results and streaming data can be sent back to calling
applications
 * Applications no longer have to be network connected to the workers on a
Spark cluster because Torii acts as each application’s proxy
 * Work has started on enabling Torii to support languages in addition to
Scala, namely Python (with PySpark), R (with SparkR), and SQL (with
SparkSQL)
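
To make the interaction model concrete, here is a minimal Scala sketch of
a client submitting a snippet. The names used (ToriiClient,
ExecutionResult, execute) are assumptions invented for illustration, not
the actual Torii client API; the sketch only shows the shape of the flow
described in the list above.

    // Hypothetical shape of a Torii-style client; the trait and method
    // names are illustrative assumptions, not the real Torii API.
    trait ExecutionResult { def data: String }

    trait ToriiClient {
      // Send a Scala snippet to the kernel, which evaluates it against
      // its SparkContext and invokes the callback with the result.
      def execute(code: String)(onResult: ExecutionResult => Unit): Unit
    }

    object SnippetExample {
      def run(client: ToriiClient): Unit = {
        // The application never connects to the Spark workers directly;
        // the kernel acts as its proxy inside the cluster.
        client.execute("sc.parallelize(1 to 100).sum()") { result =>
          println(s"Kernel returned: ${result.data}")
        }
      }
    }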

== Background & Rationale ==
Apache Spark provides applications with a fast and general-purpose
distributed computing engine that supports static and streaming data,
tabular and graph representations of data, and an extensive set of
machine learning libraries. Consequently, a wide variety of applications
will be written for Spark and there will be interactive applications that
require relatively frequent function evaluations, and batch-oriented
applications that require one-shot or only occasional evaluation.

Apache Spark provides two mechanisms for applications to connect with
Spark. The primary mechanism launches applications on Spark clusters using
spark-submit (
http://spark.apache.org/docs/latest/submitting-applications.html); this
requires developers to bundle their application code plus any dependencies
into JAR files, and then submit them to Spark. A second mechanism is an
ODBC/JDBC API (
http://spark.apache.org/docs/latest/sql-programming-guide.html#distributed-sql-engine)
which enables applications to issue SQL queries against SparkSQL.
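
For comparison, below is a minimal Scala sketch of the ODBC/JDBC route. It
assumes a Spark Thrift server running at the HiveServer2 default of
localhost:10000; the host, port, and table name are placeholders, not part
of this proposal.

    import java.sql.DriverManager

    object ThriftServerQuery {
      def main(args: Array[String]): Unit = {
        // Spark's Thrift server speaks the HiveServer2 protocol, so the
        // standard Hive JDBC driver applies; the connection details are
        // placeholders for a default local setup.
        Class.forName("org.apache.hive.jdbc.HiveDriver")
        val conn =
          DriverManager.getConnection("jdbc:hive2://localhost:10000/default")
        try {
          val stmt = conn.createStatement()
          // The table name is illustrative.
          val rs = stmt.executeQuery("SELECT count(*) FROM events")
          while (rs.next()) println(rs.getLong(1))
        } finally conn.close()
      }
    }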

Our experience when developing interactive applications, such as analytic
applications integrated with Notebooks, to run against Spark was that the
spark-submit mechanism was overly cumbersome and slow (requiring JAR
creation and forking processes to run spark-submit), and the SQL interface
was too limiting and did not offer easy access to components other than
SparkSQL, such as streaming. The most promising mechanism provided by
Apache Spark was the command-line shell (
http://spark.apache.org/docs/latest/programming-guide.html#using-the-shell)
which enabled us to execute code snippets and dynamically control the tasks
submitted to a Spark cluster. Spark does not provide the command-line
shell as a consumable service but it provided us with the starting point
from which we developed Torii.
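
The snippet-at-a-time interaction the shell enables looks like the
following, typed at the spark-shell prompt, where sc is the SparkContext
the shell pre-creates (the input path is illustrative):

    // Entered line by line at the spark-shell prompt; `sc` is the
    // SparkContext the shell creates on startup.
    val lines  = sc.textFile("hdfs:///logs/access.log") // illustrative path
    val errors = lines.filter(_.contains("ERROR"))
    println(errors.count())

Torii exposes this same model as a network service: snippets arrive over
the wire from a remote application instead of from a local console.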

== Current Status ==
Torii was first developed by a small team working on an internal IBM
Spark-related project in July 2014. In recognition of its likely general
utility to Spark users and developers, in November 2014 the Torii project
was moved to GitHub and made available under the Apache License V2.

== Meritocracy ==
The current developers are familiar with the meritocratic open source
development process at Apache. As the project has gathered interest on
GitHub, the developers have started a process to invite additional
developers into the project, and we have at least one new developer who is
ready to contribute code to the project.

== Community ==
We started building a community around the Torii project when we moved it
to GitHub about one year ago. Since then we have grown to about 70 people,
and there are regular requests and suggestions from the community. We
believe that providing Apache Spark application developers with a
general-purpose and interactive API holds a lot of community potential,
especially considering possible tie-ins with Notebooks and the data
science community.

== Core Developers ==
The core developers of the project are currently all from IBM, from the IBM
Emerging Technology team and from IBM’s recently formed Spark Technology
Center.

== Alignment ==
Apache, as the home of Apache Spark, is the most natural home for the Torii
project because it was designed to work with Apache Spark and to provide
capabilities for interactive applications and data science tools not
provided by Spark itself.

Torii also has an affinity with Jupyter (jupyter.org) because it uses the
Jupyter protocol for communications, and so Jupyter Notebooks can directly
use Torii as a kernel for communicating with Apache Spark. However, we
believe that Torii provides a general-purpose mechanism enabling a wider
variety of applications than just Notebooks to access Spark, and so
Torii’s greatest affinity is with Apache and Apache Spark.

== Known Risks ==

=== Orphaned products ===
We believe the Torii project has a low risk of abandonment because several
parties have an interest in its continued existence. More specifically,
Torii provides a capability that is not provided by Apache Spark today, and
it enables a wider range of applications to leverage Spark. For example,
IBM uses (and is considering using) Torii in several offerings, including
its IBM Analytics for Apache Spark product in the Bluemix Cloud. There are
also a couple of other commercial users who are using or considering its
use in their offerings. Furthermore, Jupyter Notebooks are used by data
scientists, and Spark is gaining popularity as an analytic engine for them.
Jupyter Notebooks are very easily enabled with Torii, and so there is
another constituency for it.

=== Inexperience with Open Source ===
The Torii project has been running as an open-source project (albeit with
only IBM committers) for the past several months. The project has an active
issue tracker, and in response to the interest indicated by the nature and
volume of requests and comments, the team has publicly stated that it is
building a process for accepting third-party contributions to the project.

=== Relationships with Other Apache Products ===
Torii has a clear affinity with the Apache Spark project because it is
designed to provide capabilities for interactive applications and data
science tools not provided by Spark itself. Torii can also be a back-end
for the Zeppelin project currently incubating at Apache. There is interest
from the Torii community in developing this capability, and an experimental
branch has been started.

=== Homogeneous Developers ===
The developers currently working on Torii are all from IBM, although the
group is in the process of expanding its membership to include members of
the GitHub community who are not from IBM and who have been active in the
Torii community on GitHub.

=== Reliance on Salaried Developers ===
The initial committers are full-time employees at IBM although not all work
on the project full-time.

=== Excessive Fascination with the Apache Brand ===
We believe Torii benefits Apache Spark application developers, and we are
interested in an Apache Torii project to benefit these developers by
engaging a larger community, facilitating closer ties with the existing
Spark project, and, yes, gaining more visibility for Torii as a solution.

=== Documentation ===
Comprehensive documentation, including a “Getting Started” guide, API
specifications, and a roadmap, is available from the GitHub project; see
https://github.com/ibm-et/Torii/wiki.

=== Initial Source ===
The source code resides at https://github.com/ibm-et/Torii.

=== External Dependencies ===
Torii depends upon a number of Apache projects:
 * Spark
 * Hadoop
 * Ivy
 * Commons

Torii also depends upon a number of other open source projects:
 * ZeroMQ (LGPL with Static Linking Exception,
http://zeromq.org/area:licensing)
 * Akka (MIT)
 * JOpt Simple (MIT)
 * Spring Framework Core (Apache v2)
 * Play (Apache v2)
 * SLF4J (MIT)
 * Scala
 * Scalatest (Apache v2)
 * Scalactic (Apache v2)
 * Mockito (MIT)

== Required Resources ==

=== Mailing lists ===

 * private@torii.incubator.apache.org (with moderated subscriptions)
 * commits@torii.incubator.apache.org
 * dev@torii.incubator.apache.org

=== Git Repository ===

 * https://git-wip-us.apache.org/repos/asf/incubator-torii.git

=== Issue Tracking ===

 * A JIRA issue tracker: https://issues.apache.org/jira/browse/TORII

== Initial Committers ==

 * Leugim Bustelo (lbustelo AT us DOT ibm DOT com)
 * Jakob Odersky (odersky AT us DOT ibm DOT com)
 * Luciano Resende (lresende AT apache DOT org)
 * Robert Senkbeil (rcsenkbe AT us DOT ibm DOT com)
 * Corey Stubbs (cstubbs AT us DOT ibm DOT com)
 * Miao Wang (wangmiao AT us DOT ibm DOT com)
 * Sean Welleck (swelleck AT us DOT ibm DOT com)

=== Affiliations ===
All of the initial committers are employed by IBM.

== Sponsors ==

=== Champion ===
 * Sam Ruby (rubys AT apache DOT org)

=== Nominated Mentors ===
 * Luciano Resende (lresende AT apache DOT org)
 * Reynold Xin (rxin AT apache DOT org)
 * Hitesh Shah (hitesh AT apache DOT org)
 * Julien Le Dem (julien AT apache DOT org)

=== Sponsoring Entity ===

The Apache Incubator.



-- 
Luciano Resende
http://people.apache.org/~lresende
http://twitter.com/lresende1975
http://lresende.blogspot.com/