Posted to dev@spark.apache.org by Gil Vernik <GI...@il.ibm.com> on 2014/06/08 10:01:50 UTC
Apache Spark and Swift object store
Hello everyone,
I would like to start a discussion about integrating Apache Spark with
OpenStack Swift.
(https://issues.apache.org/jira/browse/SPARK-938 was created a while ago.)
I created a patch (https://github.com/apache/spark/pull/1010) that
provides initial information on how to connect Swift and Spark. Currently it
uses Hadoop 2.3.0 and only the standalone mode of Spark. The patch is mainly
meant to give the community a way to experiment with this integration.
I have it fully working on my private cluster and it works very well,
allowing me to run various analytics using Spark.
My next planned patches will include information on how to configure Swift
for other cluster deployments of Spark, and also on how to
integrate Spark and Swift with earlier versions of Hadoop.
I am confident that the integration between Spark and Swift is a very
important feature that will greatly benefit the exposure of Spark.
The integration between Spark and Swift is very similar to how Spark
integrates with S3.
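
To make the comparison with S3 concrete, here is a minimal sketch of the
kind of configuration involved. It assumes the Swift filesystem driver from
Hadoop's hadoop-openstack module; the patch itself may set things up
differently. The helper names, the provider name "demo", the Keystone URL
and the credentials are all hypothetical and exist only for illustration.
The script only builds the URI and the property names; it does not contact
Spark or Swift.

```python
# Sketch only: builds a Swift URI and the matching Hadoop property names.
# Assumes the hadoop-openstack Swift driver; all values below are made up.


def swift_uri(container, provider, path):
    """Return a swift://CONTAINER.PROVIDER/path style URI."""
    return "swift://{}.{}/{}".format(container, provider, path.lstrip("/"))


def swift_hadoop_conf(provider, auth_url, tenant, username, password):
    """Return the Hadoop properties describing one Swift provider."""
    prefix = "fs.swift.service.{}.".format(provider)
    return {
        # Filesystem class registered for the swift:// scheme.
        "fs.swift.impl": "org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystem",
        prefix + "auth.url": auth_url,
        prefix + "tenant": tenant,
        prefix + "username": username,
        prefix + "password": password,
    }


if __name__ == "__main__":
    # Hypothetical provider and credentials, for illustration only.
    conf = swift_hadoop_conf(
        "demo",
        "http://keystone.example.com:5000/v2.0/tokens",
        "test",
        "tester",
        "secret",
    )
    for key in sorted(conf):
        print(key, "=", conf[key])
    print(swift_uri("logs", "demo", "2014/06/data.txt"))
```

Properties like these would typically go into core-site.xml, after which a
swift:// URI can be passed to sc.textFile in the same way as an s3n:// URI.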
It would be great to hear comments, suggestions, and remarks from the community!
All the best,
Gil Vernik.
Re: Apache Spark and Swift object store
Posted by Gil Vernik <GI...@il.ibm.com>.
Hi Matthew,
Thanks for the information about the Spark plugin for Sahara.
I will send emails to the Sahara and Swift mailing lists to notify them about
this patch.
All the best,
Gil.
From: Matthew Farrellee <ma...@redhat.com>
To: dev@spark.apache.org,
Date: 14/06/2014 06:53 PM
Subject: Re: Apache Spark and Swift object store
On 06/08/2014 04:01 AM, Gil Vernik wrote:
> Hello everyone,
>
> I would like to initiate discussion about integration Apache Spark and
> Openstack Swift.
> (https://issues.apache.org/jira/browse/SPARK-938 was created while ago)
>
> I created a patch (https://github.com/apache/spark/pull/1010) that
> provides initial information how to connect Swift and Spark. Currently it
> uses Hadoop 2.3.0 and only stand alone mode of Spark. This patch is mainly
> used to provide community a way to experiment with this integration.
> I have it fully working on my private cluster and it works very well,
> allowing me to make various analytics using Spark.
>
> My next planned patches will include information how to configure Swift
> for other cluster deployment of Spark and also information how to
> integrate Spark and Swift with earlier versions of Hadoop.
> I am confident that the integration between Spark and Swift is very
> important future that will benefit greatly for the exposure of Spark.
>
> The integration between Spark and Swift is very similar to how Spark
> integrates with S3.
>
> Will be great to hear comments / suggestions / remarks from the community!
>
> All the best,
> Gil Vernik.
gil,
the sahara project within openstack is also taking on this effort.
https://wiki.openstack.org/wiki/Sahara/SparkPlugin
there's currently a plugin to provision a spark cluster on openstack and
folks on #openstack-sahara will be very interested to hear what you're
working on.
the theory atm is that the work done to create the swift dfs plugin will
easily integrate spark and swift, and it's great to see that your patch
suggests this works in practice.
best,
matt
Re: Apache Spark and Swift object store
Posted by Matthew Farrellee <ma...@redhat.com>.
On 06/08/2014 04:01 AM, Gil Vernik wrote:
> Hello everyone,
>
> I would like to initiate discussion about integration Apache Spark and
> Openstack Swift.
> (https://issues.apache.org/jira/browse/SPARK-938 was created while ago)
>
> I created a patch (https://github.com/apache/spark/pull/1010) that
> provides initial information how to connect Swift and Spark. Currently it
> uses Hadoop 2.3.0 and only stand alone mode of Spark. This patch is mainly
> used to provide community a way to experiment with this integration.
> I have it fully working on my private cluster and it works very well,
> allowing me to make various analytics using Spark.
>
> My next planned patches will include information how to configure Swift
> for other cluster deployment of Spark and also information how to
> integrate Spark and Swift with earlier versions of Hadoop.
> I am confident that the integration between Spark and Swift is very
> important future that will benefit greatly for the exposure of Spark.
>
> The integration between Spark and Swift is very similar to how Spark
> integrates with S3.
>
> Will be great to hear comments / suggestions / remarks from the community!
>
> All the best,
> Gil Vernik.
gil,
the sahara project within openstack is also taking on this effort.
https://wiki.openstack.org/wiki/Sahara/SparkPlugin
there's currently a plugin to provision a spark cluster on openstack and
folks on #openstack-sahara will be very interested to hear what you're
working on.
the theory atm is that the work done to create the swift dfs plugin will
easily integrate spark and swift, and it's great to see that your patch
suggests this works in practice.
best,
matt