Posted to dev@spark.apache.org by Steve Loughran <st...@hortonworks.com> on 2016/10/07 17:56:20 UTC

Anyone interested in Spark & Cloud got time to look at the SPARK-7481 PR?

Some people may have noticed I've been working on adding packaging, docs & testing to a Spark distribution so that Spark works with S3, Azure and OpenStack:

https://github.com/apache/spark/pull/12004

It's been a WiP, but I now have tests for all three cloud infrastructures, covering basic IO, output committing, dataframe IO and streaming. The core test coverage is done and the packaging is working.

Which means I'd really like reviews of that PR from people who want Spark to work with S3, Azure or their local Swift endpoint, ideally going through the documentation and validating that as well as the code.
It's Hadoop 2.7+ only, with a new profile, "cloud", to pull in the new module of the same name.

thanks

-Steve

PS: documentation (without templated code rendering):
https://github.com/steveloughran/spark/blob/features/SPARK-7481-cloud/docs/cloud-integration.md