Posted to user@spark.apache.org by Schmirr Wurst <sc...@gmail.com> on 2015/07/17 10:36:58 UTC

use S3-Compatible Storage with spark

Hi,

I wonder how to use S3-compatible storage with Spark.
If I'm using the s3n:// URL scheme, it will point to Amazon. Is there
a way I can specify the host somewhere?

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: use S3-Compatible Storage with spark

Posted by Ankur Chauhan <an...@malloc64.com>.
The endpoint is the property you want to set. I would look at the source for that.
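
For illustration, a minimal spark-shell sketch of what setting that endpoint property can look like (host, bucket and key below are placeholders, and it assumes the hadoop-aws / s3a connector is on the classpath):

sc.hadoopConfiguration.set("fs.s3a.endpoint", "storage.example-provider.com") // hypothetical S3-compatible host
val lines = sc.textFile("s3a://my-bucket/some/prefix/data.txt")               // hypothetical bucket and key
lines.count()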

Sent from my iPhone

> On Jul 17, 2015, at 08:55, Sujit Pal <su...@gmail.com> wrote:
> 
> Hi Schmirr,
> 
> The part after the s3n:// is your bucket name and folder name, ie s3n://${bucket_name}/${folder_name}[/${subfolder_name}]*. Bucket names are unique across S3, so the resulting path is also unique. There is no concept of hostname in s3 urls as far as I know.
> 
> -sujit
> 
> 
>> On Fri, Jul 17, 2015 at 1:36 AM, Schmirr Wurst <sc...@gmail.com> wrote:
>> Hi,
>> 
>> I wonder how to use S3 compatible Storage in Spark ?
>> If I'm using s3n:// url schema, the it will point to amazon, is there
>> a way I can specify the host somewhere ?
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>> For additional commands, e-mail: user-help@spark.apache.org
> 

Re: use S3-Compatible Storage with spark

Posted by Sujit Pal <su...@gmail.com>.
Hi Schmirr,

The part after the s3n:// is your bucket name and folder name, i.e.
s3n://${bucket_name}/${folder_name}[/${subfolder_name}]*. Bucket names are
unique across S3, so the resulting path is also unique. There is no concept
of a hostname in S3 URLs as far as I know.
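
As a concrete (entirely made-up) example of that layout in spark-shell:

// bucket "my-bucket", folder "logs", subfolder "2015-07" -- all placeholder names
val rdd = sc.textFile("s3n://my-bucket/logs/2015-07/part-*")
rdd.take(5).foreach(println)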

-sujit


On Fri, Jul 17, 2015 at 1:36 AM, Schmirr Wurst <sc...@gmail.com>
wrote:

> Hi,
>
> I wonder how to use S3 compatible Storage in Spark ?
> If I'm using s3n:// url schema, the it will point to amazon, is there
> a way I can specify the host somewhere ?
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> For additional commands, e-mail: user-help@spark.apache.org
>
>

Re: use S3-Compatible Storage with spark

Posted by Schmirr Wurst <sc...@gmail.com>.
I got a little further:
- installed spark-1.4.1-without-hadoop
- unpacked hadoop 2.7.1
- added the following to spark-env.sh

HADOOP_HOME=/opt/hadoop-2.7.1/
SPARK_DIST_CLASSPATH=/opt/hadoop-2.7.1/opt/hadoop-2.7.1/share/hadoop/tools/lib/*/share/hadoop/tools/lib/*:/opt/hadoop-2.7.1/etc/hadoop:/opt/hadoop-2.7.1/share/hadoop/common/lib/*:/opt/had$

and started spark-shell with:
bin/spark-shell --jars
/opt/hadoop-2.7.1/share/hadoop/tools/lib/hadoop-aws-2.7.1.jar

Now spark-shell is starting with
"spark.SparkContext: Added JAR
file:/opt/hadoop-2.7.1/share/hadoop/tools/lib/hadoop-aws-2.7.1.jar at
http://185.19.29.91:46368/jars/hadoop-aws-2.7.1.jar with timestamp
1437575186830"

But when trying to access S3 I get
java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem:
Provider org.apache.hadoop.fs.s3a.S3AFileSystem could not be
instantiated

In fact it doesn't even matter whether I try to use s3n or s3a; the error is
the same (strange!)
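
One possibility worth checking (an assumption based on how hadoop-aws is packaged, not verified against this setup): hadoop-aws-2.7.1.jar does not bundle the AWS SDK it depends on, so the S3AFileSystem provider can fail to instantiate when only that jar is added. A sketch of adding both jars and binding the scheme explicitly:

// hypothetical launch command; Hadoop 2.7.x ships the matching SDK as aws-java-sdk-1.7.4.jar
// bin/spark-shell --jars /opt/hadoop-2.7.1/share/hadoop/tools/lib/hadoop-aws-2.7.1.jar,/opt/hadoop-2.7.1/share/hadoop/tools/lib/aws-java-sdk-1.7.4.jar
sc.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem") // bind s3a:// to its implementation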

2015-07-22 12:19 GMT+02:00 Thomas Demoor <Th...@hgst.com>:
> You need to get the hadoop-aws.jar from hadoop-tools (use Hadoop 2.7+) - you can get the source and build it with mvn, or get it from prebuilt Hadoop distros. Then when you run your Spark job, add --jars path/to/thejar
>
> ________________________________________
> From: Schmirr Wurst <sc...@gmail.com>
> Sent: Wednesday, July 22, 2015 12:06 PM
> To: Thomas Demoor
> Subject: Re: use S3-Compatible Storage with spark
>
> Hi Thomas, thanks, could you just tell me what exactly I need to do?
> I'm not familiar with Java programming
> - where do I get the jar from, do I need to compile it with mvn?
> - where should I update the classpath and how?
>
>
>
> 2015-07-22 11:55 GMT+02:00 Thomas Demoor <Th...@hgst.com>:
>> The classes are not found. Is the jar on your classpath?
>>
>> Take care: there are multiple S3 connectors in Hadoop: the legacy s3n, based on the 3rd-party S3 library JetS3t, and the recent (functional since Hadoop 2.7) s3a, based on the Amazon SDK. Make sure you stick to one: so use fs.s3a.endpoint and the URL s3a://bucket/object, or fs.s3n.endpoint and s3n://bucket/object. I recommend s3a but I'm biased :P
>>
>> Regards,
>> Thomas
>>
>> ________________________________________
>> From: Schmirr Wurst <sc...@gmail.com>
>> Sent: Tuesday, July 21, 2015 11:59 AM
>> To: Akhil Das
>> Cc: user@spark.apache.org
>> Subject: Re: use S3-Compatible Storage with spark
>>
>> Which version do you have ?
>>
>> - I tried with spark 1.4.1 for hdp 2.6, but here I had an issue that
>> the aws-module is not there somehow:
>> java.io.IOException: No FileSystem for scheme: s3n
>> the same for s3a :
>> java.lang.RuntimeException: java.lang.ClassNotFoundException: Class
>> org.apache.hadoop.fs.s3a.S3AFileSystem not found
>>
>> - On Spark 1.4.1 for hdp 2.4 , the module is there, and works out of
>> the box for S3n (but for the endpoint)
>> But I have "java.io.IOException: No FileSystem for scheme: s3a"
>>
>> :-|
>>
>> 2015-07-21 11:09 GMT+02:00 Akhil Das <ak...@sigmoidanalytics.com>:
>>> Did you try with s3a? It seems its more like an issue with hadoop.
>>>
>>> Thanks
>>> Best Regards
>>>
>>> On Tue, Jul 21, 2015 at 2:31 PM, Schmirr Wurst <sc...@gmail.com>
>>> wrote:
>>>>
>>>> It seems to work for the credentials , but the endpoint is ignored.. :
>>>> I've changed it to
>>>> sc.hadoopConfiguration.set("fs.s3n.endpoint","test.com")
>>>>
>>>> And I continue to get my data from amazon, how could it be ? (I also
>>>> use s3n in my text url)
>>>>
>>>> 2015-07-21 9:30 GMT+02:00 Akhil Das <ak...@sigmoidanalytics.com>:
>>>> > You can add the jar in the classpath, and you can set the property like:
>>>> >
>>>> > sc.hadoopConfiguration.set("fs.s3a.endpoint","storage.sigmoid.com")
>>>> >
>>>> >
>>>> >
>>>> > Thanks
>>>> > Best Regards
>>>> >
>>>> > On Mon, Jul 20, 2015 at 9:41 PM, Schmirr Wurst <sc...@gmail.com>
>>>> > wrote:
>>>> >>
>>>> >> Thanks, that is what I was looking for...
>>>> >>
>>>> >> Any Idea where I have to store and reference the corresponding
>>>> >> hadoop-aws-2.6.0.jar ?:
>>>> >>
>>>> >> java.io.IOException: No FileSystem for scheme: s3n
>>>> >>
>>>> >> 2015-07-20 8:33 GMT+02:00 Akhil Das <ak...@sigmoidanalytics.com>:
>>>> >> > Not in the uri, but in the hadoop configuration you can specify it.
>>>> >> >
>>>> >> > <property>
>>>> >> >   <name>fs.s3a.endpoint</name>
>>>> >> >   <description>AWS S3 endpoint to connect to. An up-to-date list is
>>>> >> >     provided in the AWS Documentation: regions and endpoints. Without
>>>> >> > this
>>>> >> >     property, the standard region (s3.amazonaws.com) is assumed.
>>>> >> >   </description>
>>>> >> > </property>
>>>> >> >
>>>> >> >
>>>> >> > Thanks
>>>> >> > Best Regards
>>>> >> >
>>>> >> > On Sun, Jul 19, 2015 at 9:13 PM, Schmirr Wurst
>>>> >> > <sc...@gmail.com>
>>>> >> > wrote:
>>>> >> >>
>>>> >> >> I want to use pithos, were do I can specify that endpoint, is it
>>>> >> >> possible in the url ?
>>>> >> >>
>>>> >> >> 2015-07-19 17:22 GMT+02:00 Akhil Das <ak...@sigmoidanalytics.com>:
>>>> >> >> > Could you name the Storage service that you are using? Most of
>>>> >> >> > them
>>>> >> >> > provides
>>>> >> >> > a S3 like RestAPI endpoint for you to hit.
>>>> >> >> >
>>>> >> >> > Thanks
>>>> >> >> > Best Regards
>>>> >> >> >
>>>> >> >> > On Fri, Jul 17, 2015 at 2:06 PM, Schmirr Wurst
>>>> >> >> > <sc...@gmail.com>
>>>> >> >> > wrote:
>>>> >> >> >>
>>>> >> >> >> Hi,
>>>> >> >> >>
>>>> >> >> >> I wonder how to use S3 compatible Storage in Spark ?
>>>> >> >> >> If I'm using s3n:// url schema, the it will point to amazon, is
>>>> >> >> >> there
>>>> >> >> >> a way I can specify the host somewhere ?
>>>> >> >> >>
>>>> >> >> >>
>>>> >> >> >>
>>>> >> >> >> ---------------------------------------------------------------------
>>>> >> >> >> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>>>> >> >> >> For additional commands, e-mail: user-help@spark.apache.org
>>>> >> >> >>
>>>> >> >> >
>>>> >> >>
>>>> >> >>
>>>> >> >> ---------------------------------------------------------------------
>>>> >> >> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>>>> >> >> For additional commands, e-mail: user-help@spark.apache.org
>>>> >> >>
>>>> >> >
>>>> >
>>>> >
>>>
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>> For additional commands, e-mail: user-help@spark.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: use S3-Compatible Storage with spark

Posted by Schmirr Wurst <sc...@gmail.com>.
Which version do you have?

- I tried with spark 1.4.1 for hdp 2.6, but there I had the issue that
the aws module is somehow not there:
java.io.IOException: No FileSystem for scheme: s3n
and the same for s3a:
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class
org.apache.hadoop.fs.s3a.S3AFileSystem not found

- On Spark 1.4.1 for hdp 2.4, the module is there and works out of
the box for s3n (except for the endpoint),
but I get "java.io.IOException: No FileSystem for scheme: s3a"

:-|
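
A quick way to see which connector classes a given build actually ships (a sketch for spark-shell; it only checks the classpath, nothing more):

// each call throws ClassNotFoundException if the corresponding connector jar is missing
Class.forName("org.apache.hadoop.fs.s3native.NativeS3FileSystem") // backs s3n://
Class.forName("org.apache.hadoop.fs.s3a.S3AFileSystem")           // backs s3a:// (hadoop-aws 2.6+)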

2015-07-21 11:09 GMT+02:00 Akhil Das <ak...@sigmoidanalytics.com>:
> Did you try with s3a? It seems its more like an issue with hadoop.
>
> Thanks
> Best Regards
>
> On Tue, Jul 21, 2015 at 2:31 PM, Schmirr Wurst <sc...@gmail.com>
> wrote:
>>
>> It seems to work for the credentials , but the endpoint is ignored.. :
>> I've changed it to
>> sc.hadoopConfiguration.set("fs.s3n.endpoint","test.com")
>>
>> And I continue to get my data from amazon, how could it be ? (I also
>> use s3n in my text url)
>>
>> 2015-07-21 9:30 GMT+02:00 Akhil Das <ak...@sigmoidanalytics.com>:
>> > You can add the jar in the classpath, and you can set the property like:
>> >
>> > sc.hadoopConfiguration.set("fs.s3a.endpoint","storage.sigmoid.com")
>> >
>> >
>> >
>> > Thanks
>> > Best Regards
>> >
>> > On Mon, Jul 20, 2015 at 9:41 PM, Schmirr Wurst <sc...@gmail.com>
>> > wrote:
>> >>
>> >> Thanks, that is what I was looking for...
>> >>
>> >> Any Idea where I have to store and reference the corresponding
>> >> hadoop-aws-2.6.0.jar ?:
>> >>
>> >> java.io.IOException: No FileSystem for scheme: s3n
>> >>
>> >> 2015-07-20 8:33 GMT+02:00 Akhil Das <ak...@sigmoidanalytics.com>:
>> >> > Not in the uri, but in the hadoop configuration you can specify it.
>> >> >
>> >> > <property>
>> >> >   <name>fs.s3a.endpoint</name>
>> >> >   <description>AWS S3 endpoint to connect to. An up-to-date list is
>> >> >     provided in the AWS Documentation: regions and endpoints. Without
>> >> > this
>> >> >     property, the standard region (s3.amazonaws.com) is assumed.
>> >> >   </description>
>> >> > </property>
>> >> >
>> >> >
>> >> > Thanks
>> >> > Best Regards
>> >> >
>> >> > On Sun, Jul 19, 2015 at 9:13 PM, Schmirr Wurst
>> >> > <sc...@gmail.com>
>> >> > wrote:
>> >> >>
>> >> >> I want to use pithos, were do I can specify that endpoint, is it
>> >> >> possible in the url ?
>> >> >>
>> >> >> 2015-07-19 17:22 GMT+02:00 Akhil Das <ak...@sigmoidanalytics.com>:
>> >> >> > Could you name the Storage service that you are using? Most of
>> >> >> > them
>> >> >> > provides
>> >> >> > a S3 like RestAPI endpoint for you to hit.
>> >> >> >
>> >> >> > Thanks
>> >> >> > Best Regards
>> >> >> >
>> >> >> > On Fri, Jul 17, 2015 at 2:06 PM, Schmirr Wurst
>> >> >> > <sc...@gmail.com>
>> >> >> > wrote:
>> >> >> >>
>> >> >> >> Hi,
>> >> >> >>
>> >> >> >> I wonder how to use S3 compatible Storage in Spark ?
>> >> >> >> If I'm using s3n:// url schema, the it will point to amazon, is
>> >> >> >> there
>> >> >> >> a way I can specify the host somewhere ?
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >> ---------------------------------------------------------------------
>> >> >> >> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>> >> >> >> For additional commands, e-mail: user-help@spark.apache.org
>> >> >> >>
>> >> >> >
>> >> >>
>> >> >>
>> >> >> ---------------------------------------------------------------------
>> >> >> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>> >> >> For additional commands, e-mail: user-help@spark.apache.org
>> >> >>
>> >> >
>> >
>> >
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: use S3-Compatible Storage with spark

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
Did you try with s3a? It seems it's more of an issue with Hadoop.

Thanks
Best Regards

On Tue, Jul 21, 2015 at 2:31 PM, Schmirr Wurst <sc...@gmail.com>
wrote:

> It seems to work for the credentials , but the endpoint is ignored.. :
> I've changed it to sc.hadoopConfiguration.set("fs.s3n.endpoint","test.com
> ")
>
> And I continue to get my data from amazon, how could it be ? (I also
> use s3n in my text url)
>
> 2015-07-21 9:30 GMT+02:00 Akhil Das <ak...@sigmoidanalytics.com>:
> > You can add the jar in the classpath, and you can set the property like:
> >
> > sc.hadoopConfiguration.set("fs.s3a.endpoint","storage.sigmoid.com")
> >
> >
> >
> > Thanks
> > Best Regards
> >
> > On Mon, Jul 20, 2015 at 9:41 PM, Schmirr Wurst <sc...@gmail.com>
> > wrote:
> >>
> >> Thanks, that is what I was looking for...
> >>
> >> Any Idea where I have to store and reference the corresponding
> >> hadoop-aws-2.6.0.jar ?:
> >>
> >> java.io.IOException: No FileSystem for scheme: s3n
> >>
> >> 2015-07-20 8:33 GMT+02:00 Akhil Das <ak...@sigmoidanalytics.com>:
> >> > Not in the uri, but in the hadoop configuration you can specify it.
> >> >
> >> > <property>
> >> >   <name>fs.s3a.endpoint</name>
> >> >   <description>AWS S3 endpoint to connect to. An up-to-date list is
> >> >     provided in the AWS Documentation: regions and endpoints. Without
> >> > this
> >> >     property, the standard region (s3.amazonaws.com) is assumed.
> >> >   </description>
> >> > </property>
> >> >
> >> >
> >> > Thanks
> >> > Best Regards
> >> >
> >> > On Sun, Jul 19, 2015 at 9:13 PM, Schmirr Wurst <
> schmirrwurst@gmail.com>
> >> > wrote:
> >> >>
> >> >> I want to use pithos, were do I can specify that endpoint, is it
> >> >> possible in the url ?
> >> >>
> >> >> 2015-07-19 17:22 GMT+02:00 Akhil Das <ak...@sigmoidanalytics.com>:
> >> >> > Could you name the Storage service that you are using? Most of them
> >> >> > provides
> >> >> > a S3 like RestAPI endpoint for you to hit.
> >> >> >
> >> >> > Thanks
> >> >> > Best Regards
> >> >> >
> >> >> > On Fri, Jul 17, 2015 at 2:06 PM, Schmirr Wurst
> >> >> > <sc...@gmail.com>
> >> >> > wrote:
> >> >> >>
> >> >> >> Hi,
> >> >> >>
> >> >> >> I wonder how to use S3 compatible Storage in Spark ?
> >> >> >> If I'm using s3n:// url schema, the it will point to amazon, is
> >> >> >> there
> >> >> >> a way I can specify the host somewhere ?
> >> >> >>
> >> >> >>
> >> >> >>
> ---------------------------------------------------------------------
> >> >> >> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> >> >> >> For additional commands, e-mail: user-help@spark.apache.org
> >> >> >>
> >> >> >
> >> >>
> >> >> ---------------------------------------------------------------------
> >> >> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> >> >> For additional commands, e-mail: user-help@spark.apache.org
> >> >>
> >> >
> >
> >
>

Re: use S3-Compatible Storage with spark

Posted by Schmirr Wurst <sc...@gmail.com>.
It seems to work for the credentials, but the endpoint is ignored:
I've changed it to sc.hadoopConfiguration.set("fs.s3n.endpoint","test.com")

And I still get my data from Amazon; how can that be? (I also
use s3n in my text URL)

2015-07-21 9:30 GMT+02:00 Akhil Das <ak...@sigmoidanalytics.com>:
> You can add the jar in the classpath, and you can set the property like:
>
> sc.hadoopConfiguration.set("fs.s3a.endpoint","storage.sigmoid.com")
>
>
>
> Thanks
> Best Regards
>
> On Mon, Jul 20, 2015 at 9:41 PM, Schmirr Wurst <sc...@gmail.com>
> wrote:
>>
>> Thanks, that is what I was looking for...
>>
>> Any Idea where I have to store and reference the corresponding
>> hadoop-aws-2.6.0.jar ?:
>>
>> java.io.IOException: No FileSystem for scheme: s3n
>>
>> 2015-07-20 8:33 GMT+02:00 Akhil Das <ak...@sigmoidanalytics.com>:
>> > Not in the uri, but in the hadoop configuration you can specify it.
>> >
>> > <property>
>> >   <name>fs.s3a.endpoint</name>
>> >   <description>AWS S3 endpoint to connect to. An up-to-date list is
>> >     provided in the AWS Documentation: regions and endpoints. Without
>> > this
>> >     property, the standard region (s3.amazonaws.com) is assumed.
>> >   </description>
>> > </property>
>> >
>> >
>> > Thanks
>> > Best Regards
>> >
>> > On Sun, Jul 19, 2015 at 9:13 PM, Schmirr Wurst <sc...@gmail.com>
>> > wrote:
>> >>
>> >> I want to use pithos, were do I can specify that endpoint, is it
>> >> possible in the url ?
>> >>
>> >> 2015-07-19 17:22 GMT+02:00 Akhil Das <ak...@sigmoidanalytics.com>:
>> >> > Could you name the Storage service that you are using? Most of them
>> >> > provides
>> >> > a S3 like RestAPI endpoint for you to hit.
>> >> >
>> >> > Thanks
>> >> > Best Regards
>> >> >
>> >> > On Fri, Jul 17, 2015 at 2:06 PM, Schmirr Wurst
>> >> > <sc...@gmail.com>
>> >> > wrote:
>> >> >>
>> >> >> Hi,
>> >> >>
>> >> >> I wonder how to use S3 compatible Storage in Spark ?
>> >> >> If I'm using s3n:// url schema, the it will point to amazon, is
>> >> >> there
>> >> >> a way I can specify the host somewhere ?
>> >> >>
>> >> >>
>> >> >> ---------------------------------------------------------------------
>> >> >> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>> >> >> For additional commands, e-mail: user-help@spark.apache.org
>> >> >>
>> >> >
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>> >> For additional commands, e-mail: user-help@spark.apache.org
>> >>
>> >
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: use S3-Compatible Storage with spark

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
You can add the jar to the classpath, and you can set the property like this:

sc.hadoopConfiguration.set("fs.s3a.endpoint","storage.sigmoid.com")



Thanks
Best Regards

On Mon, Jul 20, 2015 at 9:41 PM, Schmirr Wurst <sc...@gmail.com>
wrote:

> Thanks, that is what I was looking for...
>
> Any Idea where I have to store and reference the corresponding
> hadoop-aws-2.6.0.jar ?:
>
> java.io.IOException: No FileSystem for scheme: s3n
>
> 2015-07-20 8:33 GMT+02:00 Akhil Das <ak...@sigmoidanalytics.com>:
> > Not in the uri, but in the hadoop configuration you can specify it.
> >
> > <property>
> >   <name>fs.s3a.endpoint</name>
> >   <description>AWS S3 endpoint to connect to. An up-to-date list is
> >     provided in the AWS Documentation: regions and endpoints. Without
> this
> >     property, the standard region (s3.amazonaws.com) is assumed.
> >   </description>
> > </property>
> >
> >
> > Thanks
> > Best Regards
> >
> > On Sun, Jul 19, 2015 at 9:13 PM, Schmirr Wurst <sc...@gmail.com>
> > wrote:
> >>
> >> I want to use pithos, were do I can specify that endpoint, is it
> >> possible in the url ?
> >>
> >> 2015-07-19 17:22 GMT+02:00 Akhil Das <ak...@sigmoidanalytics.com>:
> >> > Could you name the Storage service that you are using? Most of them
> >> > provides
> >> > a S3 like RestAPI endpoint for you to hit.
> >> >
> >> > Thanks
> >> > Best Regards
> >> >
> >> > On Fri, Jul 17, 2015 at 2:06 PM, Schmirr Wurst <
> schmirrwurst@gmail.com>
> >> > wrote:
> >> >>
> >> >> Hi,
> >> >>
> >> >> I wonder how to use S3 compatible Storage in Spark ?
> >> >> If I'm using s3n:// url schema, the it will point to amazon, is there
> >> >> a way I can specify the host somewhere ?
> >> >>
> >> >> ---------------------------------------------------------------------
> >> >> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> >> >> For additional commands, e-mail: user-help@spark.apache.org
> >> >>
> >> >
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> >> For additional commands, e-mail: user-help@spark.apache.org
> >>
> >
>

Re: use S3-Compatible Storage with spark

Posted by Schmirr Wurst <sc...@gmail.com>.
I tried those 3 possibilities, and everything keeps working as before = the endpoint
param is not being applied:
sc.hadoopConfiguration.set("s3service.s3-endpoint","test")
sc.hadoopConfiguration.set("fs.s3n.endpoint","test")
sc.hadoopConfiguration.set("fs.s3n.s3-endpoint","test")

2015-07-28 10:28 GMT+02:00 Akhil Das <ak...@sigmoidanalytics.com>:

> With s3n try this out:
>
> *s3service.s3-endpoint* - The host name of the S3 service. You should only
> ever change this value from the default if you need to contact an
> alternative S3 endpoint for testing purposes.
> Default: s3.amazonaws.com
>
> Thanks
> Best Regards
>
> On Tue, Jul 28, 2015 at 1:54 PM, Schmirr Wurst <sc...@gmail.com>
> wrote:
>
>> Hi recompiled and retried, now its looking like this with s3a :
>> com.amazonaws.AmazonClientException: Unable to load AWS credentials
>> from any provider in the chain
>>
>> S3n is working find, (only problem is still the endpoint)
>>
>
>

Re: use S3-Compatible Storage with spark

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
With s3n try this out:

*s3service.s3-endpoint* - The host name of the S3 service. You should only
ever change this value from the default if you need to contact an
alternative S3 endpoint for testing purposes.
Default: s3.amazonaws.com

Thanks
Best Regards

On Tue, Jul 28, 2015 at 1:54 PM, Schmirr Wurst <sc...@gmail.com>
wrote:

> Hi recompiled and retried, now its looking like this with s3a :
> com.amazonaws.AmazonClientException: Unable to load AWS credentials
> from any provider in the chain
>
> S3n is working find, (only problem is still the endpoint)
>

Re: use S3-Compatible Storage with spark

Posted by Schmirr Wurst <sc...@gmail.com>.
Hi, I recompiled and retried; now it's looking like this with s3a:
com.amazonaws.AmazonClientException: Unable to load AWS credentials
from any provider in the chain

s3n is working fine (the only problem is still the endpoint).

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: use S3-Compatible Storage with spark

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
That error is a jar conflict; you must have multiple versions of the
hadoop jars on the classpath. First make sure you are able to access
your AWS S3 with s3a, then set the endpoint configuration and try to
access the custom storage.

Thanks
Best Regards

On Mon, Jul 27, 2015 at 4:02 PM, Schmirr Wurst <sc...@gmail.com>
wrote:

> No with s3a, I have the following error :
> java.lang.NoSuchMethodError:
>
> com.amazonaws.services.s3.transfer.TransferManagerConfiguration.setMultipartUploadThreshold(I)V
> at
> org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:285)
>
> 2015-07-27 11:17 GMT+02:00 Akhil Das <ak...@sigmoidanalytics.com>:
> > So you are able to access your AWS S3 with s3a now? What is the error
> that
> > you are getting when you try to access the custom storage with
> > fs.s3a.endpoint?
> >
> > Thanks
> > Best Regards
> >
> > On Mon, Jul 27, 2015 at 2:44 PM, Schmirr Wurst <sc...@gmail.com>
> > wrote:
> >>
> >> I was able to access Amazon S3, but for some reason, the Endpoint
> >> parameter is ignored, and I'm not able to access to storage from my
> >> provider... :
> >>
> >> sc.hadoopConfiguration.set("fs.s3a.endpoint","test")
> >> sc.hadoopConfiguration.set("fs.s3a.awsAccessKeyId","")
> >> sc.hadoopConfiguration.set("fs.s3a.awsSecretAccessKey","")
> >>
> >> Any Idea why it doesn't work ?
> >>
> >> 2015-07-20 18:11 GMT+02:00 Schmirr Wurst <sc...@gmail.com>:
> >> > Thanks, that is what I was looking for...
> >> >
> >> > Any Idea where I have to store and reference the corresponding
> >> > hadoop-aws-2.6.0.jar ?:
> >> >
> >> > java.io.IOException: No FileSystem for scheme: s3n
> >> >
> >> > 2015-07-20 8:33 GMT+02:00 Akhil Das <ak...@sigmoidanalytics.com>:
> >> >> Not in the uri, but in the hadoop configuration you can specify it.
> >> >>
> >> >> <property>
> >> >>   <name>fs.s3a.endpoint</name>
> >> >>   <description>AWS S3 endpoint to connect to. An up-to-date list is
> >> >>     provided in the AWS Documentation: regions and endpoints. Without
> >> >> this
> >> >>     property, the standard region (s3.amazonaws.com) is assumed.
> >> >>   </description>
> >> >> </property>
> >> >>
> >> >>
> >> >> Thanks
> >> >> Best Regards
> >> >>
> >> >> On Sun, Jul 19, 2015 at 9:13 PM, Schmirr Wurst <
> schmirrwurst@gmail.com>
> >> >> wrote:
> >> >>>
> >> >>> I want to use pithos, were do I can specify that endpoint, is it
> >> >>> possible in the url ?
> >> >>>
> >> >>> 2015-07-19 17:22 GMT+02:00 Akhil Das <ak...@sigmoidanalytics.com>:
> >> >>> > Could you name the Storage service that you are using? Most of
> them
> >> >>> > provides
> >> >>> > a S3 like RestAPI endpoint for you to hit.
> >> >>> >
> >> >>> > Thanks
> >> >>> > Best Regards
> >> >>> >
> >> >>> > On Fri, Jul 17, 2015 at 2:06 PM, Schmirr Wurst
> >> >>> > <sc...@gmail.com>
> >> >>> > wrote:
> >> >>> >>
> >> >>> >> Hi,
> >> >>> >>
> >> >>> >> I wonder how to use S3 compatible Storage in Spark ?
> >> >>> >> If I'm using s3n:// url schema, the it will point to amazon, is
> >> >>> >> there
> >> >>> >> a way I can specify the host somewhere ?
> >> >>> >>
> >> >>> >>
> >> >>> >>
> ---------------------------------------------------------------------
> >> >>> >> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> >> >>> >> For additional commands, e-mail: user-help@spark.apache.org
> >> >>> >>
> >> >>> >
> >> >>>
> >> >>>
> ---------------------------------------------------------------------
> >> >>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> >> >>> For additional commands, e-mail: user-help@spark.apache.org
> >> >>>
> >> >>
> >
> >
>

Re: use S3-Compatible Storage with spark

Posted by Schmirr Wurst <sc...@gmail.com>.
No; with s3a I get the following error:
java.lang.NoSuchMethodError:
com.amazonaws.services.s3.transfer.TransferManagerConfiguration.setMultipartUploadThreshold(I)V
at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:285)
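
For what it is worth (an assumption based on the method signature in the error, not verified on this setup): hadoop-aws 2.7.x is built against aws-java-sdk 1.7.4, where setMultipartUploadThreshold takes an int; a different SDK version on the classpath changes that signature and produces exactly this NoSuchMethodError. A sketch for checking which jar the class is actually loaded from:

// run in spark-shell; prints the jar that provides the conflicting AWS SDK class
println(classOf[com.amazonaws.services.s3.transfer.TransferManagerConfiguration].getProtectionDomain.getCodeSource.getLocation)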

2015-07-27 11:17 GMT+02:00 Akhil Das <ak...@sigmoidanalytics.com>:
> So you are able to access your AWS S3 with s3a now? What is the error that
> you are getting when you try to access the custom storage with
> fs.s3a.endpoint?
>
> Thanks
> Best Regards
>
> On Mon, Jul 27, 2015 at 2:44 PM, Schmirr Wurst <sc...@gmail.com>
> wrote:
>>
>> I was able to access Amazon S3, but for some reason, the Endpoint
>> parameter is ignored, and I'm not able to access to storage from my
>> provider... :
>>
>> sc.hadoopConfiguration.set("fs.s3a.endpoint","test")
>> sc.hadoopConfiguration.set("fs.s3a.awsAccessKeyId","")
>> sc.hadoopConfiguration.set("fs.s3a.awsSecretAccessKey","")
>>
>> Any Idea why it doesn't work ?
>>
>> 2015-07-20 18:11 GMT+02:00 Schmirr Wurst <sc...@gmail.com>:
>> > Thanks, that is what I was looking for...
>> >
>> > Any Idea where I have to store and reference the corresponding
>> > hadoop-aws-2.6.0.jar ?:
>> >
>> > java.io.IOException: No FileSystem for scheme: s3n
>> >
>> > 2015-07-20 8:33 GMT+02:00 Akhil Das <ak...@sigmoidanalytics.com>:
>> >> Not in the uri, but in the hadoop configuration you can specify it.
>> >>
>> >> <property>
>> >>   <name>fs.s3a.endpoint</name>
>> >>   <description>AWS S3 endpoint to connect to. An up-to-date list is
>> >>     provided in the AWS Documentation: regions and endpoints. Without
>> >> this
>> >>     property, the standard region (s3.amazonaws.com) is assumed.
>> >>   </description>
>> >> </property>
>> >>
>> >>
>> >> Thanks
>> >> Best Regards
>> >>
>> >> On Sun, Jul 19, 2015 at 9:13 PM, Schmirr Wurst <sc...@gmail.com>
>> >> wrote:
>> >>>
>> >>> I want to use pithos, were do I can specify that endpoint, is it
>> >>> possible in the url ?
>> >>>
>> >>> 2015-07-19 17:22 GMT+02:00 Akhil Das <ak...@sigmoidanalytics.com>:
>> >>> > Could you name the Storage service that you are using? Most of them
>> >>> > provides
>> >>> > a S3 like RestAPI endpoint for you to hit.
>> >>> >
>> >>> > Thanks
>> >>> > Best Regards
>> >>> >
>> >>> > On Fri, Jul 17, 2015 at 2:06 PM, Schmirr Wurst
>> >>> > <sc...@gmail.com>
>> >>> > wrote:
>> >>> >>
>> >>> >> Hi,
>> >>> >>
>> >>> >> I wonder how to use S3 compatible Storage in Spark ?
>> >>> >> If I'm using s3n:// url schema, the it will point to amazon, is
>> >>> >> there
>> >>> >> a way I can specify the host somewhere ?
>> >>> >>
>> >>> >>
>> >>> >> ---------------------------------------------------------------------
>> >>> >> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>> >>> >> For additional commands, e-mail: user-help@spark.apache.org
>> >>> >>
>> >>> >
>> >>>
>> >>> ---------------------------------------------------------------------
>> >>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>> >>> For additional commands, e-mail: user-help@spark.apache.org
>> >>>
>> >>
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: use S3-Compatible Storage with spark

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
So you are able to access your AWS S3 with s3a now? What is the error that
you are getting when you try to access the custom storage with
fs.s3a.endpoint?

Thanks
Best Regards

On Mon, Jul 27, 2015 at 2:44 PM, Schmirr Wurst <sc...@gmail.com>
wrote:

> I was able to access Amazon S3, but for some reason, the Endpoint
> parameter is ignored, and I'm not able to access to storage from my
> provider... :
>
> sc.hadoopConfiguration.set("fs.s3a.endpoint","test")
> sc.hadoopConfiguration.set("fs.s3a.awsAccessKeyId","")
> sc.hadoopConfiguration.set("fs.s3a.awsSecretAccessKey","")
>
> Any Idea why it doesn't work ?
>
> 2015-07-20 18:11 GMT+02:00 Schmirr Wurst <sc...@gmail.com>:
> > Thanks, that is what I was looking for...
> >
> > Any Idea where I have to store and reference the corresponding
> > hadoop-aws-2.6.0.jar ?:
> >
> > java.io.IOException: No FileSystem for scheme: s3n
> >
> > 2015-07-20 8:33 GMT+02:00 Akhil Das <ak...@sigmoidanalytics.com>:
> >> Not in the uri, but in the hadoop configuration you can specify it.
> >>
> >> <property>
> >>   <name>fs.s3a.endpoint</name>
> >>   <description>AWS S3 endpoint to connect to. An up-to-date list is
> >>     provided in the AWS Documentation: regions and endpoints. Without
> this
> >>     property, the standard region (s3.amazonaws.com) is assumed.
> >>   </description>
> >> </property>
> >>
> >>
> >> Thanks
> >> Best Regards
> >>
> >> On Sun, Jul 19, 2015 at 9:13 PM, Schmirr Wurst <sc...@gmail.com>
> >> wrote:
> >>>
> >>> I want to use pithos, were do I can specify that endpoint, is it
> >>> possible in the url ?
> >>>
> >>> 2015-07-19 17:22 GMT+02:00 Akhil Das <ak...@sigmoidanalytics.com>:
> >>> > Could you name the Storage service that you are using? Most of them
> >>> > provides
> >>> > a S3 like RestAPI endpoint for you to hit.
> >>> >
> >>> > Thanks
> >>> > Best Regards
> >>> >
> >>> > On Fri, Jul 17, 2015 at 2:06 PM, Schmirr Wurst <
> schmirrwurst@gmail.com>
> >>> > wrote:
> >>> >>
> >>> >> Hi,
> >>> >>
> >>> >> I wonder how to use S3 compatible Storage in Spark ?
> >>> >> If I'm using s3n:// url schema, the it will point to amazon, is
> there
> >>> >> a way I can specify the host somewhere ?
> >>> >>
> >>> >>
> ---------------------------------------------------------------------
> >>> >> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> >>> >> For additional commands, e-mail: user-help@spark.apache.org
> >>> >>
> >>> >
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> >>> For additional commands, e-mail: user-help@spark.apache.org
> >>>
> >>
>

Re: use S3-Compatible Storage with spark

Posted by Schmirr Wurst <sc...@gmail.com>.
I was able to access Amazon S3, but for some reason the endpoint
parameter is ignored, and I'm not able to access the storage from my
provider:

sc.hadoopConfiguration.set("fs.s3a.endpoint","test")
sc.hadoopConfiguration.set("fs.s3a.awsAccessKeyId","")
sc.hadoopConfiguration.set("fs.s3a.awsSecretAccessKey","")

Any idea why it doesn't work?
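
A small note in case it is relevant (based on the property names in hadoop-aws 2.6/2.7, so take it as an assumption): the awsAccessKeyId / awsSecretAccessKey style keys belong to the s3n connector; s3a reads its credentials from fs.s3a.access.key and fs.s3a.secret.key, e.g.:

sc.hadoopConfiguration.set("fs.s3a.endpoint", "storage.example-provider.com") // placeholder host
sc.hadoopConfiguration.set("fs.s3a.access.key", "MY_ACCESS_KEY")
sc.hadoopConfiguration.set("fs.s3a.secret.key", "MY_SECRET_KEY")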

2015-07-20 18:11 GMT+02:00 Schmirr Wurst <sc...@gmail.com>:
> Thanks, that is what I was looking for...
>
> Any Idea where I have to store and reference the corresponding
> hadoop-aws-2.6.0.jar ?:
>
> java.io.IOException: No FileSystem for scheme: s3n
>
> 2015-07-20 8:33 GMT+02:00 Akhil Das <ak...@sigmoidanalytics.com>:
>> Not in the uri, but in the hadoop configuration you can specify it.
>>
>> <property>
>>   <name>fs.s3a.endpoint</name>
>>   <description>AWS S3 endpoint to connect to. An up-to-date list is
>>     provided in the AWS Documentation: regions and endpoints. Without this
>>     property, the standard region (s3.amazonaws.com) is assumed.
>>   </description>
>> </property>
>>
>>
>> Thanks
>> Best Regards
>>
>> On Sun, Jul 19, 2015 at 9:13 PM, Schmirr Wurst <sc...@gmail.com>
>> wrote:
>>>
>>> I want to use pithos, were do I can specify that endpoint, is it
>>> possible in the url ?
>>>
>>> 2015-07-19 17:22 GMT+02:00 Akhil Das <ak...@sigmoidanalytics.com>:
>>> > Could you name the Storage service that you are using? Most of them
>>> > provides
>>> > a S3 like RestAPI endpoint for you to hit.
>>> >
>>> > Thanks
>>> > Best Regards
>>> >
>>> > On Fri, Jul 17, 2015 at 2:06 PM, Schmirr Wurst <sc...@gmail.com>
>>> > wrote:
>>> >>
>>> >> Hi,
>>> >>
>>> >> I wonder how to use S3 compatible Storage in Spark ?
>>> >> If I'm using s3n:// url schema, the it will point to amazon, is there
>>> >> a way I can specify the host somewhere ?
>>> >>
>>> >> ---------------------------------------------------------------------
>>> >> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>>> >> For additional commands, e-mail: user-help@spark.apache.org
>>> >>
>>> >
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>>> For additional commands, e-mail: user-help@spark.apache.org
>>>
>>

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: use S3-Compatible Storage with spark

Posted by Schmirr Wurst <sc...@gmail.com>.
Thanks, that is what I was looking for...

Any idea where I have to store and reference the corresponding
hadoop-aws-2.6.0.jar? I get:

java.io.IOException: No FileSystem for scheme: s3n

2015-07-20 8:33 GMT+02:00 Akhil Das <ak...@sigmoidanalytics.com>:
> Not in the uri, but in the hadoop configuration you can specify it.
>
> <property>
>   <name>fs.s3a.endpoint</name>
>   <description>AWS S3 endpoint to connect to. An up-to-date list is
>     provided in the AWS Documentation: regions and endpoints. Without this
>     property, the standard region (s3.amazonaws.com) is assumed.
>   </description>
> </property>
>
>
> Thanks
> Best Regards
>
> On Sun, Jul 19, 2015 at 9:13 PM, Schmirr Wurst <sc...@gmail.com>
> wrote:
>>
>> I want to use pithos, were do I can specify that endpoint, is it
>> possible in the url ?
>>
>> 2015-07-19 17:22 GMT+02:00 Akhil Das <ak...@sigmoidanalytics.com>:
>> > Could you name the Storage service that you are using? Most of them
>> > provides
>> > a S3 like RestAPI endpoint for you to hit.
>> >
>> > Thanks
>> > Best Regards
>> >
>> > On Fri, Jul 17, 2015 at 2:06 PM, Schmirr Wurst <sc...@gmail.com>
>> > wrote:
>> >>
>> >> Hi,
>> >>
>> >> I wonder how to use S3 compatible Storage in Spark ?
>> >> If I'm using s3n:// url schema, the it will point to amazon, is there
>> >> a way I can specify the host somewhere ?
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>> >> For additional commands, e-mail: user-help@spark.apache.org
>> >>
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>> For additional commands, e-mail: user-help@spark.apache.org
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: use S3-Compatible Storage with spark

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
Not in the URI, but you can specify it in the Hadoop configuration.

<property>
  <name>fs.s3a.endpoint</name>
  <description>AWS S3 endpoint to connect to. An up-to-date list is
    provided in the AWS Documentation: regions and endpoints. Without this
    property, the standard region (s3.amazonaws.com) is assumed.
  </description>
</property>
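
Note that the snippet above is only the property's name and description; when setting it for real, the endpoint goes in a <value> element in core-site.xml, or can be set programmatically (placeholder host):

sc.hadoopConfiguration.set("fs.s3a.endpoint", "storage.example-provider.com")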


Thanks
Best Regards

On Sun, Jul 19, 2015 at 9:13 PM, Schmirr Wurst <sc...@gmail.com>
wrote:

> I want to use pithos, were do I can specify that endpoint, is it
> possible in the url ?
>
> 2015-07-19 17:22 GMT+02:00 Akhil Das <ak...@sigmoidanalytics.com>:
> > Could you name the Storage service that you are using? Most of them
> provides
> > a S3 like RestAPI endpoint for you to hit.
> >
> > Thanks
> > Best Regards
> >
> > On Fri, Jul 17, 2015 at 2:06 PM, Schmirr Wurst <sc...@gmail.com>
> > wrote:
> >>
> >> Hi,
> >>
> >> I wonder how to use S3 compatible Storage in Spark ?
> >> If I'm using s3n:// url schema, the it will point to amazon, is there
> >> a way I can specify the host somewhere ?
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> >> For additional commands, e-mail: user-help@spark.apache.org
> >>
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> For additional commands, e-mail: user-help@spark.apache.org
>
>

Fwd: use S3-Compatible Storage with spark

Posted by Schmirr Wurst <sc...@gmail.com>.
I want to use Pithos. Where can I specify that endpoint? Is it
possible in the URL?

2015-07-19 17:22 GMT+02:00 Akhil Das <ak...@sigmoidanalytics.com>:
> Could you name the Storage service that you are using? Most of them provides
> a S3 like RestAPI endpoint for you to hit.
>
> Thanks
> Best Regards
>
> On Fri, Jul 17, 2015 at 2:06 PM, Schmirr Wurst <sc...@gmail.com>
> wrote:
>>
>> Hi,
>>
>> I wonder how to use S3 compatible Storage in Spark ?
>> If I'm using s3n:// url schema, the it will point to amazon, is there
>> a way I can specify the host somewhere ?
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>> For additional commands, e-mail: user-help@spark.apache.org
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: use S3-Compatible Storage with spark

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
Could you name the storage service that you are using? Most of them
provide an S3-like REST API endpoint for you to hit.

Thanks
Best Regards

On Fri, Jul 17, 2015 at 2:06 PM, Schmirr Wurst <sc...@gmail.com>
wrote:

> Hi,
>
> I wonder how to use S3 compatible Storage in Spark ?
> If I'm using s3n:// url schema, the it will point to amazon, is there
> a way I can specify the host somewhere ?
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> For additional commands, e-mail: user-help@spark.apache.org
>
>