Posted to user@hadoop.apache.org by Elliot West <te...@gmail.com> on 2016/05/03 14:41:21 UTC

Securing secrets for S3 FileSystems in DistCp

Hello,

We're currently using DistCp and S3 FileSystems to move data from a vanilla
Apache Hadoop cluster to S3. We've been concerned about exposing our AWS
secrets on our shared, on-premise cluster. As a work-around, we've patched
DistCp to load these secrets from a JCEKS keystore. This seems to work
quite well; however, we're not comfortable relying on a DistCp fork.

What is the usual approach to achieve this with DistCp and is there a
feature or practice that we've overlooked? If not, might there be value in
us raising a JIRA ticket and submitting a patch for DistCp to include this
secure keystore functionality?
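
For concreteness, the kind of job we have in mind looks roughly like the
sketch below, written against the Hadoop 2.x DistCp Java API. The keystore
location, source path, and bucket name are illustrative, and the key
resolution assumes an s3a filesystem that reads its secrets through the
credential provider API rather than from clear-text configuration.

import java.util.Collections;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.tools.DistCp;
import org.apache.hadoop.tools.DistCpOptions;

public class KeystoreBackedCopy {

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Point the credential provider API at a JCEKS keystore instead of
    // putting fs.s3a.access.key / fs.s3a.secret.key in clear text.
    // The keystore location below is illustrative only.
    conf.set("hadoop.security.credential.provider.path",
        "jceks://hdfs/user/etl/aws.jceks");

    // Illustrative source and target; the copy itself is plain DistCp.
    DistCpOptions options = new DistCpOptions(
        Collections.singletonList(new Path("hdfs:///data/events")),
        new Path("s3a://example-bucket/events"));

    // Assumes a build in which the s3a filesystem resolves its keys via
    // Configuration.getPassword(), which consults the provider above.
    new DistCp(conf, options).execute();
  }
}

The JCEKS store itself can be populated ahead of time with the "hadoop
credential create" command, so the keys never have to appear on the command
line or in core-site.xml.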

Thanks - Elliot.

Re: Securing secrets for S3 FileSystems in DistCp

Posted by Elliot West <te...@gmail.com>.
Hi Larry,

Thank you for the JIRA link and description. This appears to be very
relevant to what we're trying to achieve. I'll have a read and try it out.

Elliot.


On 3 May 2016 at 14:09, Larry McCay <lm...@hortonworks.com> wrote:

> Hi Elliot -
>
> You may find the following patch interesting:
> https://issues.apache.org/jira/browse/HADOOP-12548
>
> This enables the use of the Credential Provider API to protect secrets for
> the s3a filesystem.
> The design document attached to it describes how to use it.
>
> If you are not using s3a, there is similar support for the credential
> provider API in s3 and s3n, but there are slight differences in the processing.
> S3a is considered the strategic filesystem for accessing s3 - as far as I
> can tell.
>
> Hope this is helpful.
>
> —larry
>
> On May 3, 2016, at 8:41 AM, Elliot West <te...@gmail.com> wrote:
>
> Hello,
>
> We're currently using DistCp and S3 FileSystems to move data from a
> vanilla Apache Hadoop cluster to S3. We've been concerned about exposing
> our AWS secrets on our shared, on-premise cluster. As a work-around, we've
> patched DistCp to load these secrets from a JCEKS keystore. This seems to
> work quite well; however, we're not comfortable relying on a DistCp fork.
>
> What is the usual approach to achieve this with DistCp and is there a
> feature or practice that we've overlooked? If not, might there be value in
> us raising a JIRA ticket and submitting a patch for DistCp to include this
> secure keystore functionality?
>
> Thanks - Elliot.
>
>
>

How to know when the sort phase starts

Posted by siscia <si...@yahoo.com.INVALID>.
Hi all,

For research purposes (we are working on determining the completion time of
a Hadoop computation; if you are interested, feel free to shoot me an
email), I want to know when the sort phase starts for every reducer.

Without writing any code, is it possible to know when the sort phase starts?
Is this information logged anywhere?

I tried to look through the standard Hadoop logs but haven't found
anything; it is possible that the information is actually there, but
given the amount of noise I wasn't able to find it.

Thanks for your help

Simone
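
One way to approximate this from the client side, without modifying Hadoop
itself, is to poll the reduce task reports and watch for progress crossing
one third, since a reduce task's progress is conventionally reported in
shuffle, sort/merge, and reduce thirds. The sketch below assumes exactly
that convention; the job id handling is illustrative only.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Cluster;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.JobID;
import org.apache.hadoop.mapreduce.TaskReport;
import org.apache.hadoop.mapreduce.TaskType;

public class SortPhaseWatcher {

  public static void main(String[] args) throws Exception {
    // args[0] is a running job id, e.g. "job_1462281600000_0001".
    Cluster cluster = new Cluster(new Configuration());
    Job job = cluster.getJob(JobID.forName(args[0]));

    while (job != null && !job.isComplete()) {
      TaskReport[] reducers = job.getTaskReports(TaskType.REDUCE);
      for (int i = 0; i < reducers.length; i++) {
        float p = reducers[i].getProgress();
        // Reduce progress is conventionally split into thirds:
        // shuffle [0, 1/3), sort/merge [1/3, 2/3), reduce [2/3, 1].
        // Crossing 1/3 therefore approximates the start of the sort phase.
        if (p >= 1.0f / 3 && p < 2.0f / 3) {
          System.out.println("reducer " + i + " appears to be sorting ("
              + p + ") at " + System.currentTimeMillis());
        }
      }
      Thread.sleep(5000);
    }
  }
}

The same phase change is also visible in each reduce task's status string in
the web UI (for example "reduce > sort"), which comes from the same progress
reporting.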



---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@hadoop.apache.org
For additional commands, e-mail: user-help@hadoop.apache.org


Re: Securing secrets for S3 FileSystems in DistCp

Posted by Larry McCay <lm...@hortonworks.com>.
Hi Elliot -

You may find the following patch interesting: https://issues.apache.org/jira/browse/HADOOP-12548

This enables the use of the Credential Provider API to protect secrets for the s3a filesystem.
The design document attached to it describes how to use it.

If you are not using s3a, there is similar support for the credential provider API in s3 and s3n, but there are slight differences in the processing.
S3a is considered the strategic filesystem for accessing s3 - as far as I can tell.
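
By way of illustration, that resolution path looks roughly like the sketch
below, using only the stock Configuration API; the keystore location and
the printed checks are just examples.

import org.apache.hadoop.conf.Configuration;

public class S3aSecretLookup {

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Illustrative keystore location; any supported provider URI works.
    conf.set("hadoop.security.credential.provider.path",
        "jceks://hdfs/user/etl/aws.jceks");

    // getPassword() consults the configured credential providers first and
    // only falls back to a clear-text value in the configuration itself.
    // With the patch above, s3a obtains its keys through this same call.
    char[] accessKey = conf.getPassword("fs.s3a.access.key");
    char[] secretKey = conf.getPassword("fs.s3a.secret.key");

    System.out.println("access key found: " + (accessKey != null));
    System.out.println("secret key found: " + (secretKey != null));
  }
}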

Hope this is helpful.

—larry

On May 3, 2016, at 8:41 AM, Elliot West <te...@gmail.com> wrote:

Hello,

We're currently using DistCp and S3 FileSystems to move data from a vanilla Apache Hadoop cluster to S3. We've been concerned about exposing our AWS secrets on our shared, on-premise cluster. As a work-around, we've patched DistCp to load these secrets from a JCEKS keystore. This seems to work quite well; however, we're not comfortable relying on a DistCp fork.

What is the usual approach to achieve this with DistCp and is there a feature or practice that we've overlooked? If not, might there be value in us raising a JIRA ticket and submitting a patch for DistCp to include this secure keystore functionality?

Thanks - Elliot.


Re: Securing secrets for S3 FileSystems in DistCp

Posted by Elliot West <te...@gmail.com>.
Thanks for your reply.

We have IAM users, each with their own set of keys. Could you explain how
I can use roles in this situation?

Elliot.

On 3 May 2016 at 13:46, Shekhar Sharma <sh...@gmail.com> wrote:

> Have you used IAM (Identity and Access Management) roles?
> On 3 May 2016 18:11, "Elliot West" <te...@gmail.com> wrote:
>
>> Hello,
>>
>> We're currently using DistCp and S3 FileSystems to move data from a
>> vanilla Apache Hadoop cluster to S3. We've been concerned about exposing
>> our AWS secrets on our shared, on-premise cluster. As a work-around, we've
>> patched DistCp to load these secrets from a JCEKS keystore. This seems to
>> work quite well; however, we're not comfortable relying on a DistCp fork.
>>
>> What is the usual approach to achieve this with DistCp and is there a
>> feature or practice that we've overlooked? If not, might there be value in
>> us raising a JIRA ticket and submitting a patch for DistCp to include this
>> secure keystore functionality?
>>
>> Thanks - Elliot.
>>
>

Re: Securing secrets for S3 FileSystems in DistCp

Posted by Shekhar Sharma <sh...@gmail.com>.
Have you used IAM (Identity and Access Management) roles?
On 3 May 2016 18:11, "Elliot West" <te...@gmail.com> wrote:

> Hello,
>
> We're currently using DistCp and S3 FileSystems to move data from a
> vanilla Apache Hadoop cluster to S3. We've been concerned about exposing
> our AWS secrets on our shared, on-premise cluster. As a work-around, we've
> patched DistCp to load these secrets from a JCEKS keystore. This seems to
> work quite well; however, we're not comfortable relying on a DistCp fork.
>
> What is the usual approach to achieve this with DistCp and is there a
> feature or practice that we've overlooked? If not, might there be value in
> us raising a JIRA ticket and submitting a patch for DistCp to include this
> secure keystore functionality?
>
> Thanks - Elliot.
>