Posted to dev@spark.apache.org by Michael Segel <ms...@hotmail.com> on 2016/04/01 14:23:11 UTC

Re: Any documentation on Spark's security model beyond YARN?

Guys, 

Getting a bit off topic.  

Saying security and HBase in the same sentence is a bit of a joke until HBase rejiggers its coprocessors. That said, Andrew’s fix could be enough to keep CSOs and their minions happy.

The larger picture is that security has to stop being an afterthought. Once you start getting into restricted and highly restricted data, you will have issues, and anything you can do to stop leakage, or the potential of leakage, would be great.

Getting back to Spark specifically, you have components like the Thrift server, which can persist RDDs, and I don’t see any restrictions on access to them.
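
To make that concrete, here’s a rough sketch (Spark 1.x-era APIs; the table name is made up) of how little stands between cached data and any JDBC client that can reach the Thrift server’s port:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext
    import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

    val sc = new SparkContext(new SparkConf().setAppName("thrift-demo"))
    val hiveContext = new HiveContext(sc)

    // The cached table lives in this application's memory...
    hiveContext.sql("CACHE TABLE sensitive_data")

    // ...and is now queryable by every client that can connect to the
    // JDBC port; I don't see any per-user check on the cached data itself.
    HiveThriftServer2.startWithContext(hiveContext)

Kerberos can gate who connects, but once a client is in, whatever was cached is there for the taking.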

Does this mean integration with Ranger or Sentry? Does it mean rolling a separate solution?

And if you’re going to look at Thrift, do you want to look at other potential areas as well? 

Please note: this may all be for nothing. It may be that just having the discussion and coming to a conclusion about the potential risks and how to mitigate them is enough.

Thx

-Mike

> On Mar 31, 2016, at 6:32 AM, Steve Loughran <st...@hortonworks.com> wrote:
> 
>> 
>> On 30 Mar 2016, at 21:02, Sean Busbey <bu...@cloudera.com> wrote:
>> 
>> On Wed, Mar 30, 2016 at 4:33 AM, Steve Loughran <st...@hortonworks.com> wrote:
>>> 
>>>> On 29 Mar 2016, at 22:19, Michael Segel <ms...@hotmail.com> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> So yeah, I know that Spark jobs running on a Hadoop cluster will inherit their security from the underlying YARN job.
>>>> However… that’s not really saying much when you think about some use cases.
>>>> 
>>>> Like using the thrift service …
>>>> 
>>>> I’m wondering what else is new and what people have been thinking about how to enhance Spark’s security.
>>>> 
>>> 
>>> Been thinking a bit.
>>> 
>>> One thing to look at is renewal of hbase and hive tokens on long-lived services, alongside hdfs
>>> 
>>> 
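
For reference, obtaining an HBase token today looks roughly like this -- HBase 1.x API, and a sketch only; Hive is analogous via the metastore client. The missing piece Steve is pointing at is re-running it on a schedule:

    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.security.token.TokenUtil
    import org.apache.hadoop.security.UserGroupInformation

    // Ask HBase for a delegation token and attach it to the current user.
    // Tokens expire, so a long-lived service has to repeat this before the
    // renewal window closes.
    val conf = HBaseConfiguration.create()
    val token = TokenUtil.obtainToken(conf)
    UserGroupInformation.getCurrentUser.addToken(token)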
>> 
>> I've been looking at this as well. The current work-around I'm using
>> is to use keytab logins on the executors, which is less than
>> desirable.
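
For anyone following along, that workaround presumably amounts to something like this on each executor -- a sketch, not Sean's actual code, and the principal and keytab path are placeholders:

    import org.apache.hadoop.security.UserGroupInformation

    // Log in directly from a keytab shipped out with the application.
    // This sidesteps token expiry entirely, but it means distributing the
    // keytab itself to every executor -- the "less than desirable" part.
    UserGroupInformation.loginUserFromKeytab(
      "spark/host.example.com@EXAMPLE.COM", "/path/to/spark.keytab")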
> 
> 
> OK, let's work together on this ... the current Spark renewal code assumes it's only for HDFS (indeed, that the filesystem is HDFS and therefore the # of tokens > 0); there's no fundamental reason why the code in YarnSparkHadoopUtils can't run in the AM too.
> 
>> 
>> Since the HBase project maintains Spark integration points, it'd be
>> great if there were just a hook for services to provide "here's how to
>> renew" to a common renewal service.
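
Such a hook might take a shape like this -- purely illustrative, none of these names exist anywhere yet:

    import java.util.concurrent.{Executors, TimeUnit}
    import org.apache.hadoop.security.{Credentials, UserGroupInformation}

    // Each service (HDFS, HBase, Hive, ...) registers a function that knows
    // how to re-obtain its tokens; a common renewal service runs them on a
    // schedule and merges the fresh tokens into the current user's credentials.
    class TokenRenewer(obtainTokens: Credentials => Unit, intervalHours: Long) {
      private val pool = Executors.newSingleThreadScheduledExecutor()
      def start(): Unit = {
        pool.scheduleAtFixedRate(new Runnable {
          def run(): Unit = {
            val fresh = new Credentials()
            obtainTokens(fresh)
            UserGroupInformation.getCurrentUser.addCredentials(fresh)
          }
        }, 0L, intervalHours, TimeUnit.HOURS)
      }
    }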
>> 
> 
> 1. Wittenauer is doing some work on a tool for doing this; I'm pushing for it to be a fairly generic API. Even if Spark has to use reflection to get at it, at least it would be consistent across services. See https://issues.apache.org/jira/browse/HADOOP-12563
> 
> 2. The topic of HTTPS-based acquisition/use of HDFS tokens has arisen elsewhere; it's needed for long-haul job submission when you don't have a keytab to hand. This could be useful as it'd avoid actually needing hbase-*.jar on the classpath at submit time.
> 
> 
>> 
>> 
>> -- 
>> busbey
>> 
> 
> 