Posted to user@spark.apache.org by blazespinnaker <bl...@gmail.com> on 2016/11/04 06:41:19 UTC

Sandboxing Spark executors

Is there a good method / discussion / documentation on how to sandbox a Spark
executor? Assume the code is untrusted and you don't want it to be able to
make unvalidated network connections or do unvalidated Alluxio/HDFS/file
I/O.




--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/sanboxing-spark-executors-tp28014.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org


Re: Sandboxing Spark executors

Posted by Michael Gummelt <mg...@mesosphere.io>.
Mesos will let you run in Docker containers, so you get filesystem
isolation, and we're about to merge CNI support
(https://github.com/apache/spark/pull/15740), which would allow you to set up
network policies. That said, you might be able to achieve whatever network
isolation you need without CNI, depending on your requirements.
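
As a rough illustration of the Docker route, the sketch below assembles a
spark-submit command line that asks Mesos to launch executors inside a Docker
image through the `spark.mesos.executor.docker.image` property. The master URL,
image name, and job file are placeholders invented for this example, and the
helper `conf_args` is hypothetical. No CNI settings are shown, since that
support is described above as not yet merged.

```python
def conf_args(conf):
    """Render a dict of Spark properties as repeated --conf key=value flags."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


# Placeholder values: substitute your own Mesos master, image, and job.
sandbox_conf = {
    # Run each executor inside this Docker image (filesystem isolation).
    "spark.mesos.executor.docker.image": "example/spark-sandbox:latest",
}

argv = [
    "spark-submit",
    "--master", "mesos://mesos-master.example.com:5050",
    *conf_args(sandbox_conf),
    "untrusted_job.py",
]

print(" ".join(argv))
```

Keeping the properties in a dict makes it easy to review exactly which
settings an untrusted job is submitted with before anything is launched.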

As for unauthenticated HDFS clusters, I would recommend against running
untrusted code on the same network as your secure HDFS cluster.

On Fri, Nov 4, 2016 at 4:13 PM, blazespinnaker <bl...@gmail.com>
wrote:

> In particular, we need to make sure the RDDs execute the lambda functions
> securely as they are provided by user code.


-- 
Michael Gummelt
Software Engineer
Mesosphere

Re: Sandboxing Spark executors

Posted by Michael Segel <ms...@hotmail.com>.
This is not an easy problem to solve.

Can you impersonate the user who provided the code?

I mean, if Joe provides the lambda function, then it runs as Joe, so it has Joe’s permissions.

Steve is right: you’d have to get down to your cluster’s security and authenticate the user before accepting the lambda code. You may also want to run with a restricted subset of permissions
(e.g. Joe is an admin, but he wants the code to run as if it came from an untrusted user; this gets a bit more interesting).
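
One way to picture the impersonation idea: spark-submit has a `--proxy-user`
option that submits the application on behalf of another user. The sketch
below only builds such a command line. The function name, the user-name check,
and the file names are assumptions made for illustration, and the cluster
itself must be configured to let the submitting account impersonate others
(on Hadoop, through the `hadoop.proxyuser.*` settings).

```python
import re

# Conservative pattern for an account name (an assumption; adjust to your site).
_USER_RE = re.compile(r"[a-z_][a-z0-9_-]*")


def submit_as_user(app_path, proxy_user, master="yarn", app_args=()):
    """Build a spark-submit argv that runs app_path impersonating proxy_user."""
    if not _USER_RE.fullmatch(proxy_user):
        raise ValueError(f"refusing suspicious user name: {proxy_user!r}")
    return [
        "spark-submit",
        "--master", master,
        "--proxy-user", proxy_user,  # the job runs with this user's permissions
        app_path,
        *app_args,
    ]


print(submit_as_user("joes_lambda_job.py", "joe"))
```

Returning a list rather than a single string means the command can be handed
to `subprocess.run` without a shell, so the user name is never interpreted as
shell syntax.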

And this begs the question:

How are you sharing your RDDs across multiple users? This too opens up a security question or two.



> On Nov 4, 2016, at 6:13 PM, blazespinnaker <bl...@gmail.com> wrote:
> 
> In particular, we need to make sure the RDDs execute the lambda functions
> securely as they are provided by user code.


Re: Sandboxing Spark executors

Posted by blazespinnaker <bl...@gmail.com>.
In particular, we need to make sure the RDDs execute the lambda functions
securely as they are provided by user code.





Re: Sandboxing Spark executors

Posted by Calvin Jia <ji...@gmail.com>.
Hi,

If you are using the latest Alluxio release (1.3.0), authorization is
enabled by default, preventing users from accessing data they do not have
permission to access. For older versions, you will need to enable the
security flag. The documentation on security
<http://www.alluxio.org/docs/master/en/Security.html> has more details.

Hope this helps,
Calvin

On Fri, Nov 4, 2016 at 6:31 AM, Andrew Holway <
andrew.holway@otternetworks.de> wrote:

> I think running it on a Mesos cluster could give you better control over
> this kinda stuff.
>
>
> On Fri, Nov 4, 2016 at 7:41 AM, blazespinnaker <bl...@gmail.com>
> wrote:
>
>> Is there a good method / discussion / documentation on how to sandbox a
>> Spark executor? Assume the code is untrusted and you don't want it to be
>> able to make unvalidated network connections or do unvalidated
>> Alluxio/HDFS/file I/O.
>>
>
>
> --
> Otter Networks UG
> http://otternetworks.de
> Gotenstraße 17
> 10829 Berlin
>

Re: Sandboxing Spark executors

Posted by Andrew Holway <an...@otternetworks.de>.
I think running it on a Mesos cluster could give you better control over
this kinda stuff.


On Fri, Nov 4, 2016 at 7:41 AM, blazespinnaker <bl...@gmail.com>
wrote:

> Is there a good method / discussion / documentation on how to sandbox a
> Spark executor? Assume the code is untrusted and you don't want it to be
> able to make unvalidated network connections or do unvalidated
> Alluxio/HDFS/file I/O.


-- 
Otter Networks UG
http://otternetworks.de
Gotenstraße 17
10829 Berlin

Re: Sandboxing Spark executors

Posted by Steve Loughran <st...@hortonworks.com>.
> On 4 Nov 2016, at 06:41, blazespinnaker <bl...@gmail.com> wrote:
> 
> Is there a good method / discussion / documentation on how to sandbox a Spark
> executor? Assume the code is untrusted and you don't want it to be able to
> make unvalidated network connections or do unvalidated Alluxio/HDFS/file
> I/O.

Use Kerberos to authenticate connections to HDFS, HBase, and Hive. When it is enabled, Spark processes (indeed all YARN processes) will run as different users in the cluster, which gives you isolation there too.

There is no easy way to monitor or block general outbound network connections, though.
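
To make the Kerberos suggestion a little more concrete: on YARN, spark-submit
accepts `--principal` and `--keytab` so that the application can log in and
reach Kerberized services. The following is a minimal sketch that builds such
a command line. The principal, keytab path, and job file are invented
placeholders, and the helper `kerberized_submit` is hypothetical. It does
nothing about outbound network traffic, which the reply above notes is the
harder part.

```python
def kerberized_submit(app_path, principal, keytab, app_args=()):
    """Build a spark-submit argv for YARN that logs in with a Kerberos keytab."""
    if "@" not in principal:
        raise ValueError("expected a principal of the form name@REALM")
    return [
        "spark-submit",
        "--master", "yarn",
        "--principal", principal,  # identity the job authenticates as
        "--keytab", keytab,        # credentials for that identity
        app_path,
        *app_args,
    ]


# Placeholder principal and keytab path, for illustration only.
cmd = kerberized_submit(
    "etl_job.py",
    "spark-jobs@EXAMPLE.COM",
    "/etc/security/keytabs/spark-jobs.keytab",
)
print(" ".join(cmd))
```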

