Posted to dev@drill.apache.org by Oleksandr Kalinin <al...@gmail.com> on 2018/08/13 16:02:04 UTC

Drillbit client connect authorization

Hello Drill community,

In multi-tenant YARN clusters, running multiple Drill-on-YARN clusters
seems like an attractive feature, as it makes it possible to leverage YARN
mechanisms for resource management and isolation. However, there seems to be
a simple access restriction issue. Assume:

- Cluster A launched by user X
- Cluster B launched by user Y

Both users X and Y will be able to connect and run queries against clusters
A and B (in fact, that applies to any successfully authenticated user, not
only X and Y), whereas we obviously would like to ensure exclusive usage of
clusters by their owners, who are the owners of the respective YARN
resources. If users X and Y are non-privileged DFS users and impersonation
is not enabled, then user X has access to data on behalf of user Y and vice
versa, which is an additional potential security issue.

I was looking for ways to control connect authorization, but couldn't find
anything related yet. Am I missing something? Are there any other
considerations, or was this point perhaps already discussed before?

It might be possible to tweak the PAM setup to trigger an authentication
failure for "undesired" users, but that looks like overkill in terms of
complexity.

From a user perspective, a basic ACL configuration with users and groups
authorized to connect to a Drillbit would already be sufficient IMO. Or a
configuration switch to ensure that only the owner user is authorized to
connect.
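
To make the proposal above concrete, here is a minimal sketch of what such a connect check could look like. Everything in it is hypothetical: ConnectAcl, its fields and may_connect() are names made up for illustration and are not existing Drill options or APIs. It only shows the two variants mentioned above, an allow-list of users and groups and an owner-only switch.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ConnectAcl:
    """Hypothetical connect ACL; not an existing Drill option."""
    owner: str
    allowed_users: frozenset = field(default_factory=frozenset)
    allowed_groups: frozenset = field(default_factory=frozenset)
    owner_only: bool = False

    def may_connect(self, user: str, groups: set) -> bool:
        # The cluster owner is always allowed to connect.
        if user == self.owner:
            return True
        # With the owner-only switch on, nobody else gets in.
        if self.owner_only:
            return False
        # Otherwise allow listed users and members of listed groups.
        return user in self.allowed_users or bool(self.allowed_groups & set(groups))


# Cluster A is owned by X and additionally open to an assumed group "analysts".
acl_a = ConnectAcl(owner="X", allowed_groups=frozenset({"analysts"}))
print(acl_a.may_connect("X", set()))
print(acl_a.may_connect("Y", {"analysts"}))
print(acl_a.may_connect("Y", {"other"}))
```

In this sketch the check would run after authentication has succeeded, so it complements PAM or Kerberos rather than replacing them.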

Best Regards,
Alex

Re: Drillbit client connect authorization

Posted by Keys Botzum <kb...@mapr.com>.
I think I know the problem but I am guessing.

When YARN jobs are launched, the MapR client runtime in YARN authenticates to the RM using the current ticket. That results in the YARN RM knowing you are 'foo' in this case. When the actual containers are launched, they are started with a freshly generated ticket for the user that launched the job. That ticket is not the original ticket, but rather a ticket generated on the fly for the job. The intent is that ticket attributes are copied, but my bet is that the constrained impersonation attributes got lost in the copy. What was copied was "can impersonate." I vaguely remember a defect in this area, but my memory is fuzzy.
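
The suspected failure mode can be illustrated with a toy model. This is only a sketch of the guess above, not MapR code: the Ticket class, its field names, the convention that None means "no constraint", and the GID value 500 standing in for K are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Ticket:
    """Toy ticket model; field names are illustrative, not MapR's format."""
    user: str
    can_impersonate: bool
    # Assumption: None means "no constraint", i.e. any user may be impersonated.
    impersonated_gids: Optional[FrozenSet[int]] = None

    def may_impersonate(self, target_gids: set) -> bool:
        if not self.can_impersonate:
            return False
        if self.impersonated_gids is None:
            return True
        return bool(self.impersonated_gids & set(target_gids))


def lossy_copy(original: Ticket) -> Ticket:
    # Suspected defect: only the "can impersonate" flag survives the copy.
    return Ticket(user=original.user, can_impersonate=original.can_impersonate)


def faithful_copy(original: Ticket) -> Ticket:
    # Intended behavior: the impersonation constraints are copied as well.
    return Ticket(original.user, original.can_impersonate, original.impersonated_gids)


launch_ticket = Ticket("foo", True, frozenset({500}))  # 500 stands in for GID K
outsider_gids = {999}                                   # a user outside group 'bar'
print(faithful_copy(launch_ticket).may_impersonate(outsider_gids))
print(lossy_copy(launch_ticket).may_impersonate(outsider_gids))
```

If the copy behaves like lossy_copy, the container ticket allows impersonation of users outside the group, which would match the behavior reported in item (1).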

Please send me the support case information privately. I will contact support directly.

Keys
_______________________________
Keys Botzum 
Distinguished Engineer, Field Engineering
kbotzum@mapr.com
443-718-0098
MapR Technologies 
http://www.mapr.com


Re: Drillbit client connect authorization

Posted by Oleksandr Kalinin <al...@gmail.com>.
Hi Keys,

Thanks for your reply. I don't want to make this conversation specific to
an environment/vendor either, so I am ready to go off the list as soon as
anyone signals.

Thanks for clarifying item (2).

For item (1): yes, we did check the ticket contents with maprlogin print, and
they are correct (listing UID N and GID K). We will try with GID only,
although I don't see anything wrong with including the UID (user 'foo'
impersonating user 'foo' should work :-))

We are sure that we launch the Drill-on-YARN application with the correct
ticket. That is evident from the fact that the Drillbit runs as ticket user
'foo' whereas we launch the application from a private account shell
session. But indeed we are not sure if / how MapR ticket credentials get
passed along with YARN delegation tokens all the way down to the Drillbit
spawned by the YARN container, and whether such a trick is actually
supported at all. This is why earlier in this thread I mentioned that we are
not sure if this idea is workable at all. Nevertheless, regardless of the
ticket, we found it very surprising that impersonation actually seems to
work for any user, including users outside of the 'bar' group, even though
the Drillbit process UID is 'foo' and 'foo' is not a privileged MapR user.
This looks like an additional issue and we will be debugging it further. Of
course, any suggestions or hints on this would be much appreciated.

Best Regards,
Alex


On Tue, Aug 21, 2018 at 6:56 PM Keys Botzum <kb...@mapr.com> wrote:

> Alex,
>
> Obviously I don't want this conversation to sound too much like a vendor
> conversation but I do want to be helpful. If folks think this is too vendor
> specific I'm happy to take the conversation off list but others that are
> using Drill on MapR might benefit here as well.
>
> This is helpful. Let me take the easy question first. (2) is not working
> because the POSIX client is not designed to work with constrained
> impersonation tickets. This is a case of works as design. There is an
> internal enhancement bug to address that for the FUSE version of the POSIX
> client. If support isn't familiar, please tell them to look at internal
> bugzilla bug #31117. If there is further confusion, please ask them to talk
> to me.
>
> Regarding (1), something isn't quite right here. In your generateticket
> command you should not need to specify the -impersonateduids as that is
> saying that the ticket can impersonate the user N which seems unrelated to
> your needs. The -impersonatedgids K seems like the right thing to specify.
> After you ran that command did you look at the output of maprlogin print to
> ensure the ticket looks correct? More importantly are you sure Drill is
> actually using that ticket? Given the behavior described I suspect Drill is
> using another ticket. How did you configure Drill to use this ticket? My
> suspicion is that Drill is still using the 'mapr' ticket in
> /opt/mapr/conf/mapruserticket.
>
> Keys
> _______________________________
> Keys Botzum
> MapR Technologies
> http://www.mapr.com
>
> > On Aug 21, 2018, at 12:45 PM, Oleksandr Kalinin <al...@gmail.com>
> wrote:
> >
> > Hi Keys,
> >
> > Assume we want to :
> > - Run Drill cluster on YARN as user 'foo' (UID = N)
> > - Authorize all users in group 'bar' (GID = K) for running Drill queries
> on
> > that cluster with impersonation enabled
> > - All other users should be able to connect to the cluster, but their
> > queries should fail with impersonation failure
> >
> > We expected (wrongly?) that launching Drill cluster on YARN with
> following
> > MapR ticket would be suitable :
> >
> > $ maprlogin generateticket -type servicewithimpersonation -user foo -out
> > foo.ticket  -duration x:0:0 -impersonateduids N  -impersonatedgids K
> >
> > However, we seem to have 2 issues :
> >
> > 1. When accessing Drill cluster launched on YARN with above ticket, and
> > even though 'foo' is non-privileged user, impersonation seems to work for
> > users outside of 'bar' group(!)
> > - we are currently puzzled by this behavior and continue to dig into the
> > issue hoping that something is wrong with our test
> >
> > 2. When using above ticket with another impersonating service - loopback
> > NFS client - we observe that service does not perform expected
> > impersonation. It only works for user 'foo'. Any other user using the
> > service gets FS permission denied error. This is the issue I raise to
> MapR
> > already.
> >
> > Thanks,
> > Best Regards,
> > Alex
> >
> > On Tue, Aug 21, 2018 at 6:24 PM Keys Botzum <kb...@mapr.com> wrote:
> >
> >> Can you comment on what isn't working with MapR in this scenario? I'm
> >> familiar with impersonation tickets and constrained impersonation.
> >>
> >> That said, I do agree that a general purpose feature in Drill that
> allows
> >> one to constrain who can issue queries seems useful.
> >>
> >> Keys
> >> _______________________________
> >> Keys Botzum
> >> MapR Technologies
> >> http://www.mapr.com
> >>
> >>> On Aug 21, 2018, at 3:47 AM, Joel Pfaff <jo...@gmail.com> wrote:
> >>>
> >>> Hello,
> >>>
> >>> "Unfortunately I have not used the setup described above but from
> >>> explanation looks like the impersonation tickets will be used by
> >> Drillbit's
> >>> on Tenant A to restrict the MapR platform access by a limited set of
> >>> Drillbit authenticated user. Using this any user in Tenant B will not
> be
> >>> able to execute query on Tenant A even though it can be authenticated
> >>> successfully by the Drillbit in Tenant A. This way authorization check
> is
> >>> done at data layer."
> >>>
> >>> Unfortunately, the tests we have done so far do not confirm this
> expected
> >>> behavior.
> >>> That's why Alex opened a ticket for an Authorization framework :
> >>>
> >>
> https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.org_jira_browse_DRILL-2D6699&d=DwIBaQ&c=cskdkSMqhcnjZxdQVpwTXg&r=GqmpS_1AHD_cvgkumRuDkBtRTvUsIvfjVomAQtdhBks&m=th4RzorF4mYi7oPGaRMacJVgsQwPrqO3721YuREqjM8&s=I9DqH7uLEEdgnaHNGN7zBJxfc5dtbDjJ09mLgcJdVB8&e=
> >>>
> >>> We have also opened a ticket to MapR to clarify the expected behavior
> of
> >>> impersonation tickets with group restrictions.
> >>>
> >>> Regards, Joel
> >>>
> >>> On Sun, Aug 19, 2018 at 9:21 PM Oleksandr Kalinin <al...@gmail.com>
> >>> wrote:
> >>>
> >>>> Hi Sorabh,
> >>>>
> >>>> In case of Hive, user connects to Hive server. Launching the query
> >> launches
> >>>> YARN application - each query is YARN application. To make sure that
> >> query
> >>>> uses YARN cluster resources launching user is authorized to use, YARN
> >>>> authorization kicks in - e.g. YARN queue ACLs - mechanism a bit
> similar
> >> to
> >>>> the one proposed in this thread. Once application is running,
> >> impersonation
> >>>> and data (FS) level authorization do the rest of the job like you say
> -
> >>>> that is indeed the key.
> >>>>
> >>>> We use the same authorization model for Spark - to run Spark job, user
> >> must
> >>>> launch it as YARN application on specific YARN resource protected by
> >> YARN
> >>>> authorization, with impersonation and FS level authorization following
> >> once
> >>>> the job is running.
> >>>>
> >>>> In case of Drill on YARN, user connects to Drill cluster which is
> >> *already*
> >>>> running as YARN application. Thus exposing that Drill cluster to any
> >> user
> >>>> in the entire YARN cluster we expose YARN resources users might be not
> >>>> authorized to use. That is main issue we are trying to solve.
> >>>>
> >>>> Hope this makes it clearer.
> >>>>
> >>>> Best Regards,
> >>>> Alex
> >>>>
> >>>>
> >>>> On Fri, Aug 17, 2018 at 11:57 PM, Sorabh Hamirwasia <
> >> shamirwasia@mapr.com>
> >>>> wrote:
> >>>>
> >>>>> Hi Joel/Alex,
> >>>>> Thanks for explaining the use case with multi tenant cluster.
> >>>>>
> >>>>> @Joel
> >>>>> Unfortunately I have not used the setup described above but from
> >>>>> explanation looks like the impersonation tickets will be used by
> >>>> Drillbit's
> >>>>> on Tenant A to restrict the MapR platform access by a limited set of
> >>>>> Drillbit authenticated user. Using this any user in Tenant B will not
> >> be
> >>>>> able to execute query on Tenant A even though it can be authenticated
> >>>>> successfully by the Drillbit in Tenant A. This way authorization
> check
> >> is
> >>>>> done at data layer.
> >>>>>
> >>>>> @Alex,
> >>>>> Adding an authorization check for a valid authenticated cluster user
> >>>>> shouldn't be a big change. Based on a configured set's of
> users/group a
> >>>>> subset of cluster user can be allowed to connect. But can you please
> >>>> point
> >>>>> to how other services do these authorization checks when running in
> >> multi
> >>>>> tenant environment ? Based on my understanding all these
> authorization
> >>>>> check in Hadoop system are done at data layer or they have a separate
> >>>>> security service which does these checks along with other security
> >> checks
> >>>>> for authentication, etc.
> >>>>>
> >>>>> Also please feel free to open a JIRA ticket with details.
> >>>>>
> >>>>> Thanks,
> >>>>> Sorabh
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Fri, Aug 17, 2018 at 11:21 AM, Oleksandr Kalinin <
> >> alexka79@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Hi Sorabh,
> >>>>>>
> >>>>>> Thanks for you comments. Joel described in detail our current
> thinking
> >>>> on
> >>>>>> how to overcome the issue. We are not yet 100% sure if it will
> >> actually
> >>>>>> work though.
> >>>>>>
> >>>>>> Actually I raised this topic in this mailing list because I think
> it's
> >>>>> not
> >>>>>> only specific to our setup. It's more about having nice "Drill on
> >> YARN"
> >>>>>> feature with very limited (frankly, no) access control which almost
> >>>> makes
> >>>>>> the feature unusable in environments where it is attractive - multi
> >>>>> tenant
> >>>>>> secure clusters. Supported security mechanisms are good for
> >>>>> authentication,
> >>>>>> but using them for authorization seems suboptimal. Typically, YARN
> >>>>> clusters
> >>>>>> run in single Kerberos realm and the need to introduce multiple
> realms
> >>>>> and
> >>>>>> separate identities for Drill service is not at all convenient (I am
> >>>>> pretty
> >>>>>> sure that in many environments like ours it is a no go). And how
> about
> >>>>> use
> >>>>>> cases with no Kerberos setup? If we can workaround access control by
> >>>>>> MapR-specific security tickets like described by Joel - good for us,
> >>>> but
> >>>>>> what about other environments?
> >>>>>>
> >>>>>> So the question is more whether it make sense to consider
> introducing
> >>>>> user
> >>>>>> authorization feature. This thread refers only to session
> >> authorization
> >>>>> to
> >>>>>> complement YARN feature, but it could be extendable of course, e.g.
> in
> >>>>>> similar ways like Drill already supports multiple authentication
> >>>>>> mechanisms.
> >>>>>>
> >>>>>> Thanks & Best Regards,
> >>>>>> Alex
> >>>>>>
> >>>>>> On Wed, Aug 15, 2018 at 11:30 PM, Sorabh Hamirwasia <
> >>>>> shamirwasia@mapr.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Hi Oleksandr,
> >>>>>>> Drill doesn't do any user management in itself, instead relies on
> the
> >>>>>>> corresponding security mechanisms in use to do it. It uses SASL
> >>>>> framework
> >>>>>>> to allow using different pluggable security mechanisms. So it
> should
> >>>> be
> >>>>>>> upon the security mechanism in use to do the authorization level
> >>>>> checks.
> >>>>>>> For example in your use case if you want to allow only certain
> set's
> >>>> of
> >>>>>>> users to connect to a cluster then you can choose to use Kerberos
> >>>> with
> >>>>>> each
> >>>>>>> cluster running in different realms. This will ensure client users
> >>>>>> running
> >>>>>>> in corresponding realm can only connect to cluster running in that
> >>>>> realm.
> >>>>>>>
> >>>>>>> For the impersonation issue I think it's a configuration issue and
> >>>> the
> >>>>>>> behavior is expected where all queries whether from user A or B are
> >>>>>>> executed as admin users.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Sorabh
> >>>>>>>
> >>>>>>> On Mon, Aug 13, 2018 at 9:02 AM, Oleksandr Kalinin <
> >>>> alexka79@gmail.com
> >>>>>>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Hello Drill community,
> >>>>>>>>
> >>>>>>>> In multi-tenant YARN clusters, running multiple Drill-on-YARN
> >>>>> clusters
> >>>>>>>> seems as attractive feature as it enables leveraging on YARN
> >>>>> mechanisms
> >>>>>>> of
> >>>>>>>> resource management and isolation. However, there seems to be
> >>>> simple
> >>>>>>> access
> >>>>>>>> restriction issue. Assume :
> >>>>>>>>
> >>>>>>>> - Cluster A launched by user X
> >>>>>>>> - Cluster B launched by user Y
> >>>>>>>>
> >>>>>>>> Both users X and Y will be able to connect and run queries against
> >>>>>>> clusters
> >>>>>>>> A and B (in fact, that applies to any positively authenticated
> >>>> user,
> >>>>>> not
> >>>>>>>> only X and Y). Whereas we obviously would like to ensure exclusive
> >>>>>> usage
> >>>>>>> of
> >>>>>>>> clusters by their owners - who are owners of respective YARN
> >>>>> resources.
> >>>>>>> In
> >>>>>>>> case users X and Y are non-privileged DFS users and impersonation
> >>>> is
> >>>>>> not
> >>>>>>>> enabled, then user A has access to data on behalf of user B and
> >>>> vice
> >>>>>>> versa
> >>>>>>>> which is additional potential security issue.
> >>>>>>>>
> >>>>>>>> I was looking for possibilities to control connect authorization,
> >>>> but
> >>>>>>>> couldn't find anything related yet. Do I miss something maybe? Are
> >>>>>> there
> >>>>>>>> any other considerations, perhaps this point was already discussed
> >>>>>>> before?
> >>>>>>>>
> >>>>>>>> It could be possible to tweak PAM setup to trigger authentication
> >>>>>> failure
> >>>>>>>> for "undesired" users but that looks like an overkill in terms of
> >>>>>>>> complexity.
> >>>>>>>>
> >>>>>>>> From user perspective, basic ACL configuration with users and
> >>>> groups
> >>>>>>>> authorized to connect to Drillbit would already be sufficient IMO.
> >>>> Or
> >>>>>>>> configuration switch to ensure that only owner user is authorized
> >>>> to
> >>>>>>>> connect.
> >>>>>>>>
> >>>>>>>> Best Regards,
> >>>>>>>> Alex
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>
> >>
>
>

Re: Drillbit client connect authorization

Posted by Keys Botzum <kb...@mapr.com>.
Alex,

Obviously I don't want this conversation to sound too much like a vendor conversation but I do want to be helpful. If folks think this is too vendor specific I'm happy to take the conversation off list but others that are using Drill on MapR might benefit here as well.

This is helpful. Let me take the easy question first. (2) is not working because the POSIX client is not designed to work with constrained impersonation tickets. This is a case of working as designed. There is an internal enhancement bug to address that for the FUSE version of the POSIX client. If support isn't familiar with it, please tell them to look at internal bugzilla bug #31117. If there is further confusion, please ask them to talk to me.

Regarding (1), something isn't quite right here. In your generateticket command you should not need to specify -impersonateduids, as that says the ticket can impersonate user N, which seems unrelated to your needs. Specifying -impersonatedgids K seems like the right thing to do. After you ran that command, did you look at the output of maprlogin print to ensure the ticket looks correct? More importantly, are you sure Drill is actually using that ticket? Given the behavior described, I suspect Drill is using another ticket. How did you configure Drill to use this ticket? My suspicion is that Drill is still using the 'mapr' ticket in /opt/mapr/conf/mapruserticket.

Keys
_______________________________
Keys Botzum 
MapR Technologies 
http://www.mapr.com



Re: Drillbit client connect authorization

Posted by Oleksandr Kalinin <al...@gmail.com>.
Hi Keys,

Assume we want to :
- Run Drill cluster on YARN as user 'foo' (UID = N)
- Authorize all users in group 'bar' (GID = K) for running Drill queries on
that cluster with impersonation enabled
- All other users should be able to connect to the cluster, but their
queries should fail with impersonation failure
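
The intended policy in the list above can be written down as a small model. This is only a sketch of the behavior we expect, not how MapR evaluates tickets: the ServiceTicket class, the may_impersonate function, and the numeric value chosen for K are all hypothetical names and placeholders introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ServiceTicket:
    """Hypothetical model of a service ticket limited to certain groups."""
    owner: str
    impersonated_gids: frozenset = field(default_factory=frozenset)


def may_impersonate(ticket, user, user_gids):
    """Return True if the service holding `ticket` may act as `user`."""
    # Assumption: the service can always act as its own user.
    if user == ticket.owner:
        return True
    # Otherwise the user must belong to at least one permitted group.
    return bool(ticket.impersonated_gids & set(user_gids))


K = 5000  # placeholder value standing in for the GID of group 'bar'
ticket = ServiceTicket(owner="foo", impersonated_gids=frozenset({K}))

print(may_impersonate(ticket, "alice", {K, 100}))  # alice is in 'bar'
print(may_impersonate(ticket, "bob", {100}))       # bob is not in 'bar'
```

Under this model a query from a user outside 'bar' would be rejected at impersonation time, which is the outcome described in the third bullet.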

We expected (wrongly?) that launching the Drill cluster on YARN with the following
MapR ticket would be suitable :

$ maprlogin generateticket -type servicewithimpersonation -user foo -out
foo.ticket  -duration x:0:0 -impersonateduids N  -impersonatedgids K

However, we seem to have 2 issues :

1. When accessing the Drill cluster launched on YARN with the above ticket, and
even though 'foo' is a non-privileged user, impersonation seems to work for
users outside of the 'bar' group(!)
- we are currently puzzled by this behavior and continue to dig into the
issue, hoping that something is wrong with our test

2. When using the above ticket with another impersonating service - loopback
NFS client - we observe that the service does not perform the expected
impersonation. It only works for user 'foo'. Any other user using the
service gets an FS permission denied error. This is the issue I have already
raised with MapR.

Thanks,
Best Regards,
Alex


Re: Drillbit client connect authorization

Posted by Keys Botzum <kb...@mapr.com>.
Can you comment on what isn't working with MapR in this scenario? I'm familiar with impersonation tickets and constrained impersonation.

That said, I do agree that a general purpose feature in Drill that allows one to constrain who can issue queries seems useful.
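
To make the idea concrete, here is a minimal sketch of what such a connect-time check could look like, assuming an ACL of users and groups plus an owner-only switch as proposed earlier in this thread. Nothing here is an existing Drill API or configuration option: ConnectAcl, may_connect, and every field name are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ConnectAcl:
    """Hypothetical connect-time ACL; these are not real Drill options."""
    owner: str
    allowed_users: frozenset = field(default_factory=frozenset)
    allowed_groups: frozenset = field(default_factory=frozenset)
    owner_only: bool = False


def may_connect(acl, user, groups):
    """Decide whether an already-authenticated user may open a session."""
    # The user who launched the cluster is always admitted.
    if user == acl.owner:
        return True
    # With the owner-only switch set, nobody else gets in.
    if acl.owner_only:
        return False
    # Otherwise admit listed users and members of listed groups.
    return user in acl.allowed_users or bool(acl.allowed_groups & set(groups))


acl = ConnectAcl(owner="x", allowed_groups=frozenset({"tenant_a"}))
print(may_connect(acl, "y", {"tenant_b"}))  # user from another tenant
print(may_connect(acl, "z", {"tenant_a"}))  # member of an allowed group
```

The check would run after authentication succeeds, so it complements the existing security mechanisms rather than replacing them.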

Keys
_______________________________
Keys Botzum 
MapR Technologies 
http://www.mapr.com



Re: Drillbit client connect authorization

Posted by Joel Pfaff <jo...@gmail.com>.
Hello,

"Unfortunately I have not used the setup described above but from
explanation looks like the impersonation tickets will be used by Drillbit's
on Tenant A to restrict the MapR platform access by a limited set of
Drillbit authenticated user. Using this any user in Tenant B will not be
able to execute query on Tenant A even though it can be authenticated
successfully by the Drillbit in Tenant A. This way authorization check is
done at data layer."

Unfortunately, the tests we have done so far do not confirm this expected
behavior.
That's why Alex opened a ticket for an Authorization framework :
https://issues.apache.org/jira/browse/DRILL-6699

We have also opened a ticket with MapR to clarify the expected behavior of
impersonation tickets with group restrictions.

Regards, Joel

On Sun, Aug 19, 2018 at 9:21 PM Oleksandr Kalinin <al...@gmail.com>
wrote:

> Hi Sorabh,
>
> In case of Hive, user connects to Hive server. Launching the query launches
> YARN application - each query is YARN application. To make sure that query
> uses YARN cluster resources launching user is authorized to use, YARN
> authorization kicks in - e.g. YARN queue ACLs - mechanism a bit similar to
> the one proposed in this thread. Once application is running, impersonation
> and data (FS) level authorization do the rest of the job like you say -
> that is indeed the key.
>
> We use the same authorization model for Spark - to run Spark job, user must
> launch it as YARN application on specific YARN resource protected by YARN
> authorization, with impersonation and FS level authorization following once
> the job is running.
>
> In case of Drill on YARN, user connects to Drill cluster which is *already*
> running as YARN application. Thus exposing that Drill cluster to any user
> in the entire YARN cluster we expose YARN resources users might be not
> authorized to use. That is main issue we are trying to solve.
>
> Hope this makes it clearer.
>
> Best Regards,
> Alex
>
>

Re: Drillbit client connect authorization

Posted by Oleksandr Kalinin <al...@gmail.com>.
Hi Sorabh,

In the case of Hive, the user connects to the Hive server. Launching a query
launches a YARN application: each query is a YARN application. To make sure
that the query only uses YARN cluster resources that the launching user is
authorized to use, YARN authorization kicks in (e.g. YARN queue ACLs), a
mechanism somewhat similar to the one proposed in this thread. Once the
application is running, impersonation and data (FS) level authorization do
the rest of the job, as you say; that is indeed the key.

We use the same authorization model for Spark: to run a Spark job, the user
must launch it as a YARN application on a specific YARN resource protected by
YARN authorization, with impersonation and FS-level authorization following
once the job is running.

In the case of Drill on YARN, the user connects to a Drill cluster which is
*already* running as a YARN application. Thus, by exposing that Drill cluster
to any user in the entire YARN cluster, we expose YARN resources that those
users might not be authorized to use. That is the main issue we are trying to
solve.

Hope this makes it clearer.

Best Regards,
Alex


On Fri, Aug 17, 2018 at 11:57 PM, Sorabh Hamirwasia <sh...@mapr.com>
wrote:

> Hi Joel/Alex,
> Thanks for explaining the use case with multi tenant cluster.
>
> @Joel
> Unfortunately I have not used the setup described above but from
> explanation looks like the impersonation tickets will be used by Drillbit's
> on Tenant A to restrict the MapR platform access by a limited set of
> Drillbit authenticated user. Using this any user in Tenant B will not be
> able to execute query on Tenant A even though it can be authenticated
> successfully by the Drillbit in Tenant A. This way authorization check is
> done at data layer.
>
> @Alex,
> Adding an authorization check for a valid authenticated cluster user
> shouldn't be a big change. Based on a configured set's of users/group a
> subset of cluster user can be allowed to connect. But can you please point
> to how other services do these authorization checks when running in multi
> tenant environment ? Based on my understanding all these authorization
> check in Hadoop system are done at data layer or they have a separate
> security service which does these checks along with other security checks
> for authentication, etc.
>
> Also please feel free to open a JIRA ticket with details.
>
> Thanks,
> Sorabh
>
>
>

Re: Drillbit client connect authorization

Posted by Sorabh Hamirwasia <sh...@mapr.com>.
Hi Joel/Alex,
Thanks for explaining the use case with a multi-tenant cluster.

@Joel
Unfortunately I have not used the setup described above, but from the
explanation it looks like the impersonation tickets will be used by the
Drillbits on Tenant A to restrict MapR platform access to a limited set of
Drillbit-authenticated users. With this, a user in Tenant B will not be
able to execute queries on Tenant A even though they can be authenticated
successfully by the Drillbit in Tenant A. This way the authorization check is
done at the data layer.

@Alex,
Adding an authorization check for a valid authenticated cluster user
shouldn't be a big change. Based on a configured set of users/groups, a
subset of cluster users can be allowed to connect. But can you please point
to how other services do these authorization checks when running in a
multi-tenant environment? Based on my understanding, all these authorization
checks in Hadoop systems are done at the data layer, or they have a separate
security service which does these checks along with other security checks
for authentication, etc.
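
A minimal sketch of the kind of connect-time check discussed here, assuming
a configured set of allowed users and groups. The class and method names
(ConnectAcl, isAuthorized) are hypothetical and are not part of Drill's
actual API; the check is meant to run only after authentication has
already succeeded.

```java
import java.util.Set;

/** Hypothetical connect-time ACL; illustrative only, not Drill's API. */
final class ConnectAcl {
    private final Set<String> allowedUsers;
    private final Set<String> allowedGroups;

    ConnectAcl(Set<String> allowedUsers, Set<String> allowedGroups) {
        // Defensive immutable copies of the configured sets.
        this.allowedUsers = Set.copyOf(allowedUsers);
        this.allowedGroups = Set.copyOf(allowedGroups);
    }

    /**
     * Returns true if an already-authenticated user may open a session:
     * either the user is listed directly, or one of the user's groups is.
     */
    boolean isAuthorized(String user, Set<String> userGroups) {
        if (allowedUsers.contains(user)) {
            return true;
        }
        for (String group : userGroups) {
            if (allowedGroups.contains(group)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Cluster A: launched by user "x", open to group "tenant-a".
        ConnectAcl clusterA = new ConnectAcl(Set.of("x"), Set.of("tenant-a"));
        System.out.println(clusterA.isAuthorized("x", Set.of()));           // listed user
        System.out.println(clusterA.isAuthorized("z", Set.of("tenant-a"))); // listed group
        System.out.println(clusterA.isAuthorized("y", Set.of("tenant-b"))); // neither
    }
}
```

How the user's groups would be resolved (LDAP, OS groups, etc.) is left
open here; the sketch only shows that the decision itself is a simple
membership test on top of an existing authenticated identity.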

Also please feel free to open a JIRA ticket with details.

Thanks,
Sorabh



On Fri, Aug 17, 2018 at 11:21 AM, Oleksandr Kalinin <al...@gmail.com>
wrote:

> Hi Sorabh,
>
> Thanks for you comments. Joel described in detail our current thinking on
> how to overcome the issue. We are not yet 100% sure if it will actually
> work though.
>
> Actually I raised this topic in this mailing list because I think it's not
> only specific to our setup. It's more about having nice "Drill on YARN"
> feature with very limited (frankly, no) access control which almost makes
> the feature unusable in environments where it is attractive - multi tenant
> secure clusters. Supported security mechanisms are good for authentication,
> but using them for authorization seems suboptimal. Typically, YARN clusters
> run in single Kerberos realm and the need to introduce multiple realms and
> separate identities for Drill service is not at all convenient (I am pretty
> sure that in many environments like ours it is a no go). And how about use
> cases with no Kerberos setup? If we can workaround access control by
> MapR-specific security tickets like described by Joel - good for us, but
> what about other environments?
>
> So the question is more whether it make sense to consider introducing user
> authorization feature. This thread refers only to session authorization to
> complement YARN feature, but it could be extendable of course, e.g. in
> similar ways like Drill already supports multiple authentication
> mechanisms.
>
> Thanks & Best Regards,
> Alex
>

Re: Drillbit client connect authorization

Posted by Oleksandr Kalinin <al...@gmail.com>.
Hi Sorabh,

Thanks for your comments. Joel described in detail our current thinking on
how to overcome the issue. We are not yet 100% sure whether it will actually
work, though.

Actually, I raised this topic on this mailing list because I think it is not
specific to our setup. It is more about having a nice "Drill on YARN"
feature with very limited (frankly, no) access control, which almost makes
the feature unusable in the environments where it is most attractive:
multi-tenant secure clusters. The supported security mechanisms are good for
authentication, but using them for authorization seems suboptimal. Typically,
YARN clusters run in a single Kerberos realm, and the need to introduce
multiple realms and separate identities for the Drill service is not at all
convenient (I am pretty sure that in many environments like ours it is a
no-go). And what about use cases with no Kerberos setup? If we can work
around access control with MapR-specific security tickets as described by
Joel, good for us, but what about other environments?

So the question is more whether it makes sense to consider introducing a user
authorization feature. This thread refers only to session authorization to
complement the YARN feature, but it could of course be extensible, e.g. in
a similar way to how Drill already supports multiple authentication
mechanisms.
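
To illustrate what an extensible session-authorization hook could look
like, here is a rough sketch with a small interface and two
implementations selected by name. Everything in it (ConnectAuthorizer,
OwnerOnlyAuthorizer, AllowAllAuthorizer, the "owner-only" and "allow-all"
names) is hypothetical and only mirrors the idea of pluggable mechanisms;
it is not how Drill is implemented.

```java
/** Hypothetical plug-in point: decides whether an authenticated user may connect. */
interface ConnectAuthorizer {
    boolean mayConnect(String authenticatedUser);
}

/** Only the user who launched the cluster may connect. */
final class OwnerOnlyAuthorizer implements ConnectAuthorizer {
    private final String owner;

    OwnerOnlyAuthorizer(String owner) {
        this.owner = owner;
    }

    @Override
    public boolean mayConnect(String authenticatedUser) {
        return owner.equals(authenticatedUser);
    }
}

/** Behavior described in this thread: any authenticated user may connect. */
final class AllowAllAuthorizer implements ConnectAuthorizer {
    @Override
    public boolean mayConnect(String authenticatedUser) {
        return true;
    }
}

/** Picks an implementation from a (hypothetical) configuration value. */
final class ConnectAuthorizers {
    private ConnectAuthorizers() {
    }

    static ConnectAuthorizer byName(String name, String clusterOwner) {
        switch (name) {
            case "owner-only":
                return new OwnerOnlyAuthorizer(clusterOwner);
            case "allow-all":
                return new AllowAllAuthorizer();
            default:
                throw new IllegalArgumentException("Unknown authorizer: " + name);
        }
    }
}
```

With such a hook, the "only the owner may connect" switch becomes one
implementation among others, and an ACL-based or externally backed
authorizer could be added later without changing the call site.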

Thanks & Best Regards,
Alex

On Wed, Aug 15, 2018 at 11:30 PM, Sorabh Hamirwasia <sh...@mapr.com>
wrote:

> Hi Oleksandr,
> Drill doesn't do any user management in itself, instead relies on the
> corresponding security mechanisms in use to do it. It uses SASL framework
> to allow using different pluggable security mechanisms. So it should be
> upon the security mechanism in use to do the authorization level checks.
> For example in your use case if you want to allow only certain set's of
> users to connect to a cluster then you can choose to use Kerberos with each
> cluster running in different realms. This will ensure client users running
> in corresponding realm can only connect to cluster running in that realm.
>
> For the impersonation issue I think it's a configuration issue and the
> behavior is expected where all queries whether from user A or B are
> executed as admin users.
>
> Thanks,
> Sorabh
>

Re: Drillbit client connect authorization

Posted by Joel Pfaff <jo...@gmail.com>.
Hello,

After discussing this topic with Alex, this is what we are trying to do.
We are using MapR, with different tenants for different functional teams,
each of them represented by a dedicated LDAP group.
We are planning to deploy Drill on YARN, so that each group can deploy its
own Drill cluster, which will take resources from its own tenant's
resources.

So there will be a User Group A launching a Drill cluster A running on a
tenant A.
And a User Group B launching a Drill cluster B running on a tenant B.

We plan to use impersonation, so that Drill cannot be used to read data
that the user account is not allowed to access.
With the way authentication works, nothing will prevent users from
group A from running queries on the Drill cluster deployed on tenant B.
Since a user from group A will have a valid authentication, Drill
cluster B will happily process the query.
So while this setup does prevent privilege escalation, it does not prevent
one group from stealing resources from another group.

As a workaround, we plan to use service with impersonation tickets, with
restricted groups, so that the Drill cluster running on tenant A can
only ever impersonate members of group A.
Is this a supported setup? Are you aware of anyone using this kind of setup?

Regards, Joel


On Wed, Aug 15, 2018 at 11:30 PM Sorabh Hamirwasia <sh...@mapr.com>
wrote:

> Hi Oleksandr,
> Drill doesn't do any user management in itself, instead relies on the
> corresponding security mechanisms in use to do it. It uses SASL framework
> to allow using different pluggable security mechanisms. So it should be
> upon the security mechanism in use to do the authorization level checks.
> For example in your use case if you want to allow only certain set's of
> users to connect to a cluster then you can choose to use Kerberos with each
> cluster running in different realms. This will ensure client users running
> in corresponding realm can only connect to cluster running in that realm.
>
> For the impersonation issue I think it's a configuration issue and the
> behavior is expected where all queries whether from user A or B are
> executed as admin users.
>
> Thanks,
> Sorabh
>

Re: Drillbit client connect authorization

Posted by Sorabh Hamirwasia <sh...@mapr.com>.
Hi Oleksandr,
Drill doesn't do any user management itself; instead it relies on the
security mechanism in use to do it. It uses the SASL framework
to allow different pluggable security mechanisms. So it should be
up to the security mechanism in use to do the authorization-level checks.
For example, in your use case, if you want to allow only a certain set of
users to connect to a cluster, then you can choose to use Kerberos with each
cluster running in a different realm. This will ensure that client users
in a given realm can only connect to the cluster running in that realm.

For the impersonation issue, I think it's a configuration issue, and the
behavior is expected: all queries, whether from user A or B, are
executed as the admin user.

Thanks,
Sorabh

On Mon, Aug 13, 2018 at 9:02 AM, Oleksandr Kalinin <al...@gmail.com>
wrote:

> Hello Drill community,
>
> In multi-tenant YARN clusters, running multiple Drill-on-YARN clusters
> seems as attractive feature as it enables leveraging on YARN mechanisms of
> resource management and isolation. However, there seems to be simple access
> restriction issue. Assume :
>
> - Cluster A launched by user X
> - Cluster B launched by user Y
>
> Both users X and Y will be able to connect and run queries against clusters
> A and B (in fact, that applies to any positively authenticated user, not
> only X and Y). Whereas we obviously would like to ensure exclusive usage of
> clusters by their owners - who are owners of respective YARN resources. In
> case users X and Y are non-privileged DFS users and impersonation is not
> enabled, then user A has access to data on behalf of user B and vice versa
> which is additional potential security issue.
>
> I was looking for possibilities to control connect authorization, but
> couldn't find anything related yet. Do I miss something maybe? Are there
> any other considerations, perhaps this point was already discussed before?
>
> It could be possible to tweak PAM setup to trigger authentication failure
> for "undesired" users but that looks like an overkill in terms of
> complexity.
>
> From user perspective, basic ACL configuration with users and groups
> authorized to connect to Drillbit would already be sufficient IMO. Or
> configuration switch to ensure that only owner user is authorized to
> connect.
>
> Best Regards,
> Alex
>