Posted to user@drill.apache.org by John Omernik <jo...@omernik.com> on 2015/11/06 15:57:56 UTC

REFRESH TABLE METADATA - Access Denied

I ran REFRESH TABLE METADATA on a table, and it completed successfully.

When I tried a subsequent query, I got an IOException: Permission Denied on
.drill.parquet_metadata.

I am running Drill with authentication.  I ran the REFRESH TABLE METADATA
as user X; it appears the .drill.parquet_metadata file was created and is
owned by the user the drillbits are running as, and is created with
-rwxr-xr-x.

My question is this: I can see why the file is owned by the drillbit
user, and the file is created with all-can-read permissions, so why am I
getting a permission denied when user X is trying to run a query?
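
As a rough illustration of the question, the sketch below decodes the mode reported here (rwxr-xr-x, i.e. 755) and shows what it grants to a user who is neither the owner nor in the file's group. It only models classic POSIX mode bits; the helper name other_class_access is made up for this example, and the file system in use may apply additional rules.

```python
import stat

# Mode reported in the thread for .drill.parquet_metadata (rwxr-xr-x == 0o755).
METADATA_MODE = 0o755


def other_class_access(mode: int) -> dict:
    """Return what a user who is neither owner nor group member may do.

    Hypothetical helper: it inspects only the classic "other" mode bits.
    """
    return {
        "read": bool(mode & stat.S_IROTH),
        "write": bool(mode & stat.S_IWOTH),
        "execute": bool(mode & stat.S_IXOTH),
    }


access = other_class_access(METADATA_MODE)
# Render the mode the way `ls -l` would for a regular file.
print(stat.filemode(stat.S_IFREG | METADATA_MODE))
print(access)
```

Under these assumptions a plain read by user X would be allowed and only a write would be refused, which is consistent with the later observation in this thread that the failing operation looked like a create rather than a read.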

Re: REFRESH TABLE METADATA - Access Denied

Posted by John Omernik <jo...@omernik.com>.
https://issues.apache.org/jira/browse/DRILL-4143



Re: REFRESH TABLE METADATA - Access Denied

Posted by Neeraja Rentachintala <nr...@maprtech.com>.
John
What is the JIRA # where you are adding more info?

-thanks


Re: REFRESH TABLE METADATA - Access Denied

Posted by John Omernik <jo...@omernik.com>.
Arg, this problem is crazy. (I'll put this in the JIRA too.)  So after
waiting a while and loading more data, I tried to refresh table metadata
on the table, using the dataadm user (basically the user who owns the
data). Note all directories and files are owned by dataadm:dataadm and the
permissions are 770.  This worked before, but this time, when I ran

REFRESH TABLE METADATA mytable;

I get

"false| Error: 2126.29602.2546226
/data/prod/mytable/2015-011-12/.drill.parquet_metadata (Permission
denied)12:44

This is the SAME shell where I ran it before, and I loaded more data (note
the directory in question was already loaded; that was not touched).

Then I used the find command to remove all the .drill.parquet_metadata
files and ran the REFRESH TABLE METADATA command again.

This time the command works. Great.

If I run it again right after, it runs successfully again.

12:35 Ran it a third time, and it worked.
12:37 Ran it a fourth time, and it worked. (Note all the parquet_metadata
files are owned by my drillbituser:drillbitgroup (in this case, mapr:mapr)
despite the meta operation being done by the data owner.)
12:39 Another process *running as dataadm* loaded a new day of data
(2016-02-12).  No other data was altered here.
12:40 Ran REFRESH TABLE METADATA a fifth time: got the error. Maybe it has
to do with adding data? Error on 2015-11-12 again....
12:41 A new process loaded more data (2016-02-11 and 2016-02-10 loaded).
Process completed successfully and is disabled at this time for
troubleshooting (no more data being loaded).
12:42 Attempted REFRESH TABLE METADATA again, same error on 2015-11-12.
12:43 Removed all .drill.parquet_metadata files using the find command.
12:44 Ran REFRESH TABLE METADATA - this time it ran with success.  Will now
run and check without data loading. May have to do with data loading...
12:52 Ran REFRESH: Success
12:58 Ran REFRESH: Success
1:00 Forced reload of 2016-02-15.  Basically making it so the folder
"2016-02-15" did not have a .drill.parquet_metadata file (while the other
days did).
1:01 Ran REFRESH: Error: 2126.27460.2555888
/data/prod/mytable/2015-11-12/.drill.parquet_metadata (Permission denied)
(Same file; not sure why it picks on this file, nothing has changed there.)
(Even validated: no files modified since 12:58, when the parquet_metadata
file was modified; all parquet files still have the same modified times
from when they were loaded, Feb 9th.)



So thoughts:

1. When running REFRESH TABLE METADATA, it checks to see if all the files
in the subdirectories exist; if they don't, it starts to "do things".
2. The date 2015-11-12 probably keeps coming up because it's first in the
.drill.parquet_metadata located in /mytable (not in the individual
directories).
3. After the REFRESH failed, I checked some files.
2015-11-12/.drill.parquet_metadata was a 0-size file (like it was
attempted to be rewritten and failed). Looking in 2016-11-13, the
.drill.parquet_metadata file has data in it.
4. To test #3, I rm'd .drill.parquet_metadata from 2015-11-12 and ran the
refresh command again. Interesting... when I do that, I get permission
denied on the 2015-11-12 directory again. This time, instead of the file
being owned by the drillbit user (and having the drillbit user's group, in
this case mapr), I have a file of 0 bytes with "dataadm:datareaders" as the
owner. That's interesting... shouldn't it be mapr:mapr (the drillbit user)?

So this seems to be the crux of the issue... what should happen here? Should
all metadata operations be checked to see if the user issuing them has
permissions, with the writes then happening as the drillbit user?  Any other
thoughts here?
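
The manual checks described above (locating every .drill.parquet_metadata file, then looking at its owner, mode and size) could be scripted roughly as follows. This is only a sketch of that inspection, not anything Drill provides: the function names are invented for the example, the demo builds a throwaway directory tree instead of touching a real table, and it reports numeric uid/gid rather than user names.

```python
import os
import stat
import tempfile

METADATA_NAME = ".drill.parquet_metadata"


def find_metadata_files(root: str) -> list:
    """Walk `root` and describe every metadata cache file found (hypothetical helper)."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if METADATA_NAME in filenames:
            path = os.path.join(dirpath, METADATA_NAME)
            info = os.stat(path)
            found.append({
                "path": path,
                "uid": info.st_uid,
                "gid": info.st_gid,
                "mode": stat.filemode(info.st_mode),
                "size": info.st_size,
                # A zero-byte cache file suggests a rewrite that did not finish.
                "empty": info.st_size == 0,
            })
    return sorted(found, key=lambda entry: entry["path"])


def demo() -> list:
    """Build a temporary tree with one empty and one non-empty cache file."""
    with tempfile.TemporaryDirectory() as root:
        for day, payload in (("2015-11-12", b""), ("2016-02-15", b"{}")):
            os.makedirs(os.path.join(root, day))
            with open(os.path.join(root, day, METADATA_NAME), "wb") as handle:
                handle.write(payload)
        return [
            (os.path.basename(os.path.dirname(entry["path"])), entry["empty"])
            for entry in find_metadata_files(root)
        ]


print(demo())
```

Run against a real table root, the same listing would show at a glance which directories hold a zero-byte cache file and which uid owns each one, which is the information compared by hand in points 3 and 4.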










Re: REFRESH TABLE METADATA - Access Denied

Posted by John Omernik <jo...@omernik.com>.
So I am not sure what's happened here. The JIRA isn't filled out, but I
can't seem to reproduce the problem. Was this stealth fixed? Based on some
testing, even when the data directory is owned by a different user than the
drillbit, the .drill.parquet_metadata files are created as mapr:mapr with 755
permissions.  And when it refreshes now, there are no errors.  So maybe it's
all fixed?

Thanks


Re: REFRESH TABLE METADATA - Access Denied

Posted by John Omernik <jo...@omernik.com>.
I'd like to revive this thread. Specifically, what should the expected
behavior of the refresh metadata be when running with impersonation?

Drill Bit User: mapr
Data User (owner): jdoe
Authenticated User: jdoe

So if a base folder, mytable, has subdirectories of dates (2015-01-01,
2015-01-02, etc.), all the data is owned by jdoe:datareaders, and the
permissions are 750 on all directories and files, how SHOULD the REFRESH
METADATA command be expected to operate if run in sqlline authenticated as
jdoe? (What will the permissions on the metadata files be, etc.?)
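
To make the scenario concrete, here is a small sketch of how classic POSIX mode bits would treat each account for data owned by jdoe:datareaders with mode 750. It is a simplification under stated assumptions: posix_access is a made-up helper, it ignores superuser privileges, ACLs and any file-system-specific rules, and the group memberships passed in for mapr are hypothetical since the message does not state them.

```python
def posix_access(user: str, groups: set, owner: str, group: str, mode: int) -> str:
    """Return the rwx triad that classic POSIX mode bits grant to `user`.

    Simplified, hypothetical helper: exactly one class (owner, group, other)
    applies, and superuser bypass and ACLs are not modelled.
    """
    if user == owner:
        bits = (mode >> 6) & 0o7
    elif group in groups:
        bits = (mode >> 3) & 0o7
    else:
        bits = mode & 0o7
    return "".join(
        flag if bits & mask else "-"
        for flag, mask in (("r", 4), ("w", 2), ("x", 1))
    )


# Scenario from the message: data owned by jdoe:datareaders with mode 750.
print(posix_access("jdoe", {"datareaders"}, "jdoe", "datareaders", 0o750))
# Assumption: the drillbit user mapr is NOT a member of datareaders.
print(posix_access("mapr", {"mapr"}, "jdoe", "datareaders", 0o750))
# Assumption: mapr has been added to datareaders.
print(posix_access("mapr", {"mapr", "datareaders"}, "jdoe", "datareaders", 0o750))
```

Under plain mode bits, then, a drillbit account outside the group could neither read the data nor write a metadata file into a 750 directory, and even inside the group it could read but not write. Which identity performs the write is exactly the open question raised here.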






Re: REFRESH TABLE METADATA - Access Denied

Posted by Jacques Nadeau <ja...@dremio.com>.
>
> The output from Drill and the Markup interpreter on Jira apparently had a
> family argument at Thanksgiving, and don't agree on all things...


Made my morning :)

Re: REFRESH TABLE METADATA - Access Denied

Posted by John Omernik <jo...@omernik.com>.
I've created https://issues.apache.org/jira/browse/DRILL-4143

The output from Drill and the Markup interpreter on Jira apparently had a
family argument at Thanksgiving, and don't agree on all things... Looking
at the JIRA, while it's not pretty, it still conveys what I am going for.
Please review, and see if I left anything out from this thread, I tried to
summarize and provide a reproduction plan.

On Thu, Nov 26, 2015 at 11:04 PM, Jacques Nadeau <ja...@dremio.com> wrote:

> Yes, please do.
> On Nov 25, 2015 7:07 AM, "John Omernik" <jo...@omernik.com> wrote:
>
> > Should we do a JIRA on this? It seems important...
> >
> > On Wed, Nov 11, 2015 at 5:15 PM, John Omernik <jo...@omernik.com> wrote:
> >
> > > For me it's very strange. If I delete all the .drill.parquet_metadata
> > > files, I can create and then run a query.  I can wait 5 minutes, and come
> > > back and run the same query, and then I get the permission denied, if I try
> > > to run the REFRESH METADATA again, then it too fails with permission denied
> > > until I erase all the files.
> > >
> > > What is strange here is the .drill.parquet_metadata file is owned by the
> > > drillbit user, and has rwxr-xr-x.  Thus, based on those permissions, the
> > > nondrillbit user STILL should be able to read the file with no issues.
> > >  (This is not something that your last bullet describes, instead it's
> > > restricting others from writing, not reading)
> > >
> > > In addition, when I try to run the query, it appears that the non-drillbit
> > > user is trying to issue a file create, and per Keys, it's already there
> > > (and they don't have permissions to write).
> > >
> > > There are a number of things that are not happening correctly then based
> > > on your understanding/description of what's happening
> > >
> > > 1. The file that is created is not limited in reading to the drillbit user
> > > 2. When a query is run, the file is not accessed by the drillbit user,
> > > it's not even accessed by the authenticated user, instead the authenticated
> > > user tries to overwrite the file (which makes very little sense to me on a
> > > select query)
> > >
> > > The only thing that is (apparently) happening correctly is the initial
> > > REFRESH command is creating the files as the drillbit user, however,
> > > subsequent operations don't seem to be working right... so I am not sure if
> > > that is a 3rd bullet in the "things that appear broken" list.
> > >
> > > Using the Drill Audit logs was very helpful here, if there is anything
> > > else I can do to help test/troubleshoot this, let me know.
> > >
> > >
> > >
> > >
> > > On Wed, Nov 11, 2015 at 4:54 PM, Vince Gonzalez <vince.gonzalez@gmail.com>
> > > wrote:
> > >
> > >> Ok, I'm seeing the behavior you describe except for the last bullet - the
> > >> permissions on the file would allow for anyone to read the cache file.
> > >>
> > >> $ ls -la
> > >> total 3499
> > >> drwxr-xr-x 2 ec2-user ec2-user       5 Nov 11 21:18 .
> > >> drwxrwxrwx 4 ec2-user ec2-user       2 Nov 11 21:18 ..
> > >> -rwxr-xr-x 1 ec2-user ec2-user 1068250 Nov 11 21:18 1_0_0.parquet
> > >> -rwxr-xr-x 1 ec2-user ec2-user  789341 Nov 11 21:18 1_1_0.parquet
> > >> -rwxr-xr-x 1 ec2-user ec2-user  952667 Nov 11 21:18 1_2_0.parquet
> > >> -rwxr-xr-x 1 ec2-user ec2-user  755805 Nov 11 21:18 1_3_0.parquet
> > >> *-rwxr-xr-x 1 mapr     mapr       14033 Nov 11 21:18 .drill.parquet_metadata*
> > >>
> > >> On Wed, Nov 11, 2015 at 5:29 PM, Neeraja Rentachintala <
> > >> nrentachintala@maprtech.com> wrote:
> > >>
> > >> > John, Vince
> > >> > I am a little confused by this email thread.
> > >> > From the original description by John, I thought the issue was that the
> > >> > refresh metadata command runs successfully (and the cache is created with
> > >> > the Drillbit user as owner), but at query time it fails for any user (even
> > >> > though the user has permissions on the directory/dataset).
> > >> >
> > >> > Per the latest discussion, it seems like you are hitting permission denied
> > >> > when running the 'refresh metadata' command itself.
> > >> >
> > >> > Just wanted to share what I think the right behavior here is. Feel free to
> > >> > comment.
> > >> >
> > >> > - When the refresh metadata command is run, the cache files get created with
> > >> > the drillbit user as the owner (irrespective of who is running the command
> > >> > and whether impersonation is turned on).
> > >> > - When a select query comes in on the table, the corresponding cache file
> > >> > is always accessed as the drillbit user (irrespective of who is running the
> > >> > command and whether impersonation is turned on).
> > >> > - The cache file created through the refresh metadata command should restrict
> > >> > access to any users other than the drillbit user (so there is no
> > >> > leakage of metadata for someone going to the file system and opening the
> > >> > file, i.e. the cache is for Drill's internal planning purposes and not meant
> > >> > as a user-level cache).
> > >> >
> > >> > If the above is not happening, it seems like a bug.
> > >> >
> > >> > thanks
> > >> > Neeraja
> > >> >
> > >> > On Wed, Nov 11, 2015 at 2:07 PM, kbotzum <kb...@maprtech.com> wrote:
> > >> >
> > >> > > MapR audit records print the errno value to indicate success/failure. Thus
> > >> > > status 17 means errno 17 which means EEXIST. Looks like Drill is trying to
> > >> > > create a file that already exists.
> > >> > >
> > >> > > I’ll defer to others as to why Drill might do that.
> > >> > >
> > >> > > Keys
> > >> > > _______________________________
> > >> > > Keys Botzum
> > >> > > Senior Principal Technologist
> > >> > > kbotzum@mapr.com
> > >> > > 443-718-0098
> > >> > > MapR Technologies
> > >> > > http://www.mapr.com
> > >> > >
> > >> > >
> > >> > >
> > >> > > On Nov 11, 2015, at 4:09 PM, John Omernik <jo...@omernik.com> wrote:
> > >> > >
> > >> > > > I turned on MapR Auditing (this is a handy feature) and found that when I
> > >> > > > run a query that is giving me access denied (my query is select * from
> > >> > > > table limit 1), per MapR the user I am logged in as (mapradm) is trying to
> > >> > > > do a create operation on the .drill.parquet_metadata file, and I am
> > >> > > > guessing it's failing with status: 17 (not sure what this means; successes
> > >> > > > appear to be "0").  What was interesting was the "CREATE" being attempted
> > >> > > > three times.   Any thoughts on why a select * from table limit 1 would try
> > >> > > > to initiate a create operation on the .drill.parquet_metadata file?
> > >> > > >
> > >> > > > On Wed, Nov 11, 2015 at 2:25 PM, John Omernik <john@omernik.com> wrote:
> > >> > > >
> > >> > > >> I take it back.
> > >> > > >>
> > >> > > >> I went to run a query, in the same session that had worked, and now I am
> > >> > > >> getting permission denied.
> > >> > > >>
> > >> > > >> I do have a query running that creates new directories every 5 minutes;
> > >> > > >> however, these aren't the directories that are giving me permission denied.
> > >> > > >> Did you try running an aggregate query across all data? This is an
> > >> > > >> interesting one to track down; not sure why I am getting the access denied
> > >> > > >> now.
> > >> > > >>
> > >> > > >> The .drill.parquet_metadata file in the directory that I am getting the
> > >> > > >> error on is owned by mapr:mapr and has rwxr-xr-x permissions. This tells
> > >> > > >> me that both the user of the drillbits (mapr) and the user I am logged into
> > >> > > >> in sqlline (mapradm) should be able to read the file... so why do I get an
> > >> > > >> access denied when running a query? Any assistance would be valuable here
> > >> > > >> in that there are some great performance increases with the metadata
> > >> > > >> caching, and I don't want to miss out on that.
> > >> > > >>
> > >> > > >> On Wed, Nov 11, 2015 at 2:18 PM, John Omernik <john@omernik.com> wrote:
> > >> > > >>
> > >> > > >>> All files are owned by mapr:mapr?
> > >> > > >>>
> > >> > > >>> I have a setup where mapr is the user running the drillbit, but then I
> > >> > > >>> have a directory that is owned by another user: mapradm:mapradm on all
> > >> > > >>> files. (Permissions on directories and files appear to be rwxr-x-r-x.) When
> > >> > > >>> I run the REFRESH TABLE METADATA, the .drill.parquet_metadata file gets
> > >> > > >>> created as mapr:mapr with rwxr-xr-x.
> > >> > > >>>
> > >> > > >>> So
> > >> > > >>> Drillbit User: mapr
> > >> > > >>> Directory (and subdirectories/files) owner: mapradm:mapradm
> > >> > > >>> Directory permissions (all files and folders under main directory):
> > >> > > >>> rwxr-x-r-x
> > >> > > >>>
> > >> > > >>> I authenticated to Drill via sqlline as user mapradm (this user should be
> > >> > > >>> able to read and write just fine to all directories).
> > >> > > >>>
> > >> > > >>> Now, one thing I did notice is my mapr user was not in the mapradm group,
> > >> > > >>> and therefore didn't have write permissions anywhere... when I fixed that on
> > >> > > >>> all nodes, and then manually deleted the metadata files, things seemed to be
> > >> > > >>> working. I wonder if that was my issue?
> > >> > > >>>
> > >> > > >>> Basically, the user running the drillbits needs to be able to write files
> > >> > > >>> (the .drill.parquet_metadata) or something bad will happen :) I will do
> > >> > > >>> more testing. This may be a good candidate for some documentation work to
> > >> > > >>> understand what permissions are required to be able to query these.
> > >> > > >>>
> > >> > > >>>
> > >> > > >>>
> > >> > > >>>
> > >> > > >>> On Wed, Nov 11, 2015 at 1:36 PM, Vince Gonzalez <vince.gonzalez@gmail.com> wrote:
> > >> > > >>>
> > >> > > >>>> Hi John, I tried this and didn't find any issues. Let me know if I didn't
> > >> > > >>>> follow your reproduction faithfully.
> > >> > > >>>>
> > >> > > >>>> $ sqlline -u jdbc:drill: -n ec2-user -p mapr
> > >> > > >>>> apache drill 1.2.0
> > >> > > >>>> "drill baby drill"
> > >> > > >>>> 0: jdbc:drill:> refresh table metadata dfs.`/tmp/flows`;
> > >> > > >>>> +-------+------------------------------------------------------+
> > >> > > >>>> |  ok   |                       summary                        |
> > >> > > >>>> +-------+------------------------------------------------------+
> > >> > > >>>> | true  | Successfully updated metadata for table /tmp/flows.  |
> > >> > > >>>> +-------+------------------------------------------------------+
> > >> > > >>>> 1 row selected (32.27 seconds)
> > >> > > >>>> 0: jdbc:drill:> select srcIP,dstIP from dfs.`/tmp/flows` limit 12;
> > >> > > >>>> +---------------+---------------+
> > >> > > >>>> |     srcIP     |     dstIP     |
> > >> > > >>>> +---------------+---------------+
> > >> > > >>>> | 172.16.2.152  | 172.16.1.58   |
> > >> > > >>>> | 172.16.1.58   | 172.16.2.152  |
> > >> > > >>>> | 172.16.2.152  | 172.16.2.73   |
> > >> > > >>>> | 172.16.2.152  | 172.16.2.73   |
> > >> > > >>>> | 172.16.2.73   | 172.16.2.152  |
> > >> > > >>>> | 172.16.2.152  | 172.16.2.73   |
> > >> > > >>>> | 172.16.2.152  | 172.16.2.73   |
> > >> > > >>>> | 172.16.2.152  | 172.16.2.73   |
> > >> > > >>>> | 172.16.2.73   | 172.16.2.152  |
> > >> > > >>>> | 172.16.2.73   | 172.16.2.152  |
> > >> > > >>>> | 172.16.2.73   | 172.16.2.152  |
> > >> > > >>>> | 172.16.2.152  | 172.16.2.73   |
> > >> > > >>>> +---------------+---------------+
> > >> > > >>>> 12 rows selected (5.654 seconds)
> > >> > > >>>>
> > >> > > >>>> And here's what my table structure looks like (as seen via MapR NFS):
> > >> > > >>>>
> > >> > > >>>> $ tree /mapr/vgonzalez.drill/tmp/flows/ | head -15
> > >> > > >>>> /mapr/vgonzalez.drill/tmp/flows/
> > >> > > >>>> └── 2015
> > >> > > >>>>    └── 11
> > >> > > >>>>        ├── 10
> > >> > > >>>>        │   ├── 21
> > >> > > >>>>        │   │   ├── 39
> > >> > > >>>>        │   │   │   ├── 03
> > >> > > >>>>        │   │   │   │   ├── _common_metadata
> > >> > > >>>>        │   │   │   │   ├── _metadata
> > >> > > >>>>        │   │   │   │   ├── part-r-00000-853882bd-66d8-4505-96ba-f0a282e374de.gz.parquet
> > >> > > >>>>        │   │   │   │   └── _SUCCESS
> > >> > > >>>>        │   │   │   └── 20
> > >> > > >>>>        │   │   │       ├── _common_metadata
> > >> > > >>>>        │   │   │       ├── _metadata
> > >> > > >>>>        │   │   │       ├── part-r-00000-37a94549-8e56-46d5-be88-cb28e6d8bc35.gz.parquet
> > >> > > >>>>
> > >> > > >>>> My parquet was created in Spark, not Drill. Not sure if that's
> > >> > > >>>> relevant.
> > >> > > >>>>
> > >> > > >>>> I have authentication and impersonation turned on, and the files
> > >> > > >>>> are owned by mapr:mapr. Here's my drill-override.conf:
> > >> > > >>>>
> > >> > > >>>> drill.exec: {
> > >> > > >>>>   cluster-id: "vgonzalez_drill-drillbits",
> > >> > > >>>>   zk.connect: "ip-172-16-2-36.ec2.internal:5181,ip-172-16-2-37.ec2.internal:5181,ip-172-16-2-38.ec2.internal:5181"
> > >> > > >>>> }
> > >> > > >>>> drill.exec.impersonation: { enabled: true, max_chained_user_hops: 3 }
> > >> > > >>>> drill.exec { security.user.auth { enabled: true, packages +=
> > >> > > >>>> "org.apache.drill.exec.rpc.user.security", impl: "pam",
> > >> > > >>>> pam_profiles: [ "login","sudo","sshd","password-auth" ] } }
> > >> > > >>>>
> > >> > > >>>>
> > >> > > >>>>
> > >> > > >>>>
> > >> > > >>>>
> > >> > > >>>> On Tue, Nov 10, 2015 at 1:17 PM, John Omernik <john@omernik.com> wrote:
> > >> > > >>>>
> > >> > > >>>>> Cool, looking forward to it.
> > >> > > >>>>>
> > >> > > >>>>> On Mon, Nov 9, 2015 at 7:21 PM, Vince Gonzalez <vince.gonzalez@gmail.com> wrote:
> > >> > > >>>>>
> > >> > > >>>>>> Hey John, I have a secure cluster and some parquet files, I'll try
> > >> > > >>>>>> this out and report back.
> > >> > > >>>>>>
> > >> > > >>>>>> On Monday, November 9, 2015, John Omernik <john@omernik.com> wrote:
> > >> > > >>>>>>
> > >> > > >>>>>>> Has anyone been able to try/test this? I am curious whether it's
> > >> > > >>>>>>> an issue only for me or something more of a bug, so I can open a
> > >> > > >>>>>>> JIRA if needed.
> > >> > > >>>>>>>
> > >> > > >>>>>>> John
> > >> > > >>>>>>>
> > >> > > >>>>>>> On Fri, Nov 6, 2015 at 11:06 AM, John Omernik <john@omernik.com> wrote:
> > >> > > >>>>>>>
> > >> > > >>>>>>>> If someone has authorization/authentication set up, to reproduce:
> > >> > > >>>>>>>>
> > >> > > >>>>>>>> Have a Parquet table with directories underneath the main one (I
> > >> > > >>>>>>>> have directories per day).
> > >> > > >>>>>>>>
> > >> > > >>>>>>>> Then issue REFRESH TABLE METADATA on the root of the table as an
> > >> > > >>>>>>>> authenticated user other than the drill bit user. (I am using
> > >> > > >>>>>>>> mapr; I used my user to run the query, and yes, I have access to
> > >> > > >>>>>>>> the data.)
> > >> > > >>>>>>>>
> > >> > > >>>>>>>> Then run a normal query and see what the result is.
> > >> > > >>>>>>>>
> > >> > > >>>>>>>> John
> > >> > > >>>>>>>>
> > >> > > >>>>>>>> On Fri, Nov 6, 2015 at 10:22 AM, Neeraja Rentachintala <nrentachintala@maprtech.com> wrote:
> > >> > > >>>>>>>>
> > >> > > >>>>>>>>> This doesn't make sense and seems like a bug.
> > >> > > >>>>>>>>> I think the right behavior is for the Drillbit to access the cache
> > >> > > >>>>>>>>> as the Drillbit user at query time (there is no user-level
> > >> > > >>>>>>>>> metadata cache in Drill at this point).
> > >> > > >>>>>>>>>
> > >> > > >>>>>>>>>
> > >> > > >>>>>>>>>
> > >> > > >>>>>>>>> On Fri, Nov 6, 2015 at 6:57 AM, John Omernik <john@omernik.com> wrote:
> > >> > > >>>>>>>>>
> > >> > > >>>>>>>>>> I ran REFRESH TABLE METADATA on a table, it completed successfully.
> > >> > > >>>>>>>>>>
> > >> > > >>>>>>>>>> When I tried a subsequent query, I get a IOException: Permission
> > >> > > >>>>>>>>>> Denied on .drill.parquet_metadata.
> > >> > > >>>>>>>>>>
> > >> > > >>>>>>>>>> I am running drill with authentication.  I ran the REFRESH TABLE
> > >> > > >>>>>>>>>> METADATA as user X, it appears the .drill.parquet_metadata was
> > >> > > >>>>>>>>>> created and owned by the user the drill bits are running as as is
> > >> > > >>>>>>>>>> created with -rwxr-x-r-x
> > >> > > >>>>>>>>>>
> > >> > > >>>>>>>>>> My question is this: So, I can see why the file is owned by the
> > >> > > >>>>>>>>>> drill bit user, and the file is created with all can read
> > >> > > >>>>>>>>>> permissions, but why am I getting a permission denied when user X
> > >> > > >>>>>>>>>> is trying to run a query?
> > >> > > >>>>>>>>>>

Re: REFRESH TABLE METADATA - Access Denied

Posted by Jacques Nadeau <ja...@dremio.com>.
Yes, please do.
On Nov 25, 2015 7:07 AM, "John Omernik" <jo...@omernik.com> wrote:

> Should we do a JIRA on this? It seems important...

Re: REFRESH TABLE METADATA - Access Denied

Posted by John Omernik <jo...@omernik.com>.
Should we do a JIRA on this? It seems important...

On Wed, Nov 11, 2015 at 5:15 PM, John Omernik <jo...@omernik.com> wrote:

> For me it's very strange. If I delete all the .drill.parquet_metadata
> files, I can create the metadata and then run a query. If I wait 5 minutes
> and come back and run the same query, I get the permission denied. If I
> then try to run the REFRESH METADATA again, it too fails with permission
> denied until I erase all the files.
>
> What is strange here is the .drill.parquet_metadata file is owned by the
> drillbit user, and has rwxr-xr-x.  Thus, based on those permissions, the
> non-drillbit user should STILL be able to read the file with no issues.
> (This is not what your last bullet describes; the file is restricting
> others from writing, not reading.)
>
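(Inline note: a minimal sketch of the permission reasoning above, assuming plain POSIX mode bits only, with no ACLs and no MapR-specific access controls. The helper name `may` and the user/group values are hypothetical and chosen to mirror this thread; this is not how Drill itself checks access.)

```python
import stat


def may(mode, owner, group, user, user_groups, want):
    """Simplified POSIX check: pick the owner, group, or other triplet, test one bit.

    `want` is "r", "w", or "x". ACLs and superuser overrides are ignored.
    """
    bit = {"r": 4, "w": 2, "x": 1}[want]
    if user == owner:
        triplet = (mode >> 6) & 7
    elif group in user_groups:
        triplet = (mode >> 3) & 7
    else:
        triplet = mode & 7
    return bool(triplet & bit)


# Hypothetical values mirroring the thread: cache file owned by mapr:mapr,
# mode rwxr-xr-x, queried by mapradm (who is not in the mapr group).
mode = 0o755
mode_text = stat.filemode(stat.S_IFREG | mode)
can_read = may(mode, "mapr", "mapr", "mapradm", {"mapradm"}, "r")
can_write = may(mode, "mapr", "mapr", "mapradm", {"mapradm"}, "w")
print(mode_text, can_read, can_write)
```

Under these assumptions the other-class user can read the file but cannot write it, which is why a permission denied on a plain read would be surprising and a denied write would not.
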
> In addition, when I try to run the query, it appears that the non-drillbit
> user is trying to issue a file create, and per Keys, it's already there
> (and they don't have permissions to write).
>
> There are a number of things that are not happening correctly, then, based
> on your understanding/description of what's happening:
>
> 1. The file that is created is not limited in reading to the drillbit user.
> 2. When a query is run, the file is not accessed by the drillbit user.
> It's not even accessed by the authenticated user; instead, the
> authenticated user tries to overwrite the file (which makes very little
> sense to me on a select query).
>
> The only thing that is (apparently) happening correctly is the initial
> REFRESH command is creating the files as the drillbit user, however,
> subsequent operations don't seem to be working right... so I am not sure if
> that is a 3rd bullet in the "things that appear broken" list.
>
> Using the MapR audit logs was very helpful here. If there is anything
> else I can do to help test/troubleshoot this, let me know.
>
>
>
>
> On Wed, Nov 11, 2015 at 4:54 PM, Vince Gonzalez <vi...@gmail.com>
> wrote:
>
>> Ok, I'm seeing the behavior you describe except for the last bullet - the
>> permissions on the file would allow for anyone to read the cache file.
>>
>> $ ls -la
>> total 3499
>> drwxr-xr-x 2 ec2-user ec2-user       5 Nov 11 21:18 .
>> drwxrwxrwx 4 ec2-user ec2-user       2 Nov 11 21:18 ..
>> -rwxr-xr-x 1 ec2-user ec2-user 1068250 Nov 11 21:18 1_0_0.parquet
>> -rwxr-xr-x 1 ec2-user ec2-user  789341 Nov 11 21:18 1_1_0.parquet
>> -rwxr-xr-x 1 ec2-user ec2-user  952667 Nov 11 21:18 1_2_0.parquet
>> -rwxr-xr-x 1 ec2-user ec2-user  755805 Nov 11 21:18 1_3_0.parquet
>> *-rwxr-xr-x 1 mapr     mapr       14033 Nov 11 21:18 .drill.parquet_metadata*
>>
>> On Wed, Nov 11, 2015 at 5:29 PM, Neeraja Rentachintala <
>> nrentachintala@maprtech.com> wrote:
>>
>> > John, Vince
>> > I am a little confused by this email thread.
>> > From the original description by John, I thought that the issue was that
>> > the refresh metadata command runs successfully (and the cache is created
>> > with the Drillbit user as owner), but at query time it fails for any
>> > user (even though the user has permissions on the directory/dataset).
>> >
>> > Per the latest discussion, it seems like you are hitting permission
>> > denied when running the 'refresh metadata' command itself.
>> >
>> > Just wanted to share what I think the right behavior here is. Feel free
>> > to comment.
>> >
>> > - When the refresh metadata command is run, the cache files get created
>> >   with the drillbit user as the owner (irrespective of who is running
>> >   the command and whether impersonation is turned on).
>> > - When a select query comes in on the table, the corresponding cache
>> >   file is always accessed as the drillbit user (irrespective of who is
>> >   running the command and whether impersonation is turned on).
>> > - The cache file created through the refresh metadata command should
>> >   restrict access for any users other than the drillbit user (so there
>> >   is no leakage of metadata to someone going to the file system and
>> >   opening the file, i.e. the cache is for Drill's internal planning
>> >   purposes and is not meant as a user-level cache).
>> >
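(Inline note: a minimal sketch of what the third bullet would mean at the file-system level, assuming POSIX mode bits on a Unix-like system. The helper name `write_private` and the `b"{}"` payload are placeholders for illustration; this is not Drill's code, and the thread shows the file actually being created as rwxr-xr-x.)

```python
import os
import stat
import tempfile


def write_private(path, data):
    """Write `data` to `path` and leave it readable/writable by the owner only."""
    fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_TRUNC, 0o600)
    try:
        # Enforce the mode even if the file already existed with wider bits.
        os.fchmod(fd, 0o600)
        os.write(fd, data)
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, ".drill.parquet_metadata")
    write_private(path, b"{}")
    mode_text = stat.filemode(os.stat(path).st_mode)
    print(mode_text)
```

A file created this way would only be usable if every later read really happens as the owning (drillbit) user, which is the second bullet's assumption.
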
>> > If the above is not happening, it seems like a bug.
>> >
>> > thanks
>> > Neeraja
>> >
>> > On Wed, Nov 11, 2015 at 2:07 PM, kbotzum <kb...@maprtech.com> wrote:
>> >
>> > > MapR audit records print the errno value to indicate success/failure.
>> > > Thus status 17 means errno 17, which means EEXIST. Looks like Drill is
>> > > trying to create a file that already exists.
>> > >
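(Inline note: a small sketch of the errno mapping described above. It assumes nothing about how Drill or MapR-FS creates the cache file; the exclusive-create flag is used here only because it is a simple way to provoke EEXIST, and the helper name is hypothetical.)

```python
import errno
import os
import tempfile


def exclusive_create_errno(path):
    """Try to create `path` exclusively; return 0 on success or the errno on failure."""
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except OSError as exc:
        return exc.errno
    os.close(fd)
    return 0


with tempfile.TemporaryDirectory() as tmp:
    cache = os.path.join(tmp, ".drill.parquet_metadata")
    first = exclusive_create_errno(cache)   # file does not exist yet
    second = exclusive_create_errno(cache)  # file now exists
    print(first, second, errno.errorcode[second])
```

The first call succeeds and the second fails with EEXIST, matching the "0 for success, 17 for already exists" pattern in the audit records.
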
>> > > I’ll defer to others as to why Drill might do that.
>> > >
>> > > Keys
>> > > _______________________________
>> > > Keys Botzum
>> > > Senior Principal Technologist
>> > > kbotzum@mapr.com
>> > > 443-718-0098
>> > > MapR Technologies
>> > > http://www.mapr.com
>> > >
>> > >
>> > >
>> > > On Nov 11, 2015, at 4:09 PM, John Omernik <jo...@omernik.com> wrote:
>> > >
>> > > > I turned on MapR Auditing (this is a handy feature) and found that
>> > > > when I run a query that gives me access denied (my query is select *
>> > > > from table limit 1), per MapR the user I am logged in as (mapradm) is
>> > > > trying to do a create operation on the .drill.parquet_metadata file,
>> > > > and I am guessing it's failing with status: 17. (Not sure what this
>> > > > means; successes appear to be "0".) What was interesting was the
>> > > > "CREATE" being attempted three times. Any thoughts on why a select *
>> > > > from table limit 1 would try to initiate a create operation on the
>> > > > .drill.parquet_metadata file?
>> > > >
>> > > > On Wed, Nov 11, 2015 at 2:25 PM, John Omernik <jo...@omernik.com>
>> > wrote:
>> > > >
>> > > >> I take it back.
>> > > >>
>> > > >> I went to run a query, in the same session that had worked, and now
>> > > >> I am getting permission denied.
>> > > >>
>> > > >> I do have a query running that creates new directories every 5
>> > > >> minutes; however, these aren't the directories that are giving me
>> > > >> permission denied. Did you try running an aggregate query across all
>> > > >> data? This is an interesting one to track down; I'm not sure why I am
>> > > >> getting the access denied now.
>> > > >>
>> > > >> The .drill.parquet_metadata file in the directory that I am getting
>> > > >> the error on is owned by mapr:mapr and has rwxr-xr-x permissions.
>> > > >> This tells me that both the user of the drillbits (mapr) and the user
>> > > >> I am logged in as in sqlline (mapradm) should be able to read the
>> > > >> file, so why do I get an access denied when running a query? Any
>> > > >> assistance would be valuable here, in that there are some great
>> > > >> performance increases with the metadata caching, and I don't want to
>> > > >> miss out on that.
>> > > >>
>> > > >> On Wed, Nov 11, 2015 at 2:18 PM, John Omernik <jo...@omernik.com>
>> > wrote:
>> > > >>
>> > > >>> All files are owned by mapr:mapr?
>> > > >>>
>> > > >>> I have a setup where mapr is the user running the drillbit, but
>> > > >>> then I have a directory that is owned by another user,
>> > > >>> mapradm:mapradm on all files. (Permissions on directories and files
>> > > >>> appear to be rwxr-x-r-x.) When I run REFRESH TABLE METADATA, the
>> > > >>> .drill.parquet_metadata file gets created as mapr:mapr with
>> > > >>> rwxr-xr-x.
>> > > >>>
>> > > >>> So
>> > > >>> Drillbit User:mapr
>> > > >>> Directory (and subdirectories/files) owner: mapradm:mapradm
>> > > >>> Directory permissions (all files and folder under main directory)
>> > > >>> rwxr-x-r-x
>> > > >>>
>> > > >>> I authenticated to drill via sqlline as user mapradm (this user
>> > > >>> should be able to read and write just fine to all directories).
>> > > >>>
>> > > >>> Now, one thing I did notice is my mapr user was not in the mapradm
>> > > >>> group and therefore didn't have write permissions anywhere. When I
>> > > >>> fixed that on all nodes and then manually deleted the metadata
>> > > >>> files, things seem to be working. I wonder if that was my issue?
>> > > >>>
>> > > >>> Basically, the user running the drillbits needs to be able to write
>> > > >>> files (the .drill.parquet_metadata) or something bad will happen :) I
>> > > >>> will do more testing. This may be a good candidate for some
>> > > >>> documentation work to explain what permissions are required to be
>> > > >>> able to query these.
>> > > >>>
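(Inline note: a rough preflight sketch for the point above, assuming it is run as the OS user the drillbits run as and that the table is visible through a local or NFS path. The helper name is hypothetical, and `os.access` only reflects what the local OS reports, which may differ from what the distributed file system enforces.)

```python
import os
import tempfile


def can_create_files_in(directory):
    """True if the current OS user may add entries to `directory` (write + search)."""
    return os.path.isdir(directory) and os.access(directory, os.W_OK | os.X_OK)


with tempfile.TemporaryDirectory() as tmp:
    writable = can_create_files_in(tmp)
    missing = can_create_files_in(os.path.join(tmp, "no-such-dir"))
    print(writable, missing)
```

Running the same check against each table directory before REFRESH TABLE METADATA would show whether the drillbit user can write the cache file there at all.
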
>> > > >>>
>> > > >>>
>> > > >>>
>> > > >>> On Wed, Nov 11, 2015 at 1:36 PM, Vince Gonzalez <vince.gonzalez@gmail.com> wrote:
>> > > >>>
>> > > >>>> Hi John, I tried this and didn't find any issues. Let me know if I
>> > > >>>> didn't follow your reproduction faithfully.
>> > > >>>>
>> > > >>>> $ sqlline -u jdbc:drill: -n ec2-user -p mapr
>> > > >>>> apache drill 1.2.0
>> > > >>>> "drill baby drill"
>> > > >>>> 0: jdbc:drill:> refresh table metadata dfs.`/tmp/flows`;
>> > > >>>> +-------+------------------------------------------------------+
>> > > >>>> |  ok   |                       summary                        |
>> > > >>>> +-------+------------------------------------------------------+
>> > > >>>> | true  | Successfully updated metadata for table /tmp/flows.  |
>> > > >>>> +-------+------------------------------------------------------+
>> > > >>>> 1 row selected (32.27 seconds)
>> > > >>>> 0: jdbc:drill:> select srcIP,dstIP from dfs.`/tmp/flows` limit
>> 12;
>> > > >>>> +---------------+---------------+
>> > > >>>> |     srcIP     |     dstIP     |
>> > > >>>> +---------------+---------------+
>> > > >>>> | 172.16.2.152  | 172.16.1.58   |
>> > > >>>> | 172.16.1.58   | 172.16.2.152  |
>> > > >>>> | 172.16.2.152  | 172.16.2.73   |
>> > > >>>> | 172.16.2.152  | 172.16.2.73   |
>> > > >>>> | 172.16.2.73   | 172.16.2.152  |
>> > > >>>> | 172.16.2.152  | 172.16.2.73   |
>> > > >>>> | 172.16.2.152  | 172.16.2.73   |
>> > > >>>> | 172.16.2.152  | 172.16.2.73   |
>> > > >>>> | 172.16.2.73   | 172.16.2.152  |
>> > > >>>> | 172.16.2.73   | 172.16.2.152  |
>> > > >>>> | 172.16.2.73   | 172.16.2.152  |
>> > > >>>> | 172.16.2.152  | 172.16.2.73   |
>> > > >>>> +---------------+---------------+
>> > > >>>> 12 rows selected (5.654 seconds)
>> > > >>>>
>> > > >>>> And here's what my table structure looks like (as seen via MapR
>> > NFS):
>> > > >>>>
>> > > >>>> $ tree /mapr/vgonzalez.drill/tmp/flows/ | head -15
>> > > >>>> /mapr/vgonzalez.drill/tmp/flows/
>> > > >>>> └── 2015
>> > > >>>>    └── 11
>> > > >>>>        ├── 10
>> > > >>>>        │   ├── 21
>> > > >>>>        │   │   ├── 39
>> > > >>>>        │   │   │   ├── 03
>> > > >>>>        │   │   │   │   ├── _common_metadata
>> > > >>>>        │   │   │   │   ├── _metadata
>> > > >>>>        │   │   │   │   ├──
>> > > >>>> part-r-00000-853882bd-66d8-4505-96ba-f0a282e374de.gz.parquet
>> > > >>>>        │   │   │   │   └── _SUCCESS
>> > > >>>>        │   │   │   └── 20
>> > > >>>>        │   │   │       ├── _common_metadata
>> > > >>>>        │   │   │       ├── _metadata
>> > > >>>>        │   │   │       ├──
>> > > >>>> part-r-00000-37a94549-8e56-46d5-be88-cb28e6d8bc35.gz.parquet
>> > > >>>>
>> > > >>>> My parquet was created in Spark, not Drill. Not sure if that's
>> > > relevant.
>> > > >>>>
>> > > >>>> I have authentication and impersonation turned on, and the files
>> are
>> > > >>>> owned
>> > > >>>> by mapr:mapr. Here's my drill-override.conf:
>> > > >>>>
>> > > >>>> drill.exec: {
>> > > >>>>  cluster-id: "vgonzalez_drill-drillbits",
>> > > >>>> zk.connect:
>> > > >>>>
>> > > >>>>
>> > >
>> >
>> "ip-172-16-2-36.ec2.internal:5181,ip-172-16-2-37.ec2.internal:5181,ip-172-16-2-38.ec2.internal:5181"
>> > > >>>> }
>> > > >>>> drill.exec.impersonation: { enabled: true,
>> max_chained_user_hops: 3
>> > }
>> > > >>>> drill.exec { security.user.auth { enabled: true, packages +=
>> > > >>>> "org.apache.drill.exec.rpc.user.security", impl: "pam",
>> > pam_profiles:
>> > > [
>> > > >>>> "login","sudo","sshd","password-auth" ] } }
>> > > >>>>
>> > > >>>>
>> > > >>>>
>> > > >>>>
>> > > >>>>
>> > > >>>> On Tue, Nov 10, 2015 at 1:17 PM, John Omernik <jo...@omernik.com>
>> > > wrote:
>> > > >>>>
>> > > >>>>> Cool, looking forward to it.
>> > > >>>>>
>> > > >>>>> On Mon, Nov 9, 2015 at 7:21 PM, Vince Gonzalez <
>> > > >>>> vince.gonzalez@gmail.com>
>> > > >>>>> wrote:
>> > > >>>>>
>> > > >>>>>> Hey John, I have a secure cluster and some parquet files, I'll
>> try
>> > > >>>> this
>> > > >>>>> out
>> > > >>>>>> and report back.
>> > > >>>>>>
>> > > >>>>>> On Monday, November 9, 2015, John Omernik <jo...@omernik.com>
>> > wrote:
>> > > >>>>>>
>> > > >>>>>>> Has anyone been able to try/test this? I am curious if it's me
>> > only
>> > > >>>>> issue
>> > > >>>>>>> or something more of bug so I can open a JIRA if needed.
>> > > >>>>>>>
>> > > >>>>>>> John
>> > > >>>>>>>
>> > > >>>>>>> On Fri, Nov 6, 2015 at 11:06 AM, John Omernik <
>> john@omernik.com
>> > > >>>>>>> <javascript:;>> wrote:
>> > > >>>>>>>
>> > > >>>>>>>> If someone has authorization/authentication setup, to
>> reproduce:
>> > > >>>>>>>>
>> > > >>>>>>>> Have a Parquet table with directories underneath the main (I
>> > have
>> > > >>>>>>>> directories per day)
>> > > >>>>>>>>
>> > > >>>>>>>> Then issue REFRESH TABLE METADATA on the root of the table
>> > > >>>> running an
>> > > >>>>>>>> authenticated user other than the drill bit user. (I am using
>> > > >>>> mapr, I
>> > > >>>>>>> used
>> > > >>>>>>>> my user to run the query, and yes I have access to the data)
>> > > >>>>>>>>
>> > > >>>>>>>> Then run a normal query and see what the result is. .
>> > > >>>>>>>>
>> > > >>>>>>>> John
>> > > >>>>>>>>
>> > > >>>>>>>> On Fri, Nov 6, 2015 at 10:22 AM, Neeraja Rentachintala <
>> > > >>>>>>>> nrentachintala@maprtech.com <javascript:;>> wrote:
>> > > >>>>>>>>
>> > > >>>>>>>>> This doesn't make sense and seems like a bug.
>> > > >>>>>>>>> I think the right behavior is for the Drillbit to access the
>> > > >>>> cache
>> > > >>>>> as
>> > > >>>>>>>>> Drillbit user at the query time (there is no user level
>> > metadata
>> > > >>>>> cache
>> > > >>>>>>> in
>> > > >>>>>>>>> Drill at this point).
>> > > >>>>>>>>>
>> > > >>>>>>>>>
>> > > >>>>>>>>>
>> > > >>>>>>>>> On Fri, Nov 6, 2015 at 6:57 AM, John Omernik <
>> john@omernik.com
>> > > >>>>>>> <javascript:;>> wrote:
>> > > >>>>>>>>>
>> > > >>>>>>>>>> I ran REFRESH TABLE METADATA on a table, it completed
>> > > >>>>> successfully.
>> > > >>>>>>>>>>
>> > > >>>>>>>>>> When I tried a subsequent query, I get a IOException:
>> > > >>>> Permission
>> > > >>>>>>> Denied
>> > > >>>>>>>>> on
>> > > >>>>>>>>>> .drill.parquet_metadata.
>> > > >>>>>>>>>>
>> > > >>>>>>>>>> I am running drill with authentication.  I ran the REFRESH
>> > > >>>> TABLE
>> > > >>>>>>>>> METADATA
>> > > >>>>>>>>>> as user X, it appears the .drill.parquet_metadata was
>> created
>> > > >>>> and
>> > > >>>>>>> owned
>> > > >>>>>>>>> by
>> > > >>>>>>>>>> the user the drill bits are running as as is created with
>> > > >>>>>> -rwxr-x-r-x
>> > > >>>>>>>>>>
>> > > >>>>>>>>>> My question is this: So, I can see why the file is owned by
>> > > >>>> the
>> > > >>>>>> drill
>> > > >>>>>>>>> bit
>> > > >>>>>>>>>> user, and the file is created with all can read
>> permissions,
>> > > >>>> but
>> > > >>>>> why
>> > > >>>>>>> am
>> > > >>>>>>>>> I
>> > > >>>>>>>>>> getting a permission denied when user X is trying to run a
>> > > >>>> query?
>> > > >>>>>>>>>>
>> > > >>>>>>>>>
>> > > >>>>>>>>
>> > > >>>>>>>>
>> > > >>>>>>>
>> > > >>>>>>
>> > > >>>>>
>> > > >>>>
>> > > >>>
>> > > >>>
>> > > >>
>> > >
>> > >
>> >
>>
>
>

Re: REFRESH TABLE METADATA - Access Denied

Posted by John Omernik <jo...@omernik.com>.
For me it's very strange. If I delete all the .drill.parquet_metadata
files, I can create them again and then run a query. If I wait 5 minutes and
come back and run the same query, I get permission denied. If I then try
to run REFRESH TABLE METADATA again, it too fails with permission denied
until I erase all the files.
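
For anyone who wants to repeat the workaround, here is a minimal sketch of
locating (and optionally deleting) the cache files under a table directory.
It assumes the table is reachable as a local path (for example through an NFS
mount); the function names and the dry_run flag are made up for illustration
and are not part of Drill.

```python
from pathlib import Path

# Name of the cache file that REFRESH TABLE METADATA writes into each directory.
CACHE_NAME = ".drill.parquet_metadata"


def find_metadata_caches(table_root):
    """Return every metadata cache file under table_root, sorted by path."""
    return sorted(p for p in Path(table_root).rglob(CACHE_NAME) if p.is_file())


def remove_metadata_caches(table_root, dry_run=True):
    """Delete the cache files; with dry_run=True only report what would go."""
    found = find_metadata_caches(table_root)
    for path in found:
        if not dry_run:
            path.unlink()
    return found


if __name__ == "__main__":
    import sys

    # Hypothetical usage: pass the table root as the first argument.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for path in remove_metadata_caches(root, dry_run=True):
        print(path)
```

Running it with dry_run=True first shows which files would be removed before
anything is actually deleted.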

What is strange here is that the .drill.parquet_metadata file is owned by the
drillbit user and has rwxr-xr-x. Thus, based on those permissions, the
non-drillbit user STILL should be able to read the file with no issues.
(This is not what your last bullet describes; instead, it's
restricting others from writing, not reading.)

In addition, when I try to run the query, it appears that the non-drillbit
user is trying to issue a file create, and per Keys, the file is already there
(and that user doesn't have permission to write).
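
To make the "status 17" from the audit log concrete, here is a small sketch
of how an exclusive create on an existing file reports EEXIST on a POSIX
system. This is only an illustration of the errno; it is not Drill's code,
and whether Drill issues an exclusive create here is an assumption on my
part. The helper name exclusive_create is invented for the example.

```python
import errno
import os
import tempfile


def exclusive_create(path):
    """Create path exclusively; return 0 on success or the errno on failure."""
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o755)
    except OSError as exc:
        return exc.errno
    os.close(fd)
    return 0


with tempfile.TemporaryDirectory() as tmp:
    cache = os.path.join(tmp, ".drill.parquet_metadata")
    first = exclusive_create(cache)   # file does not exist yet: succeeds
    second = exclusive_create(cache)  # file now exists: fails with EEXIST
    print(first, errno.errorcode[second])
```

The first call returns 0 and the second returns the EEXIST errno, which
matches the pattern in the audit records (successes as 0, the failing create
as 17 on Linux).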

There are a number of things that are not happening correctly, then, based on
your understanding/description of what's happening:

1. The file that is created is not limited in reading to the drillbit user.
2. When a query is run, the file is not accessed by the drillbit user. It's
not even accessed by the authenticated user; instead the authenticated user
tries to overwrite the file (which makes very little sense to me on a
select query).

The only thing that is (apparently) happening correctly is that the initial
REFRESH command creates the files as the drillbit user. However,
subsequent operations don't seem to be working right, so I am not sure if
that is a third bullet in the "things that appear broken" list.

Using the Drill Audit logs was very helpful here. If there is anything else
I can do to help test/troubleshoot this, let me know.




On Wed, Nov 11, 2015 at 4:54 PM, Vince Gonzalez <vi...@gmail.com>
wrote:

> Ok, I'm seeing the behavior you describe except for the last bullet - the
> permissions on the file would allow for anyone to read the cache file.
>
> $ ls -la
> total 3499
> drwxr-xr-x 2 ec2-user ec2-user       5 Nov 11 21:18 .
> drwxrwxrwx 4 ec2-user ec2-user       2 Nov 11 21:18 ..
> -rwxr-xr-x 1 ec2-user ec2-user 1068250 Nov 11 21:18 1_0_0.parquet
> -rwxr-xr-x 1 ec2-user ec2-user  789341 Nov 11 21:18 1_1_0.parquet
> -rwxr-xr-x 1 ec2-user ec2-user  952667 Nov 11 21:18 1_2_0.parquet
> -rwxr-xr-x 1 ec2-user ec2-user  755805 Nov 11 21:18 1_3_0.parquet
> *-rwxr-xr-x 1 mapr     mapr       14033 Nov 11 21:18 .drill.parquet_metadata*

Re: REFRESH TABLE METADATA - Access Denied

Posted by Vince Gonzalez <vi...@gmail.com>.
Ok, I'm seeing the behavior you describe except for the last bullet - the
permissions on the file would allow for anyone to read the cache file.

$ ls -la
total 3499
drwxr-xr-x 2 ec2-user ec2-user       5 Nov 11 21:18 .
drwxrwxrwx 4 ec2-user ec2-user       2 Nov 11 21:18 ..
-rwxr-xr-x 1 ec2-user ec2-user 1068250 Nov 11 21:18 1_0_0.parquet
-rwxr-xr-x 1 ec2-user ec2-user  789341 Nov 11 21:18 1_1_0.parquet
-rwxr-xr-x 1 ec2-user ec2-user  952667 Nov 11 21:18 1_2_0.parquet
-rwxr-xr-x 1 ec2-user ec2-user  755805 Nov 11 21:18 1_3_0.parquet
*-rwxr-xr-x 1 mapr     mapr       14033 Nov 11 21:18 .drill.parquet_metadata*
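
As a quick check of what that mode string allows, here is a minimal sketch
that decodes an ls-style mode into read/write flags per class. The helper
name describe_access is invented for this example; the only library call is
Python's stat.filemode.

```python
import stat


def describe_access(mode_string):
    """Map an ls-style mode such as '-rwxr-xr-x' to read/write flags per class."""
    perms = mode_string[-9:]  # last nine characters: owner, group, other triplets
    classes = ("owner", "group", "other")
    return {
        name: {"read": perms[i * 3] == "r", "write": perms[i * 3 + 1] == "w"}
        for i, name in enumerate(classes)
    }


# 0o100755 is a regular file with mode 755, i.e. '-rwxr-xr-x'.
access = describe_access(stat.filemode(0o100755))
for name, flags in access.items():
    print(name, flags)
```

For the cache file above, group and other get read but not write, so a user
other than mapr can read the cache but cannot overwrite it.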

On Wed, Nov 11, 2015 at 5:29 PM, Neeraja Rentachintala <
nrentachintala@maprtech.com> wrote:

> John, Vince
> I am a little confused by this email thread.
> From the original description by John, I thought the issue was that the
> refresh metadata command runs successfully (and the cache is created with the
> Drillbit user as owner), but at query time it fails for any user (even
> though the user has permissions on the directory/dataset).
>
> Per the latest discussion, it seems like you are hitting permission denied
> when running the 'refresh metadata' command itself.
>
> Just wanted to share what I think the right behavior here is. Feel free to
> comment.
>
> - When the refresh metadata command is run, the cache files get created with
> the drillbit user as the owner (irrespective of who is running the command
> and whether impersonation is turned on).
> - When a select query comes in on the table, the corresponding cache file
> is always accessed as the drillbit user (irrespective of who is running the
> command and whether impersonation is turned on).
> - The cache file created by the refresh metadata command should restrict
> access for any user other than the drillbit user (so there is no
> leakage of metadata to someone going to the file system and opening the file;
> i.e., the cache is for Drill's internal planning purposes and is not meant as
> a user-level cache).
>
> If the above is not happening, it seems like a bug.
>
> thanks
> Neeraja
>
> On Wed, Nov 11, 2015 at 2:07 PM, kbotzum <kb...@maprtech.com> wrote:
>
> > MapR audit records print the errno value to indicate success/failure. Thus
> > status 17 means errno 17, which means EEXIST. Looks like Drill is trying to
> > create a file that already exists.
> >
> > I’ll defer to others as to why Drill might do that.
> >
> > Keys
> > _______________________________
> > Keys Botzum
> > Senior Principal Technologist
> > kbotzum@mapr.com
> > 443-718-0098
> > MapR Technologies
> > http://www.mapr.com
> >
> >
> >
> > On Nov 11, 2015, at 4:09 PM, John Omernik <jo...@omernik.com> wrote:
> >
> > > I turned on MapR Auditing (this is a handy feature) and found that when I
> > > run a query that is giving me access denied (my query is select * from
> > > table limit 1), per MapR the user I am logged in as (mapradm) is trying to
> > > do a create operation on the .drill.parquet_metadata file, and I am
> > > guessing it's failing with status: 17 (not sure what this means; successes
> > > appear to be "0"). What was interesting was the "CREATE" being attempted
> > > three times. Any thoughts on why a select * from table limit 1 would try
> > > to initiate a create operation on the .drill.parquet_metadata file?
> > >
> > > On Wed, Nov 11, 2015 at 2:25 PM, John Omernik <jo...@omernik.com> wrote:
> > >
> > >> I take it back.
> > >>
> > >> I went to run a query, in the same session that had worked, and now I am
> > >> getting permission denied.
> > >>
> > >> I do have a query running that creates new directories every 5 minutes;
> > >> however, these aren't the directories that are giving me permission denied.
> > >> Did you try running an aggregate query across all data? This is an
> > >> interesting one to track down; I'm not sure why I am getting the access
> > >> denied now.
> > >>
> > >> The .drill.parquet_metadata file in the directory that I am getting the
> > >> error on is owned by mapr:mapr and has rwxr-xr-x permissions. This tells
> > >> me that both the user of the drillbits (mapr) and the user I am logged in
> > >> as in sqlline (mapradm) should be able to read the file... so why do I get
> > >> an access denied when running a query? Any assistance would be valuable
> > >> here, in that there are some great performance increases with the metadata
> > >> caching, and I don't want to miss out on that.
> > >>
> > >> On Wed, Nov 11, 2015 at 2:18 PM, John Omernik <jo...@omernik.com> wrote:
> > >>
> > >>> All files are owned by mapr:mapr?
> > >>>
> > >>> I have a setup where mapr is the user running the drillbit, but then I
> > >>> have a directory that is owned by another user, mapradm:mapradm, on all
> > >>> files. (Permissions on directories and files appear to be rwxr-x-r-x.)
> > >>> When I run REFRESH TABLE METADATA, the .drill.parquet_metadata file gets
> > >>> created as mapr:mapr with rwxr-xr-x.
> > >>>
> > >>> So
> > >>> Drillbit User: mapr
> > >>> Directory (and subdirectories/files) owner: mapradm:mapradm
> > >>> Directory permissions (all files and folders under main directory):
> > >>> rwxr-x-r-x
> > >>>
> > >>> I authenticated to drill via sqlline as user mapradm (this user should be
> > >>> able to read and write just fine to all directories).
> > >>>
> > >>> Now, one thing I did notice is my mapr user was not in the mapradm group,
> > >>> and therefore didn't have write permissions anywhere... when I fixed that
> > >>> on all nodes, and then manually deleted the metadata files, things seem
> > >>> to be working. I wonder if that was my issue?
> > >>>
> > >>> Basically, the user running the drillbits needs to be able to write files
> > >>> (the .drill.parquet_metadata) or something bad will happen :) I will do
> > >>> more testing. This may be a good candidate for some documentation work to
> > >>> understand what permissions are required to be able to query these.
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> On Wed, Nov 11, 2015 at 1:36 PM, Vince Gonzalez <vince.gonzalez@gmail.com> wrote:
> > >>>
> > >>>> Hi John, I tried this and didn't find any issues. Let me know if I didn't
> > >>>> follow your reproduction faithfully.
> > >>>>
> > >>>> $ sqlline -u jdbc:drill: -n ec2-user -p mapr
> > >>>> apache drill 1.2.0
> > >>>> "drill baby drill"
> > >>>> 0: jdbc:drill:> refresh table metadata dfs.`/tmp/flows`;
> > >>>> +-------+------------------------------------------------------+
> > >>>> |  ok   |                       summary                        |
> > >>>> +-------+------------------------------------------------------+
> > >>>> | true  | Successfully updated metadata for table /tmp/flows.  |
> > >>>> +-------+------------------------------------------------------+
> > >>>> 1 row selected (32.27 seconds)
> > >>>> 0: jdbc:drill:> select srcIP,dstIP from dfs.`/tmp/flows` limit 12;
> > >>>> +---------------+---------------+
> > >>>> |     srcIP     |     dstIP     |
> > >>>> +---------------+---------------+
> > >>>> | 172.16.2.152  | 172.16.1.58   |
> > >>>> | 172.16.1.58   | 172.16.2.152  |
> > >>>> | 172.16.2.152  | 172.16.2.73   |
> > >>>> | 172.16.2.152  | 172.16.2.73   |
> > >>>> | 172.16.2.73   | 172.16.2.152  |
> > >>>> | 172.16.2.152  | 172.16.2.73   |
> > >>>> | 172.16.2.152  | 172.16.2.73   |
> > >>>> | 172.16.2.152  | 172.16.2.73   |
> > >>>> | 172.16.2.73   | 172.16.2.152  |
> > >>>> | 172.16.2.73   | 172.16.2.152  |
> > >>>> | 172.16.2.73   | 172.16.2.152  |
> > >>>> | 172.16.2.152  | 172.16.2.73   |
> > >>>> +---------------+---------------+
> > >>>> 12 rows selected (5.654 seconds)
> > >>>>
> > >>>> And here's what my table structure looks like (as seen via MapR NFS):
> > >>>>
> > >>>> $ tree /mapr/vgonzalez.drill/tmp/flows/ | head -15
> > >>>> /mapr/vgonzalez.drill/tmp/flows/
> > >>>> └── 2015
> > >>>>    └── 11
> > >>>>        ├── 10
> > >>>>        │   ├── 21
> > >>>>        │   │   ├── 39
> > >>>>        │   │   │   ├── 03
> > >>>>        │   │   │   │   ├── _common_metadata
> > >>>>        │   │   │   │   ├── _metadata
> > >>>>        │   │   │   │   ├── part-r-00000-853882bd-66d8-4505-96ba-f0a282e374de.gz.parquet
> > >>>>        │   │   │   │   └── _SUCCESS
> > >>>>        │   │   │   └── 20
> > >>>>        │   │   │       ├── _common_metadata
> > >>>>        │   │   │       ├── _metadata
> > >>>>        │   │   │       ├── part-r-00000-37a94549-8e56-46d5-be88-cb28e6d8bc35.gz.parquet
> > >>>>
> > >>>> My parquet was created in Spark, not Drill. Not sure if that's relevant.
> > >>>>
> > >>>> I have authentication and impersonation turned on, and the files are
> > >>>> owned by mapr:mapr. Here's my drill-override.conf:
> > >>>>
> > >>>> drill.exec: {
> > >>>>  cluster-id: "vgonzalez_drill-drillbits",
> > >>>>  zk.connect: "ip-172-16-2-36.ec2.internal:5181,ip-172-16-2-37.ec2.internal:5181,ip-172-16-2-38.ec2.internal:5181"
> > >>>> }
> > >>>> drill.exec.impersonation: { enabled: true, max_chained_user_hops: 3 }
> > >>>> drill.exec { security.user.auth { enabled: true, packages +=
> > >>>> "org.apache.drill.exec.rpc.user.security", impl: "pam", pam_profiles: [
> > >>>> "login","sudo","sshd","password-auth" ] } }
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>> On Tue, Nov 10, 2015 at 1:17 PM, John Omernik <jo...@omernik.com> wrote:
> > >>>>
> > >>>>> Cool, looking forward to it.
> > >>>>>
> > >>>>> On Mon, Nov 9, 2015 at 7:21 PM, Vince Gonzalez <vince.gonzalez@gmail.com> wrote:
> > >>>>>
> > >>>>>> Hey John, I have a secure cluster and some parquet files, I'll try this
> > >>>>>> out and report back.
> > >>>>>>
> > >>>>>> On Monday, November 9, 2015, John Omernik <jo...@omernik.com> wrote:
> > >>>>>>
> > >>>>>>> Has anyone been able to try/test this? I am curious if it's a me-only
> > >>>>>>> issue or something more of a bug so I can open a JIRA if needed.
> > >>>>>>>
> > >>>>>>> John
> > >>>>>>>
> > >>>>>>> On Fri, Nov 6, 2015 at 11:06 AM, John Omernik <john@omernik.com> wrote:
> > >>>>>>>
> > >>>>>>>> If someone has authorization/authentication set up, to reproduce:
> > >>>>>>>>
> > >>>>>>>> Have a Parquet table with directories underneath the main directory
> > >>>>>>>> (I have directories per day).
> > >>>>>>>>
> > >>>>>>>> Then issue REFRESH TABLE METADATA on the root of the table, running as
> > >>>>>>>> an authenticated user other than the drillbit user. (I am using mapr, I
> > >>>>>>>> used my user to run the query, and yes I have access to the data.)
> > >>>>>>>>
> > >>>>>>>> Then run a normal query and see what the result is.
> > >>>>>>>>
> > >>>>>>>> John
> > >>>>>>>>
> > >>>>>>>> On Fri, Nov 6, 2015 at 10:22 AM, Neeraja Rentachintala <nrentachintala@maprtech.com> wrote:
> > >>>>>>>>
> > >>>>>>>>> This doesn't make sense and seems like a bug.
> > >>>>>>>>> I think the right behavior is for the Drillbit to access the cache as
> > >>>>>>>>> the Drillbit user at query time (there is no user-level metadata cache
> > >>>>>>>>> in Drill at this point).
> > >>>>>>>>>
> > >>>>>>>>> On Fri, Nov 6, 2015 at 6:57 AM, John Omernik <john@omernik.com> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> I ran REFRESH TABLE METADATA on a table, it completed successfully.
> > >>>>>>>>>>
> > >>>>>>>>>> When I tried a subsequent query, I get a IOException: Permission Denied
> > >>>>>>>>>> on .drill.parquet_metadata.
> > >>>>>>>>>>
> > >>>>>>>>>> I am running drill with authentication.  I ran the REFRESH
> > >>>> TABLE
> > >>>>>>>>> METADATA
> > >>>>>>>>>> as user X, it appears the .drill.parquet_metadata was created
> > >>>> and
> > >>>>>>> owned
> > >>>>>>>>> by
> > >>>>>>>>>> the user the drill bits are running as as is created with
> > >>>>>> -rwxr-x-r-x
> > >>>>>>>>>>
> > >>>>>>>>>> My question is this: So, I can see why the file is owned by
> > >>>> the
> > >>>>>> drill
> > >>>>>>>>> bit
> > >>>>>>>>>> user, and the file is created with all can read permissions,
> > >>>> but
> > >>>>> why
> > >>>>>>> am
> > >>>>>>>>> I
> > >>>>>>>>>> getting a permission denied when user X is trying to run a
> > >>>> query?
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>
> > >>>
> > >>
> >
> >
>

Re: REFRESH TABLE METADATA - Access Denied

Posted by Neeraja Rentachintala <nr...@maprtech.com>.
John, Vince
I am a little confused by this email thread.
From John's original description, I thought the issue was that the refresh
metadata command runs successfully (and the cache is created with the
Drillbit user as owner), but the query then fails for any user (even
though the user has permissions on the directory/dataset).

Per the latest discussion, it seems like you are hitting permission denied
when running 'refresh metadata' command itself.

Just wanted to share what I think the right behavior here is. Feel free to
comment.

- When the refresh metadata command is run, the cache files get created with
the drillbit user as the owner (irrespective of who runs the command
and whether impersonation is turned on).
- When a select query comes in on the table, the corresponding cache file
is always accessed as the drillbit user (irrespective of who runs the
query and whether impersonation is turned on).
- The cache file created through the refresh metadata command should deny
access to all users other than the drillbit user, so that no metadata
leaks to someone opening the file directly on the file system (i.e., the
cache is for Drill's internal planning purposes and is not meant as a
user-level cache).

If the above is not happening, it seems like a bug.
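
To make the last point concrete, here is a minimal Python sketch of what
"restricted to the owner" would mean at the file-permission level. It is not
Drill code; the helper names others_can_access and restrict_to_owner are
made up for illustration, and it assumes a POSIX file system where the cache
file is reachable as an ordinary path (for example via an NFS mount).

```python
import os
import stat
import tempfile


def others_can_access(path):
    """Return True if group or other have any permission bit set on path."""
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IRWXG | stat.S_IRWXO))


def restrict_to_owner(path):
    """Keep only the owner's read/write bits (mode 0600)."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # Hypothetical stand-in for a cache file; not written by Drill.
        cache = os.path.join(d, ".drill.parquet_metadata")
        with open(cache, "w") as f:
            f.write("{}")
        os.chmod(cache, 0o755)  # the rwxr-xr-x mode reported in this thread
        print(others_can_access(cache))
        restrict_to_owner(cache)
        print(others_can_access(cache))
```

With the rwxr-xr-x mode seen in this thread the check reports that other
users can read the file; after restricting it to 0600 only the owner can.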

thanks
Neeraja

On Wed, Nov 11, 2015 at 2:07 PM, kbotzum <kb...@maprtech.com> wrote:

> MapR audit records print the errno value to indicate success/failure. Thus
> status 17 means errno 17 which means EEXIST. Looks like Drill is trying to
> create a file that already exists.
>
> I’ll defer to others as to why Drill might do that.
>
> Keys
> _______________________________
> Keys Botzum
> Senior Principal Technologist
> kbotzum@mapr.com
> 443-718-0098
> MapR Technologies
> http://www.mapr.com
>
>
>
> On Nov 11, 2015, at 4:09 PM, John Omernik <jo...@omernik.com> wrote:
>
> > I turned on MapR Auditing (This is a handy feature) and found that when I
> > run a query (that is giving me access denied.. my query is select * from
> > table limit 1) Per MapR the user I am logged in as (mapradm) is trying to
> > do a create operation on the .drill.parquet_metadata operation and I
> > guessing it's failing with status: 17 (Not sure what this means,
> successes
> > appear to be "0".  What was intersting was the "CREATE" being attempted
> > three times.   Any thoughts on why a select * from tables limit 1 would
> try
> > to initiate a create operation on the .drill.parquet_metadata file?
> >
> > On Wed, Nov 11, 2015 at 2:25 PM, John Omernik <jo...@omernik.com> wrote:
> >
> >> I take it back.
> >>
> >> I went to run a query, in the same session that had worked, and now I am
> >> getting permission denied.
> >>
> >> I do have a query running created new directories every 5 minutes,
> >> however, these aren't the directories that are giving me permission
> denied.
> >>  Did you try running an aggregate query accross all data? This is a
> >> interesting one to track down, not sure why I am getting the access
> denied
> >> now,
> >>
> >> the .drill.parquet_metadata file in the directory that I am getting the
> >> error on is owned by mapr:mapr and has rwxr-xr-x  permissions. This
> tells
> >> me that both the user of the drillbits (mapr) and the user I am logged
> into
> >> in sqlline (mapradm) should be able to read the file... so why do I get
> an
> >> access denied in running a query. I any assistance would be valuable
> here
> >> in that there are some great performance increases with the metadata
> >> caching, and I don't want to miss out on that.
> >>
> >> On Wed, Nov 11, 2015 at 2:18 PM, John Omernik <jo...@omernik.com> wrote:
> >>
> >>> All files are owned by mapr:mapr?
> >>>
> >>> I have a setup where mapr is the user running the drillbit, but then I
> >>> have a directory that is owned by a another user. mapradm:mapradm on
> all
> >>> files. (Permissions on directories and files appears to be rwxr-x-r-x)
> When
> >>> I run the REFRESH TABLE metatdata the .drill.parquet_metadata file gets
> >>> created as mapr:mapr with rwxr-xr-x.
> >>>
> >>> So
> >>> Drillbit User:mapr
> >>> Directory (and subdirectories/files) owner: mapradm:mapradm
> >>> Directory permissions (all files and folder under main directory)
> >>> rwxr-x-r-x
> >>>
> >>> I authenticated to drill via sqlline as user mapradm (this user should
> be
> >>> able to read and write just fine to all directories).
> >>>
> >>> Now, one thing I did notice is my mapr user was not in the mapradm
> group,
> >>> therefore, didn't have write permissions anywhere... when I fixed that
> on
> >>> all nodes, and then I manually deleted the metadatafiles, things seem
> to be
> >>> working. I wonder if that was my issue?
> >>>
> >>> Basically, the user running the drillbits need to be able to write
> files
> >>> (the .drill.parquet_metadata)  or something bad will happen :) I will
> do
> >>> more testing. This may be a good candidate for some documentation work
> to
> >>> understand what permissions are required to be able to query these.
> >>>
> >>>
> >>>
> >>>
> >>> On Wed, Nov 11, 2015 at 1:36 PM, Vince Gonzalez <
> vince.gonzalez@gmail.com
> >>>> wrote:
> >>>
> >>>> Hi John, I tried this and didn't find any issues. Let me know if I
> didn't
> >>>> follow your reproduction faithfully.
> >>>>
> >>>> $ sqlline -u jdbc:drill: -n ec2-user -p mapr
> >>>> apache drill 1.2.0
> >>>> "drill baby drill"
> >>>> 0: jdbc:drill:> refresh table metadata dfs.`/tmp/flows`;
> >>>> +-------+------------------------------------------------------+
> >>>> |  ok   |                       summary                        |
> >>>> +-------+------------------------------------------------------+
> >>>> | true  | Successfully updated metadata for table /tmp/flows.  |
> >>>> +-------+------------------------------------------------------+
> >>>> 1 row selected (32.27 seconds)
> >>>> 0: jdbc:drill:> select srcIP,dstIP from dfs.`/tmp/flows` limit 12;
> >>>> +---------------+---------------+
> >>>> |     srcIP     |     dstIP     |
> >>>> +---------------+---------------+
> >>>> | 172.16.2.152  | 172.16.1.58   |
> >>>> | 172.16.1.58   | 172.16.2.152  |
> >>>> | 172.16.2.152  | 172.16.2.73   |
> >>>> | 172.16.2.152  | 172.16.2.73   |
> >>>> | 172.16.2.73   | 172.16.2.152  |
> >>>> | 172.16.2.152  | 172.16.2.73   |
> >>>> | 172.16.2.152  | 172.16.2.73   |
> >>>> | 172.16.2.152  | 172.16.2.73   |
> >>>> | 172.16.2.73   | 172.16.2.152  |
> >>>> | 172.16.2.73   | 172.16.2.152  |
> >>>> | 172.16.2.73   | 172.16.2.152  |
> >>>> | 172.16.2.152  | 172.16.2.73   |
> >>>> +---------------+---------------+
> >>>> 12 rows selected (5.654 seconds)
> >>>>
> >>>> And here's what my table structure looks like (as seen via MapR NFS):
> >>>>
> >>>> $ tree /mapr/vgonzalez.drill/tmp/flows/ | head -15
> >>>> /mapr/vgonzalez.drill/tmp/flows/
> >>>> └── 2015
> >>>>    └── 11
> >>>>        ├── 10
> >>>>        │   ├── 21
> >>>>        │   │   ├── 39
> >>>>        │   │   │   ├── 03
> >>>>        │   │   │   │   ├── _common_metadata
> >>>>        │   │   │   │   ├── _metadata
> >>>>        │   │   │   │   ├──
> >>>> part-r-00000-853882bd-66d8-4505-96ba-f0a282e374de.gz.parquet
> >>>>        │   │   │   │   └── _SUCCESS
> >>>>        │   │   │   └── 20
> >>>>        │   │   │       ├── _common_metadata
> >>>>        │   │   │       ├── _metadata
> >>>>        │   │   │       ├──
> >>>> part-r-00000-37a94549-8e56-46d5-be88-cb28e6d8bc35.gz.parquet
> >>>>
> >>>> My parquet was created in Spark, not Drill. Not sure if that's
> relevant.
> >>>>
> >>>> I have authentication and impersonation turned on, and the files are
> >>>> owned
> >>>> by mapr:mapr. Here's my drill-override.conf:
> >>>>
> >>>> drill.exec: {
> >>>>  cluster-id: "vgonzalez_drill-drillbits",
> >>>> zk.connect:
> >>>>
> >>>>
> "ip-172-16-2-36.ec2.internal:5181,ip-172-16-2-37.ec2.internal:5181,ip-172-16-2-38.ec2.internal:5181"
> >>>> }
> >>>> drill.exec.impersonation: { enabled: true, max_chained_user_hops: 3 }
> >>>> drill.exec { security.user.auth { enabled: true, packages +=
> >>>> "org.apache.drill.exec.rpc.user.security", impl: "pam", pam_profiles:
> [
> >>>> "login","sudo","sshd","password-auth" ] } }
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Tue, Nov 10, 2015 at 1:17 PM, John Omernik <jo...@omernik.com>
> wrote:
> >>>>
> >>>>> Cool, looking forward to it.
> >>>>>
> >>>>> On Mon, Nov 9, 2015 at 7:21 PM, Vince Gonzalez <
> >>>> vince.gonzalez@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Hey John, I have a secure cluster and some parquet files, I'll try
> >>>> this
> >>>>> out
> >>>>>> and report back.
> >>>>>>
> >>>>>> On Monday, November 9, 2015, John Omernik <jo...@omernik.com> wrote:
> >>>>>>
> >>>>>>> Has anyone been able to try/test this? I am curious if it's me only
> >>>>> issue
> >>>>>>> or something more of bug so I can open a JIRA if needed.
> >>>>>>>
> >>>>>>> John
> >>>>>>>
> >>>>>>> On Fri, Nov 6, 2015 at 11:06 AM, John Omernik <john@omernik.com
> >>>>>>> <javascript:;>> wrote:
> >>>>>>>
> >>>>>>>> If someone has authorization/authentication setup, to reproduce:
> >>>>>>>>
> >>>>>>>> Have a Parquet table with directories underneath the main (I have
> >>>>>>>> directories per day)
> >>>>>>>>
> >>>>>>>> Then issue REFRESH TABLE METADATA on the root of the table
> >>>> running an
> >>>>>>>> authenticated user other than the drill bit user. (I am using
> >>>> mapr, I
> >>>>>>> used
> >>>>>>>> my user to run the query, and yes I have access to the data)
> >>>>>>>>
> >>>>>>>> Then run a normal query and see what the result is. .
> >>>>>>>>
> >>>>>>>> John
> >>>>>>>>
> >>>>>>>> On Fri, Nov 6, 2015 at 10:22 AM, Neeraja Rentachintala <
> >>>>>>>> nrentachintala@maprtech.com <javascript:;>> wrote:
> >>>>>>>>
> >>>>>>>>> This doesn't make sense and seems like a bug.
> >>>>>>>>> I think the right behavior is for the Drillbit to access the
> >>>> cache
> >>>>> as
> >>>>>>>>> Drillbit user at the query time (there is no user level metadata
> >>>>> cache
> >>>>>>> in
> >>>>>>>>> Drill at this point).
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Fri, Nov 6, 2015 at 6:57 AM, John Omernik <john@omernik.com
> >>>>>>> <javascript:;>> wrote:
> >>>>>>>>>
> >>>>>>>>>> I ran REFRESH TABLE METADATA on a table, it completed
> >>>>> successfully.
> >>>>>>>>>>
> >>>>>>>>>> When I tried a subsequent query, I get a IOException:
> >>>> Permission
> >>>>>>> Denied
> >>>>>>>>> on
> >>>>>>>>>> .drill.parquet_metadata.
> >>>>>>>>>>
> >>>>>>>>>> I am running drill with authentication.  I ran the REFRESH
> >>>> TABLE
> >>>>>>>>> METADATA
> >>>>>>>>>> as user X, it appears the .drill.parquet_metadata was created
> >>>> and
> >>>>>>> owned
> >>>>>>>>> by
> >>>>>>>>>> the user the drill bits are running as as is created with
> >>>>>> -rwxr-x-r-x
> >>>>>>>>>>
> >>>>>>>>>> My question is this: So, I can see why the file is owned by
> >>>> the
> >>>>>> drill
> >>>>>>>>> bit
> >>>>>>>>>> user, and the file is created with all can read permissions,
> >>>> but
> >>>>> why
> >>>>>>> am
> >>>>>>>>> I
> >>>>>>>>>> getting a permission denied when user X is trying to run a
> >>>> query?
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >>
>
>

Re: REFRESH TABLE METADATA - Access Denied

Posted by kbotzum <kb...@maprtech.com>.
MapR audit records print the errno value to indicate success/failure. Thus status 17 means errno 17 which means EEXIST. Looks like Drill is trying to create a file that already exists.
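
For anyone who wants to see that errno in isolation, here is a small Python
sketch. It assumes the audited CREATE behaves like an exclusive create
(O_CREAT | O_EXCL), which is an assumption on my part and not something the
audit log confirms; the helper name exclusive_create is made up. On Linux,
errno.EEXIST has the value 17.

```python
import errno
import os
import tempfile


def exclusive_create(path):
    """Create path exclusively; return 0 on success or the errno on failure."""
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except OSError as e:
        return e.errno
    os.close(fd)
    return 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # Hypothetical stand-in for the metadata cache file.
        path = os.path.join(d, ".drill.parquet_metadata")
        first = exclusive_create(path)   # file does not exist yet
        second = exclusive_create(path)  # file now exists
        print(first, second, errno.errorcode.get(second))
```

The first call succeeds and returns 0, matching the "0" John saw for
successes; the second fails because the file is already there.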

I’ll defer to others as to why Drill might do that.

Keys
_______________________________
Keys Botzum 
Senior Principal Technologist
kbotzum@mapr.com
443-718-0098 
MapR Technologies 
http://www.mapr.com



On Nov 11, 2015, at 4:09 PM, John Omernik <jo...@omernik.com> wrote:

> I turned on MapR Auditing (this is a handy feature) and found that when I
> run a query (the one that is giving me access denied; my query is select *
> from table limit 1), per MapR the user I am logged in as (mapradm) is trying
> to do a create operation on the .drill.parquet_metadata file, and I am
> guessing it's failing with status: 17 (not sure what this means; successes
> appear to be "0"). What was interesting was the "CREATE" being attempted
> three times. Any thoughts on why a select * from table limit 1 would try
> to initiate a create operation on the .drill.parquet_metadata file?
> 


Re: REFRESH TABLE METADATA - Access Denied

Posted by John Omernik <jo...@omernik.com>.
This is exactly what I am seeing. OK, good, that makes me feel a bit better
(I am not crazy!). Before we file a JIRA, can anyone comment on what may be
happening here? Is this a bug or a feature? Since this is so new, I am not
really sure what the expected result is...

On Wed, Nov 11, 2015 at 3:25 PM, Vince Gonzalez <vi...@gmail.com>
wrote:

> My files were owned by mapr:mapr. I changed the ownership of everything to
> ec2-user, and now get permission denied on the refresh table metadata
> command, even though impersonation is on and I authenticated as ec2-user.
> If impersonation is working correctly, then I'd expect this should work. Is
> this what you see?
>
> It's also kinda weird in that both users involved should have write access
> to the files - ec2-user is the owner, and mapr is the superuser on MFS.
>
> [ec2-user@ip-172-16-2-36 tmp]$ sudo -u mapr chown -R ec2-user:ec2-user .
> [ec2-user@ip-172-16-2-36 tmp]$ sqlline -u jdbc:drill: -n ec2-user -p mapr
> apache drill 1.2.0
> "a drill is a terrible thing to waste"
> 0: jdbc:drill:> select count(*) from dfs.`/tmp/flows`;
> +---------+
> | EXPR$0  |
> +---------+
> | 370280  |
> +---------+
> 1 row selected (6.452 seconds)
> 0: jdbc:drill:> refresh table metadata dfs.`/tmp/flows`;
>
> +--------+-----------------------------------------------------------------------------------------------------+
> |   ok   |                                               summary                                               |
> +--------+-----------------------------------------------------------------------------------------------------+
> | false  | Error: 2050.6796.144654 /tmp/flows/2015/11/11/15/01/20/.drill.parquet_metadata (Permission denied)  |
> +--------+-----------------------------------------------------------------------------------------------------+
> 1 row selected (3.253 seconds)
>
> $ ls -la flows/2015/11/11/15/01/20/.drill.parquet_metadata
> -rwxr-xr-x 1 ec2-user ec2-user 0 Nov 11 19:55
> flows/2015/11/11/15/01/20/.drill.parquet_metadata
>
>
> Then I tried to CTAS and it works, but apparently impersonation does not:
>
> 0: jdbc:drill:> create table dfs.tmp.flows2 as select * from
> dfs.`/tmp/flows`;
> +-----------+----------------------------+
> | Fragment  | Number of records written  |
> +-----------+----------------------------+
> | 1_1       | 81222                      |
> | 1_3       | 78255                      |
> | 1_0       | 113624                     |
> | 1_2       | 97179                      |
> +-----------+----------------------------+
> 4 rows selected (22.591 seconds)
> 0: jdbc:drill:> refresh table metadata dfs.tmp.flows2;
> +-------+--------------------------------------------------+
> |  ok   |                     summary                      |
> +-------+--------------------------------------------------+
> | true  | Successfully updated metadata for table flows2.  |
> +-------+--------------------------------------------------+
> 1 row selected (0.13 seconds)
>
> $ ls -la flows2/
> total 3499
> drwxr-xr-x 2 ec2-user ec2-user       5 Nov 11 21:18 .
> drwxrwxrwx 4 ec2-user ec2-user       2 Nov 11 21:18 ..
> -rwxr-xr-x 1 ec2-user ec2-user 1068250 Nov 11 21:18 1_0_0.parquet
> -rwxr-xr-x 1 ec2-user ec2-user  789341 Nov 11 21:18 1_1_0.parquet
> -rwxr-xr-x 1 ec2-user ec2-user  952667 Nov 11 21:18 1_2_0.parquet
> -rwxr-xr-x 1 ec2-user ec2-user  755805 Nov 11 21:18 1_3_0.parquet
> -rwxr-xr-x 1 mapr     mapr       14033 Nov 11 21:18 .drill.parquet_metadata
>
>
> Looks like a bug to me. Impersonation doesn't seem to be in force for
> REFRESH TABLE METADATA.
>
>
> On Wed, Nov 11, 2015 at 4:09 PM, John Omernik <jo...@omernik.com> wrote:
>
> > I turned on MapR Auditing (This is a handy feature) and found that when I
> > run a query (that is giving me access denied.. my query is select * from
> > table limit 1) Per MapR the user I am logged in as (mapradm) is trying to
> > do a create operation on the .drill.parquet_metadata operation and I
> > guessing it's failing with status: 17 (Not sure what this means,
> successes
> > appear to be "0".  What was intersting was the "CREATE" being attempted
> > three times.   Any thoughts on why a select * from tables limit 1 would
> try
> > to initiate a create operation on the .drill.parquet_metadata file?
> >
> > On Wed, Nov 11, 2015 at 2:25 PM, John Omernik <jo...@omernik.com> wrote:
> >
> > > I take it back.
> > >
> > > I went to run a query, in the same session that had worked, and now I
> am
> > > getting permission denied.
> > >
> > > I do have a query running created new directories every 5 minutes,
> > > however, these aren't the directories that are giving me permission
> > denied.
> > >   Did you try running an aggregate query accross all data? This is a
> > > interesting one to track down, not sure why I am getting the access
> > denied
> > > now,
> > >
> > > the .drill.parquet_metadata file in the directory that I am getting the
> > > error on is owned by mapr:mapr and has rwxr-xr-x  permissions. This
> tells
> > > me that both the user of the drillbits (mapr) and the user I am logged
> > into
> > > in sqlline (mapradm) should be able to read the file... so why do I get
> > an
> > > access denied in running a query. I any assistance would be valuable
> here
> > > in that there are some great performance increases with the metadata
> > > caching, and I don't want to miss out on that.
> > >
> > > On Wed, Nov 11, 2015 at 2:18 PM, John Omernik <jo...@omernik.com>
> wrote:
> > >
> > >> All files are owned by mapr:mapr?
> > >>
> > >> I have a setup where mapr is the user running the drillbit, but then I
> > >> have a directory that is owned by a another user. mapradm:mapradm on
> all
> > >> files. (Permissions on directories and files appears to be rwxr-x-r-x)
> > When
> > >> I run the REFRESH TABLE metatdata the .drill.parquet_metadata file
> gets
> > >> created as mapr:mapr with rwxr-xr-x.
> > >>
> > >> So
> > >> Drillbit User:mapr
> > >> Directory (and subdirectories/files) owner: mapradm:mapradm
> > >> Directory permissions (all files and folder under main directory)
> > >> rwxr-x-r-x
> > >>
> > >> I authenticated to drill via sqlline as user mapradm (this user should
> > be
> > >> able to read and write just fine to all directories).
> > >>
> > >> Now, one thing I did notice is my mapr user was not in the mapradm
> > group,
> > >> therefore, didn't have write permissions anywhere... when I fixed that
> > on
> > >> all nodes, and then I manually deleted the metadatafiles, things seem
> > to be
> > >> working. I wonder if that was my issue?
> > >>
> > >> Basically, the user running the drillbits need to be able to write
> files
> > >> (the .drill.parquet_metadata)  or something bad will happen :) I will
> do
> > >> more testing. This may be a good candidate for some documentation work
> > to
> > >> understand what permissions are required to be able to query these.
> > >>
> > >>
> > >>
> > >>
> > >> On Wed, Nov 11, 2015 at 1:36 PM, Vince Gonzalez <
> > vince.gonzalez@gmail.com
> > >> > wrote:
> > >>
> > >>> Hi John, I tried this and didn't find any issues. Let me know if I
> > didn't
> > >>> follow your reproduction faithfully.
> > >>>
> > >>> $ sqlline -u jdbc:drill: -n ec2-user -p mapr
> > >>> apache drill 1.2.0
> > >>> "drill baby drill"
> > >>> 0: jdbc:drill:> refresh table metadata dfs.`/tmp/flows`;
> > >>> +-------+------------------------------------------------------+
> > >>> |  ok   |                       summary                        |
> > >>> +-------+------------------------------------------------------+
> > >>> | true  | Successfully updated metadata for table /tmp/flows.  |
> > >>> +-------+------------------------------------------------------+
> > >>> 1 row selected (32.27 seconds)
> > >>> 0: jdbc:drill:> select srcIP,dstIP from dfs.`/tmp/flows` limit 12;
> > >>> +---------------+---------------+
> > >>> |     srcIP     |     dstIP     |
> > >>> +---------------+---------------+
> > >>> | 172.16.2.152  | 172.16.1.58   |
> > >>> | 172.16.1.58   | 172.16.2.152  |
> > >>> | 172.16.2.152  | 172.16.2.73   |
> > >>> | 172.16.2.152  | 172.16.2.73   |
> > >>> | 172.16.2.73   | 172.16.2.152  |
> > >>> | 172.16.2.152  | 172.16.2.73   |
> > >>> | 172.16.2.152  | 172.16.2.73   |
> > >>> | 172.16.2.152  | 172.16.2.73   |
> > >>> | 172.16.2.73   | 172.16.2.152  |
> > >>> | 172.16.2.73   | 172.16.2.152  |
> > >>> | 172.16.2.73   | 172.16.2.152  |
> > >>> | 172.16.2.152  | 172.16.2.73   |
> > >>> +---------------+---------------+
> > >>> 12 rows selected (5.654 seconds)
> > >>>
> > >>> And here's what my table structure looks like (as seen via MapR NFS):
> > >>>
> > >>> $ tree /mapr/vgonzalez.drill/tmp/flows/ | head -15
> > >>> /mapr/vgonzalez.drill/tmp/flows/
> > >>> └── 2015
> > >>>     └── 11
> > >>>         ├── 10
> > >>>         │   ├── 21
> > >>>         │   │   ├── 39
> > >>>         │   │   │   ├── 03
> > >>>         │   │   │   │   ├── _common_metadata
> > >>>         │   │   │   │   ├── _metadata
> > >>>         │   │   │   │   ├──
> > >>> part-r-00000-853882bd-66d8-4505-96ba-f0a282e374de.gz.parquet
> > >>>         │   │   │   │   └── _SUCCESS
> > >>>         │   │   │   └── 20
> > >>>         │   │   │       ├── _common_metadata
> > >>>         │   │   │       ├── _metadata
> > >>>         │   │   │       ├──
> > >>> part-r-00000-37a94549-8e56-46d5-be88-cb28e6d8bc35.gz.parquet
> > >>>
> > >>> My parquet was created in Spark, not Drill. Not sure if that's
> > relevant.
> > >>>
> > >>> I have authentication and impersonation turned on, and the files are
> > >>> owned
> > >>> by mapr:mapr. Here's my drill-override.conf:
> > >>>
> > >>> drill.exec: {
> > >>>   cluster-id: "vgonzalez_drill-drillbits",
> > >>> zk.connect:
> > >>>
> > >>>
> >
> "ip-172-16-2-36.ec2.internal:5181,ip-172-16-2-37.ec2.internal:5181,ip-172-16-2-38.ec2.internal:5181"
> > >>> }
> > >>> drill.exec.impersonation: { enabled: true, max_chained_user_hops: 3 }
> > >>> drill.exec { security.user.auth { enabled: true, packages +=
> > >>> "org.apache.drill.exec.rpc.user.security", impl: "pam",
> pam_profiles: [
> > >>> "login","sudo","sshd","password-auth" ] } }
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> On Tue, Nov 10, 2015 at 1:17 PM, John Omernik <jo...@omernik.com>
> > wrote:
> > >>>
> > >>> > Cool, looking forward to it.
> > >>> >
> > >>> > On Mon, Nov 9, 2015 at 7:21 PM, Vince Gonzalez <
> > >>> vince.gonzalez@gmail.com>
> > >>> > wrote:
> > >>> >
> > >>> > > Hey John, I have a secure cluster and some parquet files, I'll
> try
> > >>> this
> > >>> > out
> > >>> > > and report back.
> > >>> > >
> > >>> > > On Monday, November 9, 2015, John Omernik <jo...@omernik.com>
> > wrote:
> > >>> > >
> > >>> > > > Has anyone been able to try/test this? I am curious if it's me
> > only
> > >>> > issue
> > >>> > > > or something more of bug so I can open a JIRA if needed.
> > >>> > > >
> > >>> > > > John
> > >>> > > >
> > >>> > > > On Fri, Nov 6, 2015 at 11:06 AM, John Omernik <
> john@omernik.com
> > >>> > > > <javascript:;>> wrote:
> > >>> > > >
> > >>> > > > > If someone has authorization/authentication setup, to
> > reproduce:
> > >>> > > > >
> > >>> > > > > Have a Parquet table with directories underneath the main (I
> > have
> > >>> > > > > directories per day)
> > >>> > > > >
> > >>> > > > > Then issue REFRESH TABLE METADATA on the root of the table
> > >>> running an
> > >>> > > > > authenticated user other than the drill bit user. (I am using
> > >>> mapr, I
> > >>> > > > used
> > >>> > > > > my user to run the query, and yes I have access to the data)
> > >>> > > > >
> > >>> > > > > Then run a normal query and see what the result is. .
> > >>> > > > >
> > >>> > > > > John
> > >>> > > > >
> > >>> > > > > On Fri, Nov 6, 2015 at 10:22 AM, Neeraja Rentachintala <
> > >>> > > > > nrentachintala@maprtech.com <javascript:;>> wrote:
> > >>> > > > >
> > >>> > > > >> This doesn't make sense and seems like a bug.
> > >>> > > > >> I think the right behavior is for the Drillbit to access the
> > >>> cache
> > >>> > as
> > >>> > > > >> Drillbit user at the query time (there is no user level
> > metadata
> > >>> > cache
> > >>> > > > in
> > >>> > > > >> Drill at this point).
> > >>> > > > >>
> > >>> > > > >>
> > >>> > > > >>
> > >>> > > > >> On Fri, Nov 6, 2015 at 6:57 AM, John Omernik <
> > john@omernik.com
> > >>> > > > <javascript:;>> wrote:
> > >>> > > > >>
> > >>> > > > >> > I ran REFRESH TABLE METADATA on a table, it completed
> > >>> > successfully.
> > >>> > > > >> >
> > >>> > > > >> > When I tried a subsequent query, I get a IOException:
> > >>> Permission
> > >>> > > > Denied
> > >>> > > > >> on
> > >>> > > > >> > .drill.parquet_metadata.
> > >>> > > > >> >
> > >>> > > > >> > I am running drill with authentication.  I ran the REFRESH
> > >>> TABLE
> > >>> > > > >> METADATA
> > >>> > > > >> > as user X, it appears the .drill.parquet_metadata was
> > created
> > >>> and
> > >>> > > > owned
> > >>> > > > >> by
> > >>> > > > >> > the user the drill bits are running as as is created with
> > >>> > > -rwxr-x-r-x
> > >>> > > > >> >
> > >>> > > > >> > My question is this: So, I can see why the file is owned
> by
> > >>> the
> > >>> > > drill
> > >>> > > > >> bit
> > >>> > > > >> > user, and the file is created with all can read
> permissions,
> > >>> but
> > >>> > why
> > >>> > > > am
> > >>> > > > >> I
> > >>> > > > >> > getting a permission denied when user X is trying to run a
> > >>> query?
> > >>> > > > >> >
> > >>> > > > >>
> > >>> > > > >
> > >>> > > > >
> > >>> > > >
> > >>> > >
> > >>> >
> > >>>
> > >>
> > >>
> > >
> >
>

Re: REFRESH TABLE METADATA - Access Denied

Posted by Vince Gonzalez <vi...@gmail.com>.
My files were owned by mapr:mapr. I changed the ownership of everything to
ec2-user, and now get permission denied on the refresh table metadata
command, even though impersonation is on and I authenticated as ec2-user.
If impersonation is working correctly, then I'd expect this should work. Is
this what you see?

It's also kinda weird in that both users involved should have write access
to the files - ec2-user is the owner, and mapr is the superuser on MFS.

[ec2-user@ip-172-16-2-36 tmp]$ sudo -u mapr chown -R ec2-user:ec2-user .
[ec2-user@ip-172-16-2-36 tmp]$ sqlline -u jdbc:drill: -n ec2-user -p mapr
apache drill 1.2.0
"a drill is a terrible thing to waste"
0: jdbc:drill:> select count(*) from dfs.`/tmp/flows`;
+---------+
| EXPR$0  |
+---------+
| 370280  |
+---------+
1 row selected (6.452 seconds)
0: jdbc:drill:> refresh table metadata dfs.`/tmp/flows`;
+--------+-----------------------------------------------------------------------------------------------------+
|   ok   |                                               summary                                               |
+--------+-----------------------------------------------------------------------------------------------------+
| false  | Error: 2050.6796.144654 /tmp/flows/2015/11/11/15/01/20/.drill.parquet_metadata (Permission denied)  |
+--------+-----------------------------------------------------------------------------------------------------+
1 row selected (3.253 seconds)

$ ls -la flows/2015/11/11/15/01/20/.drill.parquet_metadata
-rwxr-xr-x 1 ec2-user ec2-user 0 Nov 11 19:55 flows/2015/11/11/15/01/20/.drill.parquet_metadata


Then I tried to CTAS and it works, but apparently impersonation does not:

0: jdbc:drill:> create table dfs.tmp.flows2 as select * from dfs.`/tmp/flows`;
+-----------+----------------------------+
| Fragment  | Number of records written  |
+-----------+----------------------------+
| 1_1       | 81222                      |
| 1_3       | 78255                      |
| 1_0       | 113624                     |
| 1_2       | 97179                      |
+-----------+----------------------------+
4 rows selected (22.591 seconds)
0: jdbc:drill:> refresh table metadata dfs.tmp.flows2;
+-------+--------------------------------------------------+
|  ok   |                     summary                      |
+-------+--------------------------------------------------+
| true  | Successfully updated metadata for table flows2.  |
+-------+--------------------------------------------------+
1 row selected (0.13 seconds)

$ ls -la flows2/
total 3499
drwxr-xr-x 2 ec2-user ec2-user       5 Nov 11 21:18 .
drwxrwxrwx 4 ec2-user ec2-user       2 Nov 11 21:18 ..
-rwxr-xr-x 1 ec2-user ec2-user 1068250 Nov 11 21:18 1_0_0.parquet
-rwxr-xr-x 1 ec2-user ec2-user  789341 Nov 11 21:18 1_1_0.parquet
-rwxr-xr-x 1 ec2-user ec2-user  952667 Nov 11 21:18 1_2_0.parquet
-rwxr-xr-x 1 ec2-user ec2-user  755805 Nov 11 21:18 1_3_0.parquet
-rwxr-xr-x 1 mapr     mapr       14033 Nov 11 21:18 .drill.parquet_metadata


Looks like a bug to me. Impersonation doesn't seem to be in force for
REFRESH TABLE METADATA.
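
To see at a glance which cache files ended up owned by the drillbit user
rather than the session user, something like the following could help. It is
only a sketch: list_metadata_cache is a hypothetical helper name, not a Drill
command, and it assumes the table is reachable through a POSIX mount (for
example the MapR NFS path used above).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Drill): list every Drill metadata cache
# file under a table root together with its mode, owner, and group.
list_metadata_cache() {
  local table_root="$1"
  # ls -ld prints one long-format line per cache file that find locates.
  find "$table_root" -type f -name '.drill.parquet_metadata' -exec ls -ld {} +
}

# Example call, using the NFS path shown earlier in this thread:
# list_metadata_cache /mapr/vgonzalez.drill/tmp/flows
```

Any line whose owner differs from the owner of the neighbouring parquet files
is a candidate for the behaviour described above.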



Re: REFRESH TABLE METADATA - Access Denied

Posted by John Omernik <jo...@omernik.com>.
I turned on MapR Auditing (this is a handy feature) and found that when I
run a query that gives me access denied (my query is select * from table
limit 1), MapR reports that the user I am logged in as (mapradm) is trying
to do a create operation on the .drill.parquet_metadata file, and I am
guessing it's failing with status: 17 (not sure what this means; successes
appear to be "0"). What was interesting was that the "CREATE" was attempted
three times. Any thoughts on why a select * from table limit 1 would try
to initiate a create operation on the .drill.parquet_metadata file?
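
Elsewhere in this thread, deleting the cache files by hand is reported to
clear the error, at least for a while. A cautious way to do that might look
like the sketch below. purge_metadata_cache is a hypothetical helper, not
something Drill ships; it only prints the files unless --delete is passed,
and it assumes the table root is visible through a POSIX mount.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the metadata cache files under a table root,
# and delete them only when the second argument is exactly "--delete".
purge_metadata_cache() {
  local table_root="$1" action="${2:-}"
  if [ "$action" = "--delete" ]; then
    # Remove only the cache files; parquet data files are never matched.
    find "$table_root" -type f -name '.drill.parquet_metadata' -exec rm -f {} +
  else
    # Dry run: show what would be removed.
    find "$table_root" -type f -name '.drill.parquet_metadata' -print
  fi
}

# Example calls (the path is taken from earlier in this thread):
# purge_metadata_cache /mapr/vgonzalez.drill/tmp/flows
# purge_metadata_cache /mapr/vgonzalez.drill/tmp/flows --delete
```

After a purge, the cache can be rebuilt with REFRESH TABLE METADATA.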


Re: REFRESH TABLE METADATA - Access Denied

Posted by John Omernik <jo...@omernik.com>.
I take it back.

I went to run a query, in the same session that had worked, and now I am
getting permission denied.

I do have a query running that creates new directories every 5 minutes;
however, these aren't the directories that are giving me permission denied.
Did you try running an aggregate query across all data? This is an
interesting one to track down; I'm not sure why I am getting the access
denied now.

The .drill.parquet_metadata file in the directory that I am getting the
error on is owned by mapr:mapr and has rwxr-xr-x permissions. This tells
me that both the user of the drillbits (mapr) and the user I am logged in
as in sqlline (mapradm) should be able to read the file, so why do I get an
access denied when running a query? Any assistance would be valuable here,
in that there are some great performance increases with the metadata
caching, and I don't want to miss out on that.
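
To make the permission reasoning concrete: with rwxr-xr-x on a file owned by
mapr:mapr, any other user can read the file but cannot write it, so a plain
read should succeed while an attempt to create or overwrite the file as that
user would not. The sketch below just decodes the write bit from a symbolic
mode string. class_can_write is a hypothetical helper that relies on bash
substring expansion; it says nothing about what Drill does internally.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given permission class (owner, group,
# or other) has the write bit set in a nine-character mode like rwxr-xr-x.
class_can_write() {
  local mode="$1" class="$2" offset
  case "$class" in
    owner) offset=1 ;;
    group) offset=4 ;;
    other) offset=7 ;;
    *) echo "unknown class: $class" >&2; return 2 ;;
  esac
  # The write bit is the second character of each three-character triplet.
  [ "${mode:$offset:1}" = "w" ]
}

# Mode reported in this message for .drill.parquet_metadata:
if class_can_write rwxr-xr-x other; then
  echo "other: can write"
else
  echo "other: read-only"   # this branch runs for rwxr-xr-x
fi
```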

On Wed, Nov 11, 2015 at 2:18 PM, John Omernik <jo...@omernik.com> wrote:

> All files are owned by mapr:mapr?
>
> I have a setup where mapr is the user running the drillbit, but then I
> have a directory that is owned by a another user. mapradm:mapradm on all
> files. (Permissions on directories and files appears to be rwxr-x-r-x) When
> I run the REFRESH TABLE metatdata the .drill.parquet_metadata file gets
> created as mapr:mapr with rwxr-xr-x.
>
> So
> Drillbit User:mapr
> Directory (and subdirectories/files) owner: mapradm:mapradm
> Directory permissions (all files and folder under main directory)
> rwxr-x-r-x
>
> I authenticated to drill via sqlline as user mapradm (this user should be
> able to read and write just fine to all directories).
>
> Now, one thing I did notice is my mapr user was not in the mapradm group,
> therefore, didn't have write permissions anywhere... when I fixed that on
> all nodes, and then I manually deleted the metadatafiles, things seem to be
> working. I wonder if that was my issue?
>
> Basically, the user running the drillbits need to be able to write files
> (the .drill.parquet_metadata)  or something bad will happen :) I will do
> more testing. This may be a good candidate for some documentation work to
> understand what permissions are required to be able to query these.
>
>
>
>
> On Wed, Nov 11, 2015 at 1:36 PM, Vince Gonzalez <vi...@gmail.com>
> wrote:
>
>> Hi John, I tried this and didn't find any issues. Let me know if I didn't
>> follow your reproduction faithfully.
>>
>> $ sqlline -u jdbc:drill: -n ec2-user -p mapr
>> apache drill 1.2.0
>> "drill baby drill"
>> 0: jdbc:drill:> refresh table metadata dfs.`/tmp/flows`;
>> +-------+------------------------------------------------------+
>> |  ok   |                       summary                        |
>> +-------+------------------------------------------------------+
>> | true  | Successfully updated metadata for table /tmp/flows.  |
>> +-------+------------------------------------------------------+
>> 1 row selected (32.27 seconds)
>> 0: jdbc:drill:> select srcIP,dstIP from dfs.`/tmp/flows` limit 12;
>> +---------------+---------------+
>> |     srcIP     |     dstIP     |
>> +---------------+---------------+
>> | 172.16.2.152  | 172.16.1.58   |
>> | 172.16.1.58   | 172.16.2.152  |
>> | 172.16.2.152  | 172.16.2.73   |
>> | 172.16.2.152  | 172.16.2.73   |
>> | 172.16.2.73   | 172.16.2.152  |
>> | 172.16.2.152  | 172.16.2.73   |
>> | 172.16.2.152  | 172.16.2.73   |
>> | 172.16.2.152  | 172.16.2.73   |
>> | 172.16.2.73   | 172.16.2.152  |
>> | 172.16.2.73   | 172.16.2.152  |
>> | 172.16.2.73   | 172.16.2.152  |
>> | 172.16.2.152  | 172.16.2.73   |
>> +---------------+---------------+
>> 12 rows selected (5.654 seconds)
>>
>> And here's what my table structure looks like (as seen via MapR NFS):
>>
>> $ tree /mapr/vgonzalez.drill/tmp/flows/ | head -15
>> /mapr/vgonzalez.drill/tmp/flows/
>> └── 2015
>>     └── 11
>>         ├── 10
>>         │   ├── 21
>>         │   │   ├── 39
>>         │   │   │   ├── 03
>>         │   │   │   │   ├── _common_metadata
>>         │   │   │   │   ├── _metadata
>>         │   │   │   │   ├──
>> part-r-00000-853882bd-66d8-4505-96ba-f0a282e374de.gz.parquet
>>         │   │   │   │   └── _SUCCESS
>>         │   │   │   └── 20
>>         │   │   │       ├── _common_metadata
>>         │   │   │       ├── _metadata
>>         │   │   │       ├──
>> part-r-00000-37a94549-8e56-46d5-be88-cb28e6d8bc35.gz.parquet
>>
>> My parquet was created in Spark, not Drill. Not sure if that's relevant.
>>
>> I have authentication and impersonation turned on, and the files are owned
>> by mapr:mapr. Here's my drill-override.conf:
>>
>> drill.exec: {
>>   cluster-id: "vgonzalez_drill-drillbits",
>> zk.connect:
>>
>> "ip-172-16-2-36.ec2.internal:5181,ip-172-16-2-37.ec2.internal:5181,ip-172-16-2-38.ec2.internal:5181"
>> }
>> drill.exec.impersonation: { enabled: true, max_chained_user_hops: 3 }
>> drill.exec { security.user.auth { enabled: true, packages +=
>> "org.apache.drill.exec.rpc.user.security", impl: "pam", pam_profiles: [
>> "login","sudo","sshd","password-auth" ] } }
>>
>>
>>
>>
>>
>> On Tue, Nov 10, 2015 at 1:17 PM, John Omernik <jo...@omernik.com> wrote:
>>
>> > Cool, looking forward to it.
>> >
>> > On Mon, Nov 9, 2015 at 7:21 PM, Vince Gonzalez <
>> vince.gonzalez@gmail.com>
>> > wrote:
>> >
>> > > Hey John, I have a secure cluster and some parquet files, I'll try
>> this
>> > out
>> > > and report back.
>> > >
>> > > On Monday, November 9, 2015, John Omernik <jo...@omernik.com> wrote:
>> > >
>> > > > Has anyone been able to try/test this? I am curious if it's me only
>> > issue
>> > > > or something more of bug so I can open a JIRA if needed.
>> > > >
>> > > > John
>> > > >
>> > > > > On Fri, Nov 6, 2015 at 11:06 AM, John Omernik <john@omernik.com> wrote:
>> > > >
>> > > > > If someone has authorization/authentication setup, to reproduce:
>> > > > >
>> > > > > Have a Parquet table with directories underneath the main (I have
>> > > > > directories per day)
>> > > > >
>> > > > > Then issue REFRESH TABLE METADATA on the root of the table
>> running an
>> > > > > authenticated user other than the drill bit user. (I am using
>> mapr, I
>> > > > used
>> > > > > my user to run the query, and yes I have access to the data)
>> > > > >
>> > > > > Then run a normal query and see what the result is. .
>> > > > >
>> > > > > John
>> > > > >
>> > > > > > On Fri, Nov 6, 2015 at 10:22 AM, Neeraja Rentachintala <nrentachintala@maprtech.com> wrote:
>> > > > >
>> > > > >> This doesn't make sense and seems like a bug.
>> > > > >> I think the right behavior is for the Drillbit to access the
>> cache
>> > as
>> > > > >> Drillbit user at the query time (there is no user level metadata
>> > cache
>> > > > in
>> > > > >> Drill at this point).
>> > > > >>
>> > > > >>
>> > > > >>
>> > > > > >> On Fri, Nov 6, 2015 at 6:57 AM, John Omernik <john@omernik.com> wrote:
>> > > > >>
>> > > > >> > I ran REFRESH TABLE METADATA on a table, it completed
>> > successfully.
>> > > > >> >
>> > > > >> > When I tried a subsequent query, I get a IOException:
>> Permission
>> > > > Denied
>> > > > >> on
>> > > > >> > .drill.parquet_metadata.
>> > > > >> >
>> > > > >> > I am running drill with authentication.  I ran the REFRESH
>> TABLE
>> > > > >> METADATA
>> > > > >> > as user X, it appears the .drill.parquet_metadata was created
>> and
>> > > > owned
>> > > > >> by
>> > > > >> > the user the drill bits are running as as is created with
>> > > -rwxr-x-r-x
>> > > > >> >
>> > > > >> > My question is this: So, I can see why the file is owned by the
>> > > drill
>> > > > >> bit
>> > > > >> > user, and the file is created with all can read permissions,
>> but
>> > why
>> > > > am
>> > > > >> I
>> > > > >> > getting a permission denied when user X is trying to run a
>> query?
>> > > > >> >
>> > > > >>
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>

Re: REFRESH TABLE METADATA - Access Denied

Posted by John Omernik <jo...@omernik.com>.
All files are owned by mapr:mapr?

I have a setup where mapr is the user running the drillbit, but I have
a directory that is owned by another user: mapradm:mapradm on all files.
(Permissions on directories and files appear to be rwxr-xr-x.) When I run
REFRESH TABLE METADATA, the .drill.parquet_metadata file gets created
as mapr:mapr with rwxr-xr-x.

So:
Drillbit user: mapr
Directory (and subdirectories/files) owner: mapradm:mapradm
Directory permissions (all files and folders under main directory): rwxr-xr-x

I authenticated to drill via sqlline as user mapradm (this user should be
able to read and write just fine to all directories).

Now, one thing I did notice is that my mapr user was not in the mapradm group
and therefore didn't have write permissions anywhere. When I fixed that on
all nodes and then manually deleted the metadata files, things seemed to be
working. I wonder if that was my issue?

Basically, the user running the drillbits needs to be able to write files
(the .drill.parquet_metadata) or something bad will happen :) I will do
more testing. This may be a good candidate for some documentation work, so
people understand what permissions are required to query these tables.
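
For anyone wanting to repeat the checks described above, here is a minimal shell sketch. It assumes the table is reachable through a local mount (for example MapR NFS); the function names are hypothetical helpers, not part of Drill, and the user and group names you pass in (such as mapr and mapradm) are whatever applies on your cluster.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are made up for illustration; paths, users and
# groups are supplied by the caller.

# List every Drill metadata cache file under a table root directory.
list_metadata_cache() {
  find "$1" -type f -name '.drill.parquet_metadata'
}

# Succeed if OS user $1 is a member of group $2 on this node.
user_in_group() {
  id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qx "$2"
}

# Delete the cache files so a later REFRESH TABLE METADATA can rebuild them.
remove_metadata_cache() {
  find "$1" -type f -name '.drill.parquet_metadata' -exec rm -f {} +
}
```

A possible use would be `user_in_group mapr mapradm || echo "mapr is not in mapradm"` on each node, followed by `remove_metadata_cache` on the table root before refreshing the metadata again. The group check has to be run on every node, since group membership is resolved locally.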




On Wed, Nov 11, 2015 at 1:36 PM, Vince Gonzalez <vi...@gmail.com>
wrote:

> Hi John, I tried this and didn't find any issues. Let me know if I didn't
> follow your reproduction faithfully.
>
> $ sqlline -u jdbc:drill: -n ec2-user -p mapr
> apache drill 1.2.0
> "drill baby drill"
> 0: jdbc:drill:> refresh table metadata dfs.`/tmp/flows`;
> +-------+------------------------------------------------------+
> |  ok   |                       summary                        |
> +-------+------------------------------------------------------+
> | true  | Successfully updated metadata for table /tmp/flows.  |
> +-------+------------------------------------------------------+
> 1 row selected (32.27 seconds)
> 0: jdbc:drill:> select srcIP,dstIP from dfs.`/tmp/flows` limit 12;
> +---------------+---------------+
> |     srcIP     |     dstIP     |
> +---------------+---------------+
> | 172.16.2.152  | 172.16.1.58   |
> | 172.16.1.58   | 172.16.2.152  |
> | 172.16.2.152  | 172.16.2.73   |
> | 172.16.2.152  | 172.16.2.73   |
> | 172.16.2.73   | 172.16.2.152  |
> | 172.16.2.152  | 172.16.2.73   |
> | 172.16.2.152  | 172.16.2.73   |
> | 172.16.2.152  | 172.16.2.73   |
> | 172.16.2.73   | 172.16.2.152  |
> | 172.16.2.73   | 172.16.2.152  |
> | 172.16.2.73   | 172.16.2.152  |
> | 172.16.2.152  | 172.16.2.73   |
> +---------------+---------------+
> 12 rows selected (5.654 seconds)
>
> And here's what my table structure looks like (as seen via MapR NFS):
>
> $ tree /mapr/vgonzalez.drill/tmp/flows/ | head -15
> /mapr/vgonzalez.drill/tmp/flows/
> └── 2015
>     └── 11
>         ├── 10
>         │   ├── 21
>         │   │   ├── 39
>         │   │   │   ├── 03
>         │   │   │   │   ├── _common_metadata
>         │   │   │   │   ├── _metadata
>         │   │   │   │   ├── part-r-00000-853882bd-66d8-4505-96ba-f0a282e374de.gz.parquet
>         │   │   │   │   └── _SUCCESS
>         │   │   │   └── 20
>         │   │   │       ├── _common_metadata
>         │   │   │       ├── _metadata
>         │   │   │       ├── part-r-00000-37a94549-8e56-46d5-be88-cb28e6d8bc35.gz.parquet
>
> My parquet was created in Spark, not Drill. Not sure if that's relevant.
>
> I have authentication and impersonation turned on, and the files are owned
> by mapr:mapr. Here's my drill-override.conf:
>
> drill.exec: {
>   cluster-id: "vgonzalez_drill-drillbits",
> zk.connect:
>
> "ip-172-16-2-36.ec2.internal:5181,ip-172-16-2-37.ec2.internal:5181,ip-172-16-2-38.ec2.internal:5181"
> }
> drill.exec.impersonation: { enabled: true, max_chained_user_hops: 3 }
> drill.exec { security.user.auth { enabled: true, packages +=
> "org.apache.drill.exec.rpc.user.security", impl: "pam", pam_profiles: [
> "login","sudo","sshd","password-auth" ] } }
>
>
>
>
>
> On Tue, Nov 10, 2015 at 1:17 PM, John Omernik <jo...@omernik.com> wrote:
>
> > Cool, looking forward to it.
> >
> > On Mon, Nov 9, 2015 at 7:21 PM, Vince Gonzalez <vince.gonzalez@gmail.com
> >
> > wrote:
> >
> > > Hey John, I have a secure cluster and some parquet files, I'll try this
> > out
> > > and report back.
> > >
> > > On Monday, November 9, 2015, John Omernik <jo...@omernik.com> wrote:
> > >
> > > > Has anyone been able to try/test this? I am curious if it's me only
> > issue
> > > > or something more of bug so I can open a JIRA if needed.
> > > >
> > > > John
> > > >
> > > > > On Fri, Nov 6, 2015 at 11:06 AM, John Omernik <john@omernik.com> wrote:
> > > >
> > > > > If someone has authorization/authentication setup, to reproduce:
> > > > >
> > > > > Have a Parquet table with directories underneath the main (I have
> > > > > directories per day)
> > > > >
> > > > > Then issue REFRESH TABLE METADATA on the root of the table running
> an
> > > > > authenticated user other than the drill bit user. (I am using
> mapr, I
> > > > used
> > > > > my user to run the query, and yes I have access to the data)
> > > > >
> > > > > Then run a normal query and see what the result is. .
> > > > >
> > > > > John
> > > > >
> > > > > > On Fri, Nov 6, 2015 at 10:22 AM, Neeraja Rentachintala <nrentachintala@maprtech.com> wrote:
> > > > >
> > > > >> This doesn't make sense and seems like a bug.
> > > > >> I think the right behavior is for the Drillbit to access the cache
> > as
> > > > >> Drillbit user at the query time (there is no user level metadata
> > cache
> > > > in
> > > > >> Drill at this point).
> > > > >>
> > > > >>
> > > > >>
> > > > > >> On Fri, Nov 6, 2015 at 6:57 AM, John Omernik <john@omernik.com> wrote:
> > > > >>
> > > > >> > I ran REFRESH TABLE METADATA on a table, it completed
> > successfully.
> > > > >> >
> > > > >> > When I tried a subsequent query, I get a IOException: Permission
> > > > Denied
> > > > >> on
> > > > >> > .drill.parquet_metadata.
> > > > >> >
> > > > >> > I am running drill with authentication.  I ran the REFRESH TABLE
> > > > >> METADATA
> > > > >> > as user X, it appears the .drill.parquet_metadata was created
> and
> > > > owned
> > > > >> by
> > > > >> > the user the drill bits are running as as is created with
> > > -rwxr-x-r-x
> > > > >> >
> > > > >> > My question is this: So, I can see why the file is owned by the
> > > drill
> > > > >> bit
> > > > >> > user, and the file is created with all can read permissions, but
> > why
> > > > am
> > > > >> I
> > > > >> > getting a permission denied when user X is trying to run a
> query?
> > > > >> >
> > > > >>
> > > > >
> > > > >
> > > >
> > >
> >
>

Re: REFRESH TABLE METADATA - Access Denied

Posted by Vince Gonzalez <vi...@gmail.com>.
Hi John, I tried this and didn't find any issues. Let me know if I didn't
follow your reproduction faithfully.

$ sqlline -u jdbc:drill: -n ec2-user -p mapr
apache drill 1.2.0
"drill baby drill"
0: jdbc:drill:> refresh table metadata dfs.`/tmp/flows`;
+-------+------------------------------------------------------+
|  ok   |                       summary                        |
+-------+------------------------------------------------------+
| true  | Successfully updated metadata for table /tmp/flows.  |
+-------+------------------------------------------------------+
1 row selected (32.27 seconds)
0: jdbc:drill:> select srcIP,dstIP from dfs.`/tmp/flows` limit 12;
+---------------+---------------+
|     srcIP     |     dstIP     |
+---------------+---------------+
| 172.16.2.152  | 172.16.1.58   |
| 172.16.1.58   | 172.16.2.152  |
| 172.16.2.152  | 172.16.2.73   |
| 172.16.2.152  | 172.16.2.73   |
| 172.16.2.73   | 172.16.2.152  |
| 172.16.2.152  | 172.16.2.73   |
| 172.16.2.152  | 172.16.2.73   |
| 172.16.2.152  | 172.16.2.73   |
| 172.16.2.73   | 172.16.2.152  |
| 172.16.2.73   | 172.16.2.152  |
| 172.16.2.73   | 172.16.2.152  |
| 172.16.2.152  | 172.16.2.73   |
+---------------+---------------+
12 rows selected (5.654 seconds)

And here's what my table structure looks like (as seen via MapR NFS):

$ tree /mapr/vgonzalez.drill/tmp/flows/ | head -15
/mapr/vgonzalez.drill/tmp/flows/
└── 2015
    └── 11
        ├── 10
        │   ├── 21
        │   │   ├── 39
        │   │   │   ├── 03
        │   │   │   │   ├── _common_metadata
        │   │   │   │   ├── _metadata
        │   │   │   │   ├── part-r-00000-853882bd-66d8-4505-96ba-f0a282e374de.gz.parquet
        │   │   │   │   └── _SUCCESS
        │   │   │   └── 20
        │   │   │       ├── _common_metadata
        │   │   │       ├── _metadata
        │   │   │       ├── part-r-00000-37a94549-8e56-46d5-be88-cb28e6d8bc35.gz.parquet

My parquet was created in Spark, not Drill. Not sure if that's relevant.

I have authentication and impersonation turned on, and the files are owned
by mapr:mapr. Here's my drill-override.conf:

drill.exec: {
  cluster-id: "vgonzalez_drill-drillbits",
  zk.connect: "ip-172-16-2-36.ec2.internal:5181,ip-172-16-2-37.ec2.internal:5181,ip-172-16-2-38.ec2.internal:5181"
}
drill.exec.impersonation: { enabled: true, max_chained_user_hops: 3 }
drill.exec {
  security.user.auth {
    enabled: true,
    packages += "org.apache.drill.exec.rpc.user.security",
    impl: "pam",
    pam_profiles: [ "login", "sudo", "sshd", "password-auth" ]
  }
}





On Tue, Nov 10, 2015 at 1:17 PM, John Omernik <jo...@omernik.com> wrote:

> Cool, looking forward to it.
>
> On Mon, Nov 9, 2015 at 7:21 PM, Vince Gonzalez <vi...@gmail.com>
> wrote:
>
> > Hey John, I have a secure cluster and some parquet files, I'll try this
> out
> > and report back.
> >
> > On Monday, November 9, 2015, John Omernik <jo...@omernik.com> wrote:
> >
> > > Has anyone been able to try/test this? I am curious if it's me only
> issue
> > > or something more of bug so I can open a JIRA if needed.
> > >
> > > John
> > >
> > > > On Fri, Nov 6, 2015 at 11:06 AM, John Omernik <john@omernik.com> wrote:
> > >
> > > > If someone has authorization/authentication setup, to reproduce:
> > > >
> > > > Have a Parquet table with directories underneath the main (I have
> > > > directories per day)
> > > >
> > > > Then issue REFRESH TABLE METADATA on the root of the table running an
> > > > authenticated user other than the drill bit user. (I am using mapr, I
> > > used
> > > > my user to run the query, and yes I have access to the data)
> > > >
> > > > Then run a normal query and see what the result is. .
> > > >
> > > > John
> > > >
> > > > > On Fri, Nov 6, 2015 at 10:22 AM, Neeraja Rentachintala <nrentachintala@maprtech.com> wrote:
> > > >
> > > >> This doesn't make sense and seems like a bug.
> > > >> I think the right behavior is for the Drillbit to access the cache
> as
> > > >> Drillbit user at the query time (there is no user level metadata
> cache
> > > in
> > > >> Drill at this point).
> > > >>
> > > >>
> > > >>
> > > > >> On Fri, Nov 6, 2015 at 6:57 AM, John Omernik <john@omernik.com> wrote:
> > > >>
> > > >> > I ran REFRESH TABLE METADATA on a table, it completed
> successfully.
> > > >> >
> > > >> > When I tried a subsequent query, I get a IOException: Permission
> > > Denied
> > > >> on
> > > >> > .drill.parquet_metadata.
> > > >> >
> > > >> > I am running drill with authentication.  I ran the REFRESH TABLE
> > > >> METADATA
> > > >> > as user X, it appears the .drill.parquet_metadata was created and
> > > owned
> > > >> by
> > > >> > the user the drill bits are running as as is created with
> > -rwxr-x-r-x
> > > >> >
> > > >> > My question is this: So, I can see why the file is owned by the
> > drill
> > > >> bit
> > > >> > user, and the file is created with all can read permissions, but
> why
> > > am
> > > >> I
> > > >> > getting a permission denied when user X is trying to run a query?
> > > >> >
> > > >>
> > > >
> > > >
> > >
> >
>

Re: REFRESH TABLE METADATA - Access Denied

Posted by John Omernik <jo...@omernik.com>.
Cool, looking forward to it.

On Mon, Nov 9, 2015 at 7:21 PM, Vince Gonzalez <vi...@gmail.com>
wrote:

> Hey John, I have a secure cluster and some parquet files, I'll try this out
> and report back.
>
> On Monday, November 9, 2015, John Omernik <jo...@omernik.com> wrote:
>
> > Has anyone been able to try/test this? I am curious if it's me only issue
> > or something more of bug so I can open a JIRA if needed.
> >
> > John
> >
> > > On Fri, Nov 6, 2015 at 11:06 AM, John Omernik <john@omernik.com> wrote:
> >
> > > If someone has authorization/authentication setup, to reproduce:
> > >
> > > Have a Parquet table with directories underneath the main (I have
> > > directories per day)
> > >
> > > Then issue REFRESH TABLE METADATA on the root of the table running an
> > > authenticated user other than the drill bit user. (I am using mapr, I
> > used
> > > my user to run the query, and yes I have access to the data)
> > >
> > > Then run a normal query and see what the result is. .
> > >
> > > John
> > >
> > > > On Fri, Nov 6, 2015 at 10:22 AM, Neeraja Rentachintala <nrentachintala@maprtech.com> wrote:
> > >
> > >> This doesn't make sense and seems like a bug.
> > >> I think the right behavior is for the Drillbit to access the cache as
> > >> Drillbit user at the query time (there is no user level metadata cache
> > in
> > >> Drill at this point).
> > >>
> > >>
> > >>
> > > >> On Fri, Nov 6, 2015 at 6:57 AM, John Omernik <john@omernik.com> wrote:
> > >>
> > >> > I ran REFRESH TABLE METADATA on a table, it completed successfully.
> > >> >
> > >> > When I tried a subsequent query, I get a IOException: Permission
> > Denied
> > >> on
> > >> > .drill.parquet_metadata.
> > >> >
> > >> > I am running drill with authentication.  I ran the REFRESH TABLE
> > >> METADATA
> > >> > as user X, it appears the .drill.parquet_metadata was created and
> > owned
> > >> by
> > >> > the user the drill bits are running as as is created with
> -rwxr-x-r-x
> > >> >
> > >> > My question is this: So, I can see why the file is owned by the
> drill
> > >> bit
> > >> > user, and the file is created with all can read permissions, but why
> > am
> > >> I
> > >> > getting a permission denied when user X is trying to run a query?
> > >> >
> > >>
> > >
> > >
> >
>

Re: REFRESH TABLE METADATA - Access Denied

Posted by Vince Gonzalez <vi...@gmail.com>.
Hey John, I have a secure cluster and some Parquet files. I'll try this out
and report back.

On Monday, November 9, 2015, John Omernik <jo...@omernik.com> wrote:

> Has anyone been able to try/test this? I am curious if it's me only issue
> or something more of bug so I can open a JIRA if needed.
>
> John
>
> > On Fri, Nov 6, 2015 at 11:06 AM, John Omernik <john@omernik.com> wrote:
>
> > If someone has authorization/authentication setup, to reproduce:
> >
> > Have a Parquet table with directories underneath the main (I have
> > directories per day)
> >
> > Then issue REFRESH TABLE METADATA on the root of the table running an
> > authenticated user other than the drill bit user. (I am using mapr, I
> used
> > my user to run the query, and yes I have access to the data)
> >
> > Then run a normal query and see what the result is. .
> >
> > John
> >
> > > On Fri, Nov 6, 2015 at 10:22 AM, Neeraja Rentachintala <nrentachintala@maprtech.com> wrote:
> >
> >> This doesn't make sense and seems like a bug.
> >> I think the right behavior is for the Drillbit to access the cache as
> >> Drillbit user at the query time (there is no user level metadata cache
> in
> >> Drill at this point).
> >>
> >>
> >>
> >> On Fri, Nov 6, 2015 at 6:57 AM, John Omernik <john@omernik.com> wrote:
> >>
> >> > I ran REFRESH TABLE METADATA on a table, it completed successfully.
> >> >
> >> > When I tried a subsequent query, I get a IOException: Permission
> Denied
> >> on
> >> > .drill.parquet_metadata.
> >> >
> >> > I am running drill with authentication.  I ran the REFRESH TABLE
> >> METADATA
> >> > as user X, it appears the .drill.parquet_metadata was created and
> owned
> >> by
> >> > the user the drill bits are running as as is created with -rwxr-x-r-x
> >> >
> >> > My question is this: So, I can see why the file is owned by the drill
> >> bit
> >> > user, and the file is created with all can read permissions, but why
> am
> >> I
> >> > getting a permission denied when user X is trying to run a query?
> >> >
> >>
> >
> >
>

Re: REFRESH TABLE METADATA - Access Denied

Posted by John Omernik <jo...@omernik.com>.
Has anyone been able to try/test this? I am curious whether this issue is
only me or more of a bug, so I can open a JIRA if needed.

John

On Fri, Nov 6, 2015 at 11:06 AM, John Omernik <jo...@omernik.com> wrote:

> If someone has authorization/authentication setup, to reproduce:
>
> Have a Parquet table with directories underneath the main (I have
> directories per day)
>
> Then issue REFRESH TABLE METADATA on the root of the table running an
> authenticated user other than the drill bit user. (I am using mapr, I used
> my user to run the query, and yes I have access to the data)
>
> Then run a normal query and see what the result is. .
>
> John
>
> On Fri, Nov 6, 2015 at 10:22 AM, Neeraja Rentachintala <
> nrentachintala@maprtech.com> wrote:
>
>> This doesn't make sense and seems like a bug.
>> I think the right behavior is for the Drillbit to access the cache as
>> Drillbit user at the query time (there is no user level metadata cache in
>> Drill at this point).
>>
>>
>>
>> On Fri, Nov 6, 2015 at 6:57 AM, John Omernik <jo...@omernik.com> wrote:
>>
>> > I ran REFRESH TABLE METADATA on a table, it completed successfully.
>> >
>> > When I tried a subsequent query, I get a IOException: Permission Denied
>> on
>> > .drill.parquet_metadata.
>> >
>> > I am running drill with authentication.  I ran the REFRESH TABLE
>> METADATA
>> > as user X, it appears the .drill.parquet_metadata was created and owned
>> by
>> > the user the drill bits are running as as is created with -rwxr-x-r-x
>> >
>> > My question is this: So, I can see why the file is owned by the drill
>> bit
>> > user, and the file is created with all can read permissions, but why am
>> I
>> > getting a permission denied when user X is trying to run a query?
>> >
>>
>
>

Re: REFRESH TABLE METADATA - Access Denied

Posted by John Omernik <jo...@omernik.com>.
If someone has authorization/authentication set up, to reproduce:

Have a Parquet table with directories underneath the main one (I have
directories per day).

Then issue REFRESH TABLE METADATA on the root of the table, running as an
authenticated user other than the drill bit user. (I am using mapr, I used
my user to run the query, and yes I have access to the data.)

Then run a normal query and see what the result is.

John
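
The steps above could be scripted roughly as follows. This is only a sketch: repro_sql is a hypothetical helper that prints the two statements for a given table, and the table name (dfs.`/tmp/flows` is borrowed from elsewhere in this thread) is an assumption to replace with your own.

```shell
#!/usr/bin/env bash
# Sketch only: prints the statements for the reproduction; it does not
# connect to Drill itself.

# $1 is the table reference exactly as it would be typed in sqlline.
repro_sql() {
  local table="$1"
  # Step 1: build the metadata cache at the root of the table.
  printf 'REFRESH TABLE METADATA %s;\n' "$table"
  # Step 2: run a normal query against the same table.
  printf 'SELECT * FROM %s LIMIT 1;\n' "$table"
}
```

The output can be pasted into a sqlline session opened as an authenticated user other than the one the drillbits run as, for example one started with `sqlline -u jdbc:drill: -n <user> -p <password>`.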

On Fri, Nov 6, 2015 at 10:22 AM, Neeraja Rentachintala <
nrentachintala@maprtech.com> wrote:

> This doesn't make sense and seems like a bug.
> I think the right behavior is for the Drillbit to access the cache as
> Drillbit user at the query time (there is no user level metadata cache in
> Drill at this point).
>
>
>
> On Fri, Nov 6, 2015 at 6:57 AM, John Omernik <jo...@omernik.com> wrote:
>
> > I ran REFRESH TABLE METADATA on a table, it completed successfully.
> >
> > When I tried a subsequent query, I get a IOException: Permission Denied
> on
> > .drill.parquet_metadata.
> >
> > I am running drill with authentication.  I ran the REFRESH TABLE METADATA
> > as user X, it appears the .drill.parquet_metadata was created and owned
> by
> > the user the drill bits are running as as is created with -rwxr-x-r-x
> >
> > My question is this: So, I can see why the file is owned by the drill bit
> > user, and the file is created with all can read permissions, but why am I
> > getting a permission denied when user X is trying to run a query?
> >
>

Re: REFRESH TABLE METADATA - Access Denied

Posted by Neeraja Rentachintala <nr...@maprtech.com>.
This doesn't make sense and seems like a bug.
I think the right behavior is for the Drillbit to access the cache as the
Drillbit user at query time (there is no user-level metadata cache in
Drill at this point).
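
One way to see which OS user can actually read the cache is a small check like the one below. It is a sketch under assumptions: unreadable_cache_files is a hypothetical helper, the table is assumed to be visible through a local mount, and it only tests the user who runs it.

```shell
#!/usr/bin/env bash
# Sketch only: prints each metadata cache file under $1 that the *current*
# OS user cannot read. Prints nothing if every cache file is readable.
unreadable_cache_files() {
  find "$1" -type f -name '.drill.parquet_metadata' -print |
    while IFS= read -r f; do
      [ -r "$f" ] || printf '%s\n' "$f"
    done
}
```

Running it once as the Drillbit OS user and once as the querying user would show whether the two see the cache files differently.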



On Fri, Nov 6, 2015 at 6:57 AM, John Omernik <jo...@omernik.com> wrote:

> I ran REFRESH TABLE METADATA on a table, it completed successfully.
>
> When I tried a subsequent query, I get a IOException: Permission Denied on
> .drill.parquet_metadata.
>
> I am running drill with authentication.  I ran the REFRESH TABLE METADATA
> as user X, it appears the .drill.parquet_metadata was created and owned by
> the user the drill bits are running as as is created with -rwxr-x-r-x
>
> My question is this: So, I can see why the file is owned by the drill bit
> user, and the file is created with all can read permissions, but why am I
> getting a permission denied when user X is trying to run a query?
>