Posted to user@pig.apache.org by Adam Portley <ap...@yahoo-inc.com> on 2011/11/22 04:45:56 UTC

AccessControlException in estimateNumberOfReducers

I'm running into an issue with Pig 0.9.1. My top-level data directory
contains several files and directories with restricted permissions, and
my LoadFunc and input format ignore these directories if the user does
not have permission to read them. Unfortunately, Pig's execution engine
still throws an exception while planning the job.
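
For context, the filtering in my input format amounts to something like
this (a minimal sketch of the idea; the class and method names here are
made up for illustration, not my actual code):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.AccessControlException;

public class ReadableChildLister {
    /** Return only the children of 'root' that the current user can read. */
    public static List<FileStatus> listReadableChildren(FileSystem fs, Path root)
            throws IOException {
        List<FileStatus> readable = new ArrayList<FileStatus>();
        for (FileStatus child : fs.listStatus(root)) {
            try {
                if (child.isDir()) {
                    // Probe the directory; this throws AccessControlException
                    // if the user lacks READ_EXECUTE on it.
                    fs.listStatus(child.getPath());
                }
                readable.add(child);
            } catch (AccessControlException e) {
                // Skip subtrees we are not allowed to read.
            }
        }
        return readable;
    }
}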

Example:

$ hadoop fs -ls /data
Found 2 items
drwxr-xr-x   - owner users            0 2011-11-16 06:47 /data/readable
drwxr-x---   - owner secure          0 2011-11-16 06:48 /data/secure

The /data/secure directory is readable only by users in the 'secure'
group. Non-secure users encounter the following Pig exception even
though the loader and input format never touch the secure data:

REGISTER my-jar;
data = LOAD '/data' USING myLoader();
-- (do something with 'data'...)

Caused by: org.apache.hadoop.security.AccessControlException:
org.apache.hadoop.security.AccessControlException: Permission denied:
user=<removed>, access=READ_EXECUTE, inode="secure":owner:secure:rwxr-x---
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95)
        at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
        at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:669)
        at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:280)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getPathLength(JobControlCompiler.java:791)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getPathLength(JobControlCompiler.java:794)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getTotalInputFileSize(JobControlCompiler.java:779)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.estimateNumberOfReducers(JobControlCompiler.java:739)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:587)
        ... 12 more


I think Pig should probably catch this exception and ignore unreadable 
directories when estimating the number of reducers.
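
Something along these lines in JobControlCompiler.getPathLength would
cover it. This is an untested sketch; I'm guessing at the 0.9.1 method
shape from the stack trace above, so treat the signature as an
assumption rather than the actual source:

// Untested sketch; assumes getPathLength(FileSystem, FileStatus) per the
// stack trace. Requires:
//   import org.apache.hadoop.security.AccessControlException;
private static long getPathLength(FileSystem fs, FileStatus status)
        throws IOException {
    if (!status.isDir()) {
        return status.getLen();
    }
    FileStatus[] children;
    try {
        children = fs.listStatus(status.getPath());
    } catch (AccessControlException e) {
        // Unreadable directory: count it as zero bytes for the reducer
        // estimate instead of failing the whole job.
        return 0;
    }
    long total = 0;
    for (FileStatus child : children) {
        total += getPathLength(fs, child);
    }
    return total;
}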

Thanks,
--Adam


Re: AccessControlException in estimateNumberOfReducers

Posted by Thejas Nair <th...@hortonworks.com>.
You should be able to work around this issue by explicitly setting the
number of reducers (the PARALLEL keyword on individual statements, or
default_parallel for the whole script), which bypasses the estimate
entirely.
This is an unusual use case, but I don't see any harm in doing what you
suggest. Please feel free to open a JIRA and submit a patch.
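
For example, with an arbitrary reducer count of 10 ('key' below is a
placeholder field, not from your script):

-- Script-wide default:
SET default_parallel 10;

-- Or per statement:
grouped = GROUP data BY key PARALLEL 10;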

Thanks,
Thejas

