Posted to dev@pig.apache.org by Craig Macdonald <cr...@dcs.gla.ac.uk> on 2008/02/08 10:26:11 UTC

don't copy to DFS if source filesystem marked as shared

Good morning,

I've been playing with Pig using three setups:
 (a) local
 (b) hadoop mapred with hdfs
 (c) hadoop mapred with file:///path/to/shared/fs as the default file system

In our local setup, various NFS filesystems are shared between all 
machines (including the mapred nodes), e.g. /users and /local.

I would like Pig to notice when input files are in a file:// directory 
that has been marked as shared, and hence not copy them to DFS.

For comparison, the Torque PBS resource manager has a usecp directive, 
which records that a filesystem location is shared between all nodes 
(and hence scp is not needed). See 
http://www.clusterresources.com/wiki/doku.php?id=torque:6.2_nfs_and_other_networked_filesystems
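
(From memory of the Torque docs, the corresponding lines in each node's 
mom_priv/config look roughly like this, using our two mounts as the example:)

 $usecp *:/users /users
 $usecp *:/local /local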

It would be good to have a configurable setting in Pig that says when a 
filesystem is shared, so that no copying between file:// and hdfs:// is 
needed. If a directive-style syntax were used, an example in our setup 
might be:
 sharedFS file:///local/
 sharedFS file:///users/
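
To make the idea concrete, the check itself could be as simple as a prefix 
match against the configured list. The sketch below is purely illustrative; 
none of these class or method names exist in Pig:

// Purely illustrative sketch (not real Pig code): skip the copy to DFS
// when an input path sits under one of the configured shared prefixes.
import java.util.Arrays;
import java.util.List;

public class SharedFsCheck {

    // Prefixes as they might come from "sharedFS" settings like the ones above.
    private static final List<String> SHARED_PREFIXES =
            Arrays.asList("file:///local/", "file:///users/");

    // True if the path lives on a filesystem every node can see, so the
    // usual copy into hdfs:// could be skipped and the file read in place.
    public static boolean isShared(String inputPath) {
        for (String prefix : SHARED_PREFIXES) {
            if (inputPath.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}

With something like that in place, inputs under /local or /users would be 
read where they are, and anything else would still be copied exactly as today.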

Relatedly, if I set fs.default.name=file:///path/to/shared/fs, then the 
default path that Pig uses for job information is not suitable (e.g. 
/tmp/tempRANDOMINT is NOT shared on all nodes).
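
(For clarity, by the above I mean something like this in hadoop-site.xml, 
where the path is whatever NFS mount every node can see:)

 <property>
   <name>fs.default.name</name>
   <value>file:///path/to/shared/fs</value>
 </property>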

C

Re: don't copy to DFS if source filesystem marked as shared

Posted by Eric Baldeschwieler <er...@yahoo-inc.com>.
Probably best to send the URL for the JIRA...

On Feb 8, 2008, at 11:22 AM, Benjamin Reed wrote:

> Great suggestion Craig! Could you open a Jira on this?
>
> thanx
> ben


Re: don't copy to DFS if source filesystem marked as shared

Posted by Benjamin Reed <br...@yahoo-inc.com>.
Great suggestion Craig! Could you open a Jira on this?

thanx
ben
