Posted to dev@pig.apache.org by Craig Macdonald <cr...@dcs.gla.ac.uk> on 2008/02/08 10:26:11 UTC
Don't copy to DFS if source filesystem marked as shared
Good morning,
I've been playing with Pig using three setups:
(a) local
(b) Hadoop mapred with HDFS
(c) Hadoop mapred with file:///path/to/shared/fs as the default file system
In our local setup, various NFS filesystems are shared between all
machines (including mapred nodes), e.g. /users and /local.
I would like Pig to notice when input files are in a file:// directory
that has been marked as shared, and hence not copy them to DFS.
For comparison, the Torque PBS resource manager has a usecp directive,
which declares that a filesystem location is shared between all nodes
(and hence scp is not needed). See
http://www.clusterresources.com/wiki/doku.php?id=torque:6.2_nfs_and_other_networked_filesystems
It would be good to have a configurable setting in Pig that declares a
filesystem as shared, so that no copying between file:// and hdfs://
is needed.
An example in our setup might be:
sharedFS file:///local/
sharedFS file:///users/
if a command-style syntax should be used.
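A minimal Java sketch of the check such a setting implies, assuming the sharedFS entries are parsed into a list of URI prefixes and that Pig would skip the copy to DFS whenever the check returns true. SharedFsPolicy and isShared are hypothetical names, not existing Pig or Hadoop APIs.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: SharedFsPolicy/isShared are illustrative names,
// not part of Pig or Hadoop.
public class SharedFsPolicy {
    // Shared prefixes (e.g. from "sharedFS file:///local/" lines), each
    // normalized to end in "/" so file:///local/ cannot match file:///localdata/.
    private final List<URI> sharedPrefixes = new ArrayList<>();

    public SharedFsPolicy(List<String> prefixes) {
        for (String p : prefixes) {
            sharedPrefixes.add(URI.create(p.endsWith("/") ? p : p + "/").normalize());
        }
    }

    /** True if the input URI lies under a prefix declared as shared by all nodes. */
    public boolean isShared(String input) {
        URI uri = URI.create(input).normalize();
        String path = uri.getPath();
        if (uri.getScheme() == null || path == null) {
            // Scheme-less paths would first need resolving against fs.default.name;
            // this sketch does not attempt that.
            return false;
        }
        for (URI prefix : sharedPrefixes) {
            boolean sameScheme = uri.getScheme().equalsIgnoreCase(prefix.getScheme());
            String prefixPath = prefix.getPath();
            // Match files below the prefix, or the prefix directory itself.
            boolean under = path.startsWith(prefixPath) || (path + "/").equals(prefixPath);
            if (sameScheme && under) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        SharedFsPolicy policy =
            new SharedFsPolicy(List.of("file:///local/", "file:///users/"));
        System.out.println(policy.isShared("file:///users/craig/data.txt")); // true: no copy needed
        System.out.println(policy.isShared("file:///tmp/data.txt"));         // false: node-local
    }
}
```

Requiring a trailing "/" on each prefix is a deliberate choice: a plain string prefix test would otherwise treat file:///localdata/ as shared just because it starts with file:///local.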
Relatedly, if I set fs.default.name=file:///path/to/shared/fs, then the
default file path for Pig job information is not suitable (e.g.
/tmp/tempRANDOMINT is NOT shared across all nodes).
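One possible direction for the temporary-path problem, sketched in Java under the assumption that Pig could derive its temp location from fs.default.name: resolve a relative tmp/tempRANDOMINT path against the default filesystem URI, so the directory lands on the shared mount rather than in each node's local /tmp. SharedTempDir and tempPathFor are hypothetical names, not Pig APIs.

```java
import java.net.URI;
import java.util.Random;

// Hypothetical sketch: SharedTempDir/tempPathFor are illustrative names, not Pig APIs.
public class SharedTempDir {
    /** Builds tmp/temp<random> underneath the given default filesystem URI. */
    public static URI tempPathFor(String defaultFs, Random rng) {
        // A trailing slash makes resolve() append to the last segment
        // instead of replacing it.
        URI root = URI.create(defaultFs.endsWith("/") ? defaultFs : defaultFs + "/");
        // nextInt(bound) is never negative, unlike Math.abs(nextInt()).
        return root.resolve("tmp/temp" + rng.nextInt(Integer.MAX_VALUE));
    }

    public static void main(String[] args) {
        URI tmp = tempPathFor("file:///path/to/shared/fs", new Random());
        // Prints a path under the shared mount, such as /path/to/shared/fs/tmp/temp<digits>
        System.out.println(tmp.getPath());
    }
}
```

With an HDFS default such as hdfs://namenode:9000 the same call still yields /tmp/temp<digits> on that filesystem, so existing HDFS setups would keep their current behaviour.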
C
Re: Don't copy to DFS if source filesystem marked as shared
Posted by Eric Baldeschwieler <er...@yahoo-inc.com>.
Probably best to send the URL for JIRA...
On Feb 8, 2008, at 11:22 AM, Benjamin Reed wrote:
> Great suggestion Craig! Could you open a Jira on this?
>
> thanx
> ben
Re: Don't copy to DFS if source filesystem marked as shared
Posted by Benjamin Reed <br...@yahoo-inc.com>.
Great suggestion, Craig! Could you open a Jira on this?
thanx
ben
On Friday 08 February 2008 01:26:11 Craig Macdonald wrote:
> I would like Pig to note when input files are in a file:// directory
> that has been marked as shared, and hence not copy it to DFS.