Posted to user@pig.apache.org by "Katukuri, Jay" <jk...@ebay.com> on 2010/05/08 00:52:32 UTC

List of directories in Load

Hi all,
Is it possible to specify multiple HDFS directories in the 'Load' function?
Ex:  raw = LOAD  '/input_data/dir1', '/input_data/dir2', '/input_data/dir3' USING PigStorage  ('\t') AS (.....);

Thanks,
Jay

RE: List of directories in Load

Posted by Richard Ding <rd...@yahoo-inc.com>.
People use this feature when they want to load many files with the same
schema. It's hard to see how this feature can be used to manage tables
with different schema (as in the AS clause).
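
For data sets whose schemas differ, a common alternative is one LOAD per data set, each with its own AS clause, combined later with an operator such as JOIN. The lines below are only a minimal sketch: the paths and field names are hypothetical and do not come from this thread.

```pig
-- Hypothetical paths and schemas, for illustration only.
users  = LOAD '/input_data/users'  USING PigStorage('\t')
         AS (user_id:chararray, name:chararray);
clicks = LOAD '/input_data/clicks' USING PigStorage('\t')
         AS (user_id:chararray, url:chararray);

-- Each relation keeps its own schema; combine them on a shared key.
joined = JOIN users BY user_id, clicks BY user_id;
```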

Thanks,
-Richard

-----Original Message-----
From: Syed Wasti [mailto:mdwasti@hotmail.com] 
Sent: Monday, May 10, 2010 10:58 AM
To: pig-user@hadoop.apache.org
Subject: Re: List of directories in Load

This is interesting...
When can we use this option ?
Can the different directories be two different tables ?



On 5/7/10 4:14 PM, "Richard Ding" <rd...@yahoo-inc.com> wrote:

> This feature is supported since 0.6 (PIG-1071). But the correct form is
> 
> raw = LOAD  '/input_data/dir1,/input_data/dir2,/input_data/dir3' USING
> PigStorage  ('\t') AS (.....);
> 
> Thanks,
> -Richard
> -----Original Message-----
> From: Katukuri, Jay [mailto:jkatukuri@ebay.com]
> Sent: Friday, May 07, 2010 3:53 PM
> To: pig-user@hadoop.apache.org
> Subject: List of directories in Load
> 
> Hi all,
> Is it possible to specify multiple HDFS directories in 'Load' function.
> Ex:  raw = LOAD  '/input_data/dir1', '/input_data/dir2',
> '/input_data/dir3' USING PigStorage  ('\t') AS (.....);
> 
> Thanks,
> Jay
> 



RE: List of directories in Load

Posted by "Katukuri, Jay" <jk...@ebay.com>.
Yes, I can use it; it works with Pig 0.4.

-----Original Message-----
From: Richard Ding [mailto:rding@yahoo-inc.com] 
Sent: Monday, May 10, 2010 5:13 PM
To: pig-user@hadoop.apache.org
Subject: RE: List of directories in Load

I think you can use Hadoop globbing (as in Scott's example) with Pig
0.4. 

Thanks,
-Richard

-----Original Message-----
From: Katukuri, Jay [mailto:jkatukuri@ebay.com] 
Sent: Monday, May 10, 2010 10:07 AM
To: pig-user@hadoop.apache.org
Subject: RE: List of directories in Load

Does this work in Pig 0.4 also?

-----Original Message-----
From: Scott Carey [mailto:scott@richrelevance.com] 
Sent: Sunday, May 09, 2010 11:36 PM
To: pig-user@hadoop.apache.org
Subject: Re: List of directories in Load

Also, pig passes on the string to HDFS, which supports globs, so this
works:

raw = LOAD '/input_data/dir{1,2,3}'


On May 7, 2010, at 4:14 PM, Richard Ding wrote:

> This feature is supported since 0.6 (PIG-1071). But the correct form is
> 
> raw = LOAD  '/input_data/dir1,/input_data/dir2,/input_data/dir3' USING
> PigStorage  ('\t') AS (.....);
> 
> Thanks,
> -Richard
> -----Original Message-----
> From: Katukuri, Jay [mailto:jkatukuri@ebay.com] 
> Sent: Friday, May 07, 2010 3:53 PM
> To: pig-user@hadoop.apache.org
> Subject: List of directories in Load
> 
> Hi all,
> Is it possible to specify multiple HDFS directories in 'Load' function.
> Ex:  raw = LOAD  '/input_data/dir1', '/input_data/dir2',
> '/input_data/dir3' USING PigStorage  ('\t') AS (.....);
> 
> Thanks,
> Jay


RE: List of directories in Load

Posted by Richard Ding <rd...@yahoo-inc.com>.
I think you can use Hadoop globbing (as in Scott's example) with Pig
0.4. 

Thanks,
-Richard

-----Original Message-----
From: Katukuri, Jay [mailto:jkatukuri@ebay.com] 
Sent: Monday, May 10, 2010 10:07 AM
To: pig-user@hadoop.apache.org
Subject: RE: List of directories in Load

Does this work in Pig 0.4 also?

-----Original Message-----
From: Scott Carey [mailto:scott@richrelevance.com] 
Sent: Sunday, May 09, 2010 11:36 PM
To: pig-user@hadoop.apache.org
Subject: Re: List of directories in Load

Also, pig passes on the string to HDFS, which supports globs, so this
works:

raw = LOAD '/input_data/dir{1,2,3}'


On May 7, 2010, at 4:14 PM, Richard Ding wrote:

> This feature is supported since 0.6 (PIG-1071). But the correct form is
> 
> raw = LOAD  '/input_data/dir1,/input_data/dir2,/input_data/dir3' USING
> PigStorage  ('\t') AS (.....);
> 
> Thanks,
> -Richard
> -----Original Message-----
> From: Katukuri, Jay [mailto:jkatukuri@ebay.com] 
> Sent: Friday, May 07, 2010 3:53 PM
> To: pig-user@hadoop.apache.org
> Subject: List of directories in Load
> 
> Hi all,
> Is it possible to specify multiple HDFS directories in 'Load' function.
> Ex:  raw = LOAD  '/input_data/dir1', '/input_data/dir2',
> '/input_data/dir3' USING PigStorage  ('\t') AS (.....);
> 
> Thanks,
> Jay


RE: List of directories in Load

Posted by "Katukuri, Jay" <jk...@ebay.com>.
Does this work in Pig 0.4 also?

-----Original Message-----
From: Scott Carey [mailto:scott@richrelevance.com] 
Sent: Sunday, May 09, 2010 11:36 PM
To: pig-user@hadoop.apache.org
Subject: Re: List of directories in Load

Also, pig passes on the string to HDFS, which supports globs, so this works:

raw = LOAD '/input_data/dir{1,2,3}'


On May 7, 2010, at 4:14 PM, Richard Ding wrote:

> This feature is supported since 0.6 (PIG-1071). But the correct form is 
> 
> raw = LOAD  '/input_data/dir1,/input_data/dir2,/input_data/dir3' USING
> PigStorage  ('\t') AS (.....);
> 
> Thanks,
> -Richard
> -----Original Message-----
> From: Katukuri, Jay [mailto:jkatukuri@ebay.com] 
> Sent: Friday, May 07, 2010 3:53 PM
> To: pig-user@hadoop.apache.org
> Subject: List of directories in Load
> 
> Hi all,
> Is it possible to specify multiple HDFS directories in 'Load' function.
> Ex:  raw = LOAD  '/input_data/dir1', '/input_data/dir2',
> '/input_data/dir3' USING PigStorage  ('\t') AS (.....);
> 
> Thanks,
> Jay


Re: List of directories in Load

Posted by Scott Carey <sc...@richrelevance.com>.
Also, pig passes on the string to HDFS, which supports globs, so this works:

raw = LOAD '/input_data/dir{1,2,3}'
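
As a fuller sketch of the glob form, the statement might look like the lines below once a storage function and schema are attached. The field names are hypothetical, since the original post elides its AS clause.

```pig
-- Brace glob: matches /input_data/dir1, /input_data/dir2 and /input_data/dir3.
-- The schema below is an assumption, not taken from the original post.
raw = LOAD '/input_data/dir{1,2,3}' USING PigStorage('\t')
      AS (user_id:chararray, item_id:chararray, ts:long);

-- Other Hadoop glob characters, such as * and ?, are passed through the same way.
all_dirs = LOAD '/input_data/dir*' USING PigStorage('\t')
           AS (user_id:chararray, item_id:chararray, ts:long);
```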


On May 7, 2010, at 4:14 PM, Richard Ding wrote:

> This feature is supported since 0.6 (PIG-1071). But the correct form is 
> 
> raw = LOAD  '/input_data/dir1,/input_data/dir2,/input_data/dir3' USING
> PigStorage  ('\t') AS (.....);
> 
> Thanks,
> -Richard
> -----Original Message-----
> From: Katukuri, Jay [mailto:jkatukuri@ebay.com] 
> Sent: Friday, May 07, 2010 3:53 PM
> To: pig-user@hadoop.apache.org
> Subject: List of directories in Load
> 
> Hi all,
> Is it possible to specify multiple HDFS directories in 'Load' function.
> Ex:  raw = LOAD  '/input_data/dir1', '/input_data/dir2',
> '/input_data/dir3' USING PigStorage  ('\t') AS (.....);
> 
> Thanks,
> Jay


Re: List of directories in Load

Posted by Syed Wasti <md...@hotmail.com>.
This is interesting...
When can we use this option ?
Can the different directories be two different tables ?



On 5/7/10 4:14 PM, "Richard Ding" <rd...@yahoo-inc.com> wrote:

> This feature is supported since 0.6 (PIG-1071). But the correct form is
> 
> raw = LOAD  '/input_data/dir1,/input_data/dir2,/input_data/dir3' USING
> PigStorage  ('\t') AS (.....);
> 
> Thanks,
> -Richard
> -----Original Message-----
> From: Katukuri, Jay [mailto:jkatukuri@ebay.com]
> Sent: Friday, May 07, 2010 3:53 PM
> To: pig-user@hadoop.apache.org
> Subject: List of directories in Load
> 
> Hi all,
> Is it possible to specify multiple HDFS directories in 'Load' function.
> Ex:  raw = LOAD  '/input_data/dir1', '/input_data/dir2',
> '/input_data/dir3' USING PigStorage  ('\t') AS (.....);
> 
> Thanks,
> Jay
> 



RE: List of directories in Load

Posted by Richard Ding <rd...@yahoo-inc.com>.
This feature is supported since 0.6 (PIG-1071). But the correct form is 

raw = LOAD  '/input_data/dir1,/input_data/dir2,/input_data/dir3' USING
PigStorage  ('\t') AS (.....);
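
With a schema filled in, the comma-separated form might be used as in the sketch below. The field names are hypothetical (the original elides the AS clause), and per the note above this form needs Pig 0.6 or later.

```pig
-- All three directories are read into one relation with one schema.
-- Field names are assumptions for illustration.
raw = LOAD '/input_data/dir1,/input_data/dir2,/input_data/dir3'
      USING PigStorage('\t')
      AS (user_id:chararray, item_id:chararray, ts:long);

-- Downstream operators see a single input, e.g. a per-user record count.
by_user = GROUP raw BY user_id;
counts  = FOREACH by_user GENERATE group AS user_id, COUNT(raw) AS n;
DUMP counts;
```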

Thanks,
-Richard
-----Original Message-----
From: Katukuri, Jay [mailto:jkatukuri@ebay.com] 
Sent: Friday, May 07, 2010 3:53 PM
To: pig-user@hadoop.apache.org
Subject: List of directories in Load

Hi all,
Is it possible to specify multiple HDFS directories in 'Load' function.
Ex:  raw = LOAD  '/input_data/dir1', '/input_data/dir2',
'/input_data/dir3' USING PigStorage  ('\t') AS (.....);

Thanks,
Jay