Posted to user@pig.apache.org by "Katukuri, Jay" <jk...@ebay.com> on 2010/05/08 00:52:32 UTC
List of directories in Load
Hi all,
Is it possible to specify multiple HDFS directories in the 'Load' function?
Ex: raw = LOAD '/input_data/dir1', '/input_data/dir2', '/input_data/dir3' USING PigStorage ('\t') AS (.....);
Thanks,
Jay
RE: List of directories in Load
Posted by Richard Ding <rd...@yahoo-inc.com>.
People use this feature when they want to load many files with the same
schema. It's hard to see how it could be used to manage tables with
different schemas, since a single AS clause applies to everything loaded.
Thanks,
-Richard
-----Original Message-----
From: Syed Wasti [mailto:mdwasti@hotmail.com]
Sent: Monday, May 10, 2010 10:58 AM
To: pig-user@hadoop.apache.org
Subject: Re: List of directories in Load
This is interesting...
When can we use this option ?
Can the different directories be two different tables ?
RE: List of directories in Load
Posted by "Katukuri, Jay" <jk...@ebay.com>.
Yes, I can use it; it works with Pig 0.4.
-----Original Message-----
From: Richard Ding [mailto:rding@yahoo-inc.com]
Sent: Monday, May 10, 2010 5:13 PM
To: pig-user@hadoop.apache.org
Subject: RE: List of directories in Load
I think you can use Hadoop globbing (as in Scott's example) with Pig
0.4.
Thanks,
-Richard
RE: List of directories in Load
Posted by Richard Ding <rd...@yahoo-inc.com>.
I think you can use Hadoop globbing (as in Scott's example) with Pig
0.4.
Thanks,
-Richard
-----Original Message-----
From: Katukuri, Jay [mailto:jkatukuri@ebay.com]
Sent: Monday, May 10, 2010 10:07 AM
To: pig-user@hadoop.apache.org
Subject: RE: List of directories in Load
Does this also work in Pig 0.4?
RE: List of directories in Load
Posted by "Katukuri, Jay" <jk...@ebay.com>.
Does this also work in Pig 0.4?
-----Original Message-----
From: Scott Carey [mailto:scott@richrelevance.com]
Sent: Sunday, May 09, 2010 11:36 PM
To: pig-user@hadoop.apache.org
Subject: Re: List of directories in Load
Also, pig passes on the string to HDFS, which supports globs, so this works:
raw = LOAD '/input_data/dir{1,2,3}'
Re: List of directories in Load
Posted by Scott Carey <sc...@richrelevance.com>.
Also, Pig passes the path string on to HDFS, which supports globs, so this works:
raw = LOAD '/input_data/dir{1,2,3}';
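To make the glob concrete, here is a minimal Python sketch of which paths a brace pattern like the one above stands for. It is an illustration only, not Hadoop's or Pig's implementation: `expand_braces` is a hypothetical helper, it handles only non-nested `{a,b,c}` groups, and Hadoop itself matches the pattern against existing paths (and also supports wildcards such as `*` and `?`) rather than generating a list.

```python
import re


def expand_braces(pattern):
    """Expand non-nested {a,b,c} groups in a path pattern into explicit paths.

    Hypothetical helper for illustration; nested braces are not supported.
    """
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        # No brace group left: the pattern is already a plain path.
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    expanded = []
    for option in match.group(1).split(","):
        # Substitute one alternative, then expand any later groups.
        expanded.extend(expand_braces(head + option + tail))
    return expanded


paths = expand_braces("/input_data/dir{1,2,3}")
print(paths)            # the three directories named by the glob
print(",".join(paths))  # the same directories as one comma-separated string
```

Joining the expanded paths with commas gives the comma-separated form that Richard describes for Pig 0.6 and later.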
On May 7, 2010, at 4:14 PM, Richard Ding wrote:
> This feature is supported since 0.6 (PIG-1071). But the correct form is
>
> raw = LOAD '/input_data/dir1,/input_data/dir2,/input_data/dir3' USING
> PigStorage ('\t') AS (.....);
>
> Thanks,
> -Richard
Re: List of directories in Load
Posted by Syed Wasti <md...@hotmail.com>.
This is interesting...
When can we use this option?
Can the different directories be two different tables?
On 5/7/10 4:14 PM, "Richard Ding" <rd...@yahoo-inc.com> wrote:
> This feature is supported since 0.6 (PIG-1071). But the correct form is
>
> raw = LOAD '/input_data/dir1,/input_data/dir2,/input_data/dir3' USING
> PigStorage ('\t') AS (.....);
>
> Thanks,
> -Richard
RE: List of directories in Load
Posted by Richard Ding <rd...@yahoo-inc.com>.
This feature has been supported since Pig 0.6 (PIG-1071), but the correct form is a single quoted string with the paths separated by commas:
raw = LOAD '/input_data/dir1,/input_data/dir2,/input_data/dir3' USING
PigStorage ('\t') AS (.....);
Thanks,
-Richard
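The difference from the original attempt is that the whole list is one quoted string, not several quoted strings. As a rough Python sketch of how such a location string could be split back into individual paths: `split_paths` is a hypothetical helper, not Pig's actual code, and the rule that commas inside `{...}` belong to a glob rather than acting as separators is an assumption made here so that the comma form and the glob form do not clash; the thread does not confirm how Pig treats that case.

```python
def split_paths(location):
    """Split a comma-separated location string into individual paths.

    Assumption for this sketch: commas inside {...} are part of a glob
    and do not separate paths.
    """
    paths, current, depth = [], [], 0
    for ch in location:
        if ch == "{":
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
        if ch == "," and depth == 0:
            # Top-level comma: close the current path and start a new one.
            paths.append("".join(current))
            current = []
        else:
            current.append(ch)
    paths.append("".join(current))
    return paths


print(split_paths("/input_data/dir1,/input_data/dir2,/input_data/dir3"))
print(split_paths("/input_data/dir{1,2},/input_data/dir3"))
```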
-----Original Message-----
From: Katukuri, Jay [mailto:jkatukuri@ebay.com]
Sent: Friday, May 07, 2010 3:53 PM
To: pig-user@hadoop.apache.org
Subject: List of directories in Load
Hi all,
Is it possible to specify multiple HDFS directories in 'Load' function.
Ex: raw = LOAD '/input_data/dir1', '/input_data/dir2',
'/input_data/dir3' USING PigStorage ('\t') AS (.....);
Thanks,
Jay