Posted to user@pig.apache.org by "Palleti, Pallavi" <pa...@corp.aol.com> on 2009/07/09 05:34:19 UTC
Specifying multiple input paths in LOAD command
Hi all,
Hadoop lets us specify multiple input paths for a job. Does Pig support
this as well? Specifically, is it possible to specify multiple paths in a
single LOAD command? For example, I have n input paths that I need to load
for processing. The only option I can see right now is to use n variables
with n LOAD commands and do a UNION at the end.
For example:
Raw1 = LOAD '$inputPath1/*' using PigStorage('\t');
Raw2 = LOAD '$inputPath2/*' using PigStorage('\t');
...
Rawn = LOAD '$inputPathn/*' using PigStorage('\t');
Raw = UNION Raw1, Raw2, ..., Rawn;
Could anyone let me know whether there is a simpler way to do this, such
as a single LOAD statement?
Thanks
Pallavi
Re: Specifying multiple input paths in LOAD command
Posted by Daniel Dai <da...@gmail.com>.
PIG-252 (https://issues.apache.org/jira/browse/PIG-252) addresses this issue.
Instead of using union, you can try this:
Raw = LOAD '$inputPathprefix{1,2,3,4}/*' using PigStorage('\t');
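To make the brace syntax above concrete, here is a minimal Python sketch of which paths a pattern such as prefix{1,2,3,4}/* stands for. It is only an illustration of the idea, not Hadoop's or Pig's actual glob code; the function name expand_braces and the prefix /data/input are made up for this example, and nested braces are not handled.

```python
import re

def expand_braces(pattern):
    """Expand non-nested {a,b,...} groups into the list of patterns they denote."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        return [pattern]  # nothing left to expand
    head, tail = pattern[:match.start()], pattern[match.end():]
    expanded = []
    for option in match.group(1).split(","):
        # Substitute one alternative, then expand any remaining groups.
        expanded.extend(expand_braces(head + option + tail))
    return expanded

# '/data/input' is a hypothetical prefix used only for illustration.
for path in expand_braces("/data/input{1,2,3,4}/*"):
    print(path)
```

Each expanded entry still ends in /*, so the trailing wildcard is applied separately under every one of the four directories.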
Re: Specifying multiple input paths in LOAD command
Posted by Thejas Nair <te...@yahoo-inc.com>.
In my experience, each entry in {} has to be a single directory name; it
can't be a path spanning several directories.
This does not work: LOAD '{/d1/abc/def/f1,/d1/abc/xyz/f1}'
This works: LOAD '/d1/abc/{def,xyz}/f1'
-Thejas
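If the restriction described above applies to your setup, the glob can be generated from a list of paths as long as they differ in exactly one directory component. The following Python sketch shows one way to do that; build_load_glob is a hypothetical helper written for this example, not part of Pig or Hadoop, and it simply refuses inputs that would need a multi-directory entry inside {}.

```python
def build_load_glob(paths):
    """Build one glob for paths that differ in exactly one directory component.

    Raises ValueError when the list is empty, the paths differ in depth, or
    they differ in more than one component, so that every {} entry stays a
    single directory name.
    """
    if not paths:
        raise ValueError("at least one path is required")
    split_paths = [p.strip("/").split("/") for p in paths]
    depth = len(split_paths[0])
    if any(len(parts) != depth for parts in split_paths):
        raise ValueError("paths must have the same number of components")

    # Indices of the components where the paths disagree.
    varying = [i for i in range(depth)
               if len({parts[i] for parts in split_paths}) > 1]
    if len(varying) != 1:
        raise ValueError("paths must differ in exactly one component")

    idx = varying[0]
    names = []
    for parts in split_paths:
        if parts[idx] not in names:  # keep first-seen order, drop duplicates
            names.append(parts[idx])

    pieces = list(split_paths[0])
    pieces[idx] = "{" + ",".join(names) + "}"
    prefix = "/" if paths[0].startswith("/") else ""
    return prefix + "/".join(pieces)

glob = build_load_glob(["/d1/abc/def/f1", "/d1/abc/xyz/f1"])
print("Raw = LOAD '%s' using PigStorage('\\t');" % glob)
```

For the two paths from the message above, this yields the form reported to work, '/d1/abc/{def,xyz}/f1'. Paths that differ in more than one component would still need separate LOAD statements and a UNION.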
On 7/9/09 8:07 PM, "zjffdu" <zj...@gmail.com> wrote:
> You can use a pattern to match the paths:
>
> For example:
>
> Raw1 = LOAD '{inputPath1,inputPath2,...}/*' using PigStorage('\t');
>
> This will load all the data under inputPath1, inputPath2, ...
>
> Hadoop supports this mechanism internally.
RE: Specifying multiple input paths in LOAD command
Posted by zjffdu <zj...@gmail.com>.
You can use a pattern to match the paths:
For example:
Raw1 = LOAD '{inputPath1,inputPath2,...}/*' using PigStorage('\t');
This will load all the data under inputPath1, inputPath2, ...
Hadoop supports this mechanism internally.