Posted to user@pig.apache.org by "Palleti, Pallavi" <pa...@corp.aol.com> on 2009/07/09 05:34:19 UTC

Specifying multiple input paths in LOAD command

Hi all,

Hadoop lets us specify multiple input paths for a job. Does Pig have an
equivalent? That is, can a single LOAD command take multiple paths? For
example, I have n input paths that I need to load for processing. The
only approach I can see right now is to issue n LOAD commands into n
aliases and UNION them at the end.

For example:

Raw1 = LOAD '$inputPath1/*' using PigStorage('\t');
Raw2 = LOAD '$inputPath2/*' using PigStorage('\t');
...
Rawn = LOAD '$inputPathn/*' using PigStorage('\t');

Raw = UNION Raw1, Raw2, ..., Rawn;

Is there a simpler way to do this, for example with a single LOAD
statement?

Thanks,
Pallavi

Re: Specifying multiple input paths in LOAD command

Posted by Daniel Dai <da...@gmail.com>.
PIG-252 (https://issues.apache.org/jira/browse/PIG-252) addresses this issue.

Instead of using union, you can try this:

Raw = LOAD '$inputPathprefix{1,2,3,4}/*' using PigStorage('\t');
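To see which paths a brace pattern like the one above stands for, here is a minimal Python sketch that expands the {a,b,...} alternation into the individual path patterns. It only approximates how the glob is read and is not Pig's or Hadoop's actual implementation; the function name expand_braces and the prefix value /data/input are made up for illustration.

```python
import re


def expand_braces(pattern):
    """Expand {a,b,...} groups in a path pattern into a list of patterns.

    Illustrative only: handles comma-separated alternatives inside braces
    and leaves wildcards such as * untouched.
    """
    # Find a brace group that contains no nested braces.
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    expanded = []
    for option in match.group(1).split(","):
        # Recurse in case the pattern contains further brace groups.
        expanded.extend(expand_braces(head + option + tail))
    return expanded


# Hypothetical value standing in for the $inputPathprefix parameter.
prefix = "/data/input"
for path in expand_braces(prefix + "{1,2,3,4}/*"):
    print(path)
```

With the assumed prefix, the single LOAD pattern covers the same four directories that would otherwise need four LOAD statements and a UNION.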





Re: Specifying multiple input paths in LOAD command

Posted by Thejas Nair <te...@yahoo-inc.com>.
In my experience, each entry inside {} has to be a single directory name;
it can't be a path containing several directories.
This does not work: LOAD '{/d1/abc/def/f1,/d1/abc/xyz/f1}'
This works: LOAD '/d1/abc/{def,xyz}/f1'

-Thejas
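If the constraint described above holds for your version, full paths have to be rewritten so that only one directory name varies inside the braces. The following Python sketch does that rewrite for paths that differ in exactly one component. The helper name single_component_glob is hypothetical, and the one-component restriction is taken from the observation above rather than from any Pig or Hadoop documentation.

```python
def single_component_glob(paths):
    """Build 'prefix/{a,b}/suffix' from paths that differ in one component.

    Raises ValueError when the paths cannot be expressed that way.
    """
    parts = [p.split("/") for p in paths]
    if len({len(p) for p in parts}) != 1:
        raise ValueError("paths must have the same depth")
    # Indices of the path components where the inputs disagree.
    differing = [i for i, names in enumerate(zip(*parts)) if len(set(names)) > 1]
    if len(differing) != 1:
        raise ValueError("paths must differ in exactly one component")
    i = differing[0]
    first = parts[0]
    alternatives = ",".join(p[i] for p in parts)
    return "/".join(first[:i] + ["{" + alternatives + "}"] + first[i + 1:])


# The two paths from the message above.
print(single_component_glob(["/d1/abc/def/f1", "/d1/abc/xyz/f1"]))
```

Paths that differ in more than one component are rejected rather than guessed at; under the stated constraint those would still need separate LOAD statements and a UNION.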


On 7/9/09 8:07 PM, "zjffdu" <zj...@gmail.com> wrote:

> You can use pattern to match the path:
> 
> For example: 
> 
> Raw1 = LOAD '{inputPath1,inputPath2,...}/*' using PigStorage('\t');
> 
> This will load all the data under inputPath1,inputPath2,...
> 
> This is a mechanism supported by hadoop internally.


RE: Specifying multiple input paths in LOAD command

Posted by zjffdu <zj...@gmail.com>.
You can use a glob pattern to match the paths.

For example:

Raw1 = LOAD '{inputPath1,inputPath2,...}/*' using PigStorage('\t');

This loads all the data under inputPath1, inputPath2, and so on.

Hadoop itself supports this globbing mechanism.


