Posted to user@pig.apache.org by Tom White <to...@gmail.com> on 2008/06/04 16:55:47 UTC

Specifying multiple input paths

I'm having a problem loading data from multiple paths in Pig. What I'm
trying to do is to load data from a range of dates, so I would like to
specify an input of two globbed paths:

x = LOAD '2008/05/{26,27,28,29,30,31},2008/06/{1,2}';

Pig doesn't seem to like this though as it's trying to interpret it as
a single path. The best I can do is to use UNION:

x1 = LOAD '2008/05/{26,27,28,29,30,31}';
x2 = LOAD '2008/06/{1,2}';
x = UNION x1, x2;

The downside to this is that I want to parameterize my paths, and
having a separate script for each number of paths in the input is
cumbersome.

Is there a better way of doing this? Are there any plans to support
multiple paths, and/or PathFilters?

Thanks,

Tom

RE: Specifying multiple input paths

Posted by Olga Natkovich <ol...@yahoo-inc.com>.
Tom,

You are correct that currently we only allow a single glob in the load
statement. It would not be hard to extend it to multiple globs. I have
created a JIRA for it: https://issues.apache.org/jira/browse/PIG-252;
maybe somebody will be interested in contributing a patch.

Olga
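
Until LOAD accepts more than one glob, the UNION workaround from the original message can be generated instead of hand-written for each number of paths. The following is a minimal sketch in Python, not something Pig provides: the function name build_union_script and the numbered alias scheme (x1, x2, ...) are hypothetical, and it assumes the generated text is saved to a file and then run as an ordinary Pig script.

```python
def build_union_script(globs, alias="x"):
    """Return Pig Latin that LOADs each glob and UNIONs the results into `alias`.

    Hypothetical helper for illustration; it only builds text and never calls Pig.
    """
    if not globs:
        raise ValueError("at least one input glob is required")
    if len(globs) == 1:
        # A single glob needs no UNION, so load it directly into the alias.
        return f"{alias} = LOAD '{globs[0]}';\n"

    lines = []
    parts = []
    for i, glob in enumerate(globs, start=1):
        part = f"{alias}{i}"  # intermediate aliases: x1, x2, ...
        lines.append(f"{part} = LOAD '{glob}';")
        parts.append(part)
    # Combine every intermediate alias in one UNION statement.
    lines.append(f"{alias} = UNION {', '.join(parts)};")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The two date ranges from the original message.
    print(build_union_script(["2008/05/{26,27,28,29,30,31}", "2008/06/{1,2}"]), end="")
```

Given the two globs from the original message, this emits the same three statements as the hand-written UNION, so the list of paths becomes the only thing that changes between runs.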
