Posted to user@pig.apache.org by Jerry Jiang <je...@twitter.com> on 2012/09/28 00:49:53 UTC

Load multiple files with date variables in pig

Hi,

I am new to Pig.

In Pig, I want to load multiple files that have date variables in their names.

If I load files from 2012/02/12 to 2012/02/19, the following works:

$START = "12"
$END = "19"
raw_data = load '/table/status/2012/02/{$START,$END}' using Loader()

Suppose the start date is 2011/12/29 and the end date is 2012/01/04. How do I
change that line of code?

Thanks for any help!

Jerry

Re: Load multiple files with date variables in pig

Posted by Bill Graham <bi...@gmail.com>.
In that case you'll need to write some code external to your script that
generates a globbing pattern covering every date in the range and passes
that pattern into your Pig script. So instead of START and END you get
something like this:

DATE_PATTERN={2011/12/{29,30,31},2012/01/{01,02,03,04}}

Yes, it's clunky but it's how HDFS handles path globbing.
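
As a rough illustration of the external code described above, here is a
minimal Python sketch that builds such a pattern from a start and end date.
The function name date_glob is hypothetical, and the sketch assumes the
directories are laid out as YYYY/MM/DD, as in the paths in the question.

```python
from datetime import date, timedelta


def date_glob(start, end):
    """Build a brace glob covering every day from start to end, inclusive.

    Hypothetical helper; assumes a YYYY/MM/DD directory layout.
    """
    if end < start:
        raise ValueError("end date is before start date")

    # Group day numbers under their "YYYY/MM" prefix, in date order
    # (dicts preserve insertion order in Python 3.7+).
    days_by_month = {}
    current = start
    while current <= end:
        month = current.strftime("%Y/%m")
        days_by_month.setdefault(month, []).append(current.strftime("%d"))
        current += timedelta(days=1)

    # One "YYYY/MM/{dd,dd,...}" alternative per month, wrapped in outer braces.
    parts = [
        "%s/{%s}" % (month, ",".join(days))
        for month, days in days_by_month.items()
    ]
    return "{" + ",".join(parts) + "}"


if __name__ == "__main__":
    # Prints {2011/12/{29,30,31},2012/01/{01,02,03,04}}
    print(date_glob(date(2011, 12, 29), date(2012, 1, 4)))
```

The result could then be handed to the script through Pig's parameter
substitution, for example with -param DATE_PATTERN=... on the pig command
line and '/table/status/$DATE_PATTERN' in the load statement. The pattern
contains braces and commas, so it will likely need quoting in the shell.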
