Posted to user@pig.apache.org by Romain Rigaux <ro...@gmail.com> on 2011/07/09 01:38:38 UTC

Re: pigunit - registerScript called twice with assertOutput?

Hi,

PigUnit parses the script twice because I did not find an easy way to modify
the Pig plan after the first parse.

So PigUnit parses the script once, modifies the script text, and then
reparses it with changes such as:

   - modifying an alias (e.g. changing A = LOAD 'txt' --> A = LOAD 'anotherdata')
   - guessing a schema
   - remembering which alias is the last one

If you want, I could post a patch where PigUnit drops every mv (or any other
shell command) by default. The plan may also be easier to modify now.
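
A rough sketch of the parse, modify, reparse idea described above, working
on the script text only. This is not PigUnit's actual code: the class and
method names are made up for illustration, it assumes one statement per line,
and the list of shell commands to drop is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; NOT PigUnit's real implementation.
// Simplification: each Pig statement is assumed to sit on its own line.
final class ScriptRewriteSketch {

    // Replace the statement that assigns to `alias` with `replacement`,
    // e.g. turn "A = LOAD 'txt';" into "A = LOAD 'anotherdata';".
    static String overrideAlias(String script, String alias, String replacement) {
        Pattern assignment =
            Pattern.compile("^\\s*" + Pattern.quote(alias) + "\\s*=.*$");
        List<String> out = new ArrayList<>();
        for (String line : script.split("\n")) {
            out.add(assignment.matcher(line).matches() ? replacement : line);
        }
        return String.join("\n", out);
    }

    // Drop lines that start with a file/shell command so that running the
    // script a second time has no side effects. The command list is assumed.
    static String dropShellCommands(String script) {
        Pattern shell = Pattern.compile(
            "^\\s*(mv|rm|rmf|cp|mkdir)\\b.*$", Pattern.CASE_INSENSITIVE);
        List<String> out = new ArrayList<>();
        for (String line : script.split("\n")) {
            if (!shell.matcher(line).matches()) {
                out.add(line);
            }
        }
        return String.join("\n", out);
    }

    public static void main(String[] args) {
        String script =
            "A = LOAD 'txt';\nB = FILTER A BY $0 > 1;\nmv out old_out";
        String rewritten = dropShellCommands(
            overrideAlias(script, "A", "A = LOAD 'anotherdata';"));
        System.out.println(rewritten);
    }
}
```

The rewritten text is what would then be handed to the parser for the second
pass; a real fix would more likely work on the parsed plan than on raw lines.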

Romain

On Wed, Jun 29, 2011 at 12:09 PM, Jennie Cochran-Chinn <
jcochran@adconion.com> wrote:

> I was wondering why assertOutput in PigTest calls registerScript
> twice: once in assertOutput and then again in getAlias. I added a mv
> to the end of my Pig script, and it is run each time
> registerScript is called, so it fails the second time because the
> source directory is no longer there.
>
> Thanks,
> Jennie
>

Re: pigunit - registerScript called twice with assertOutput?

Posted by Romain Rigaux <ro...@gmail.com>.
Hi,

I just tried it, and it does work, because you skip the extra
registerScript calls.

Some tips:
With this method, if you want to load input data from an array of Strings
(instead of a file), you will need to create a temporary input file yourself
and load it.
Overriding aliases is still possible:
PigTest test = new PigTest(pig, args);
test.override("B", "B = FILTER A BY ...");
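
For the first tip, a minimal sketch of writing a String array to a temporary
input file using only the JDK. The class and method names are hypothetical,
and how the path reaches the script (for example as a parameter passed in
args, if the script loads from a parameter) is left as an assumption.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper, not part of PigUnit.
final class TempInputSketch {

    // Write one row per line to a fresh temp file and return its path.
    static Path writeTempInput(String[] rows) {
        try {
            Path file = Files.createTempFile("pigunit-input-", ".txt");
            file.toFile().deleteOnExit(); // clean up when the JVM exits
            Files.write(file, Arrays.asList(rows), StandardCharsets.UTF_8);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path input = writeTempInput(new String[] {"a\t1", "b\t2"});
        System.out.println("input file: " + input);
    }
}
```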

I will keep this registerScript problem in mind for a future update of
PigUnit!

Romain

On Mon, Jul 11, 2011 at 9:45 AM, Jennie Cochran-Chinn
<jcochran@adconion.com> wrote:

> Hey Romain,
>
> Thanks for the reply.  Instead of calling assertOutput directly, I'm
> using just its second line for now:
> Assert.assertEquals(StringUtils.join(tupleOutput, "\n"),
>                    StringUtils.join(pigTest.getAlias(alias), "\n"));
>
> It seems to work.  Can you think of any gotchas I should look out
> for, though?
>
> Thanks,
> Jennie
>

Re: pigunit - registerScript called twice with assertOutput?

Posted by Jennie Cochran-Chinn <jc...@adconion.com>.
Hey Romain,

Thanks for the reply.  Instead of calling assertOutput directly, I'm
using just its second line for now:
Assert.assertEquals(StringUtils.join(tupleOutput, "\n"),
                    StringUtils.join(pigTest.getAlias(alias), "\n"));

It seems to work.  Can you think of any gotchas I should look out
for, though?

Thanks,
Jennie
