Posted to user@pig.apache.org by jamal sasha <ja...@gmail.com> on 2012/12/20 01:23:42 UTC

what happens under the hood

Hi,
  I am trying to dig deeper into the workings of the Pig libraries.

So can someone help me understand what happens when someone runs:

in = load 'in.txt' using PigStorage(',') as (foo:int);
dump in;

What happens behind the scenes?
How does it execute MapReduce jobs?
Where is this "load" defined in the Pig code base?
I am just trying to see how the backend code is implemented, where these two
lines of code translate into MapReduce code.
Any pointers would be appreciated.
Thanks,
Jamal

Re: what happens under the hood

Posted by Jonathan Coveney <jc...@gmail.com>.
This is a very broad question. On the Pig website you can find some papers
on how Pig was implemented, and these should give you a high-level view of
what is going on.

For this code, you can use the explain command (explain in; instead of dump
in;) to see the three plans that this code generates (logical, physical, and
MapReduce). If you want to be a real pro, set a breakpoint in your IDE and
step through the code as it builds the logical plan and then converts it
to the physical and MapReduce plans.
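
As a rough sketch of the explain suggestion, the script from the question could be changed as follows. The alias name `data` is an assumption made for illustration (it replaces `in`, in case `in` is a reserved word in your Pig version); everything else is taken from the original two lines.

```pig
-- Same load statement as in the question; only the alias is renamed to
-- `data` (an assumption, in case `in` is reserved in your Pig version).
data = load 'in.txt' using PigStorage(',') as (foo:int);

-- Print the logical, physical, and MapReduce plans instead of running the job.
explain data;

-- To actually execute the job and print the tuples, use dump instead:
-- dump data;
```

Saved to a file (for example a hypothetical explain_demo.pig), this can be tried without a cluster by running pig -x local explain_demo.pig. The exact plan output depends on the Pig version.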

