Posted to user@pig.apache.org by Ji Mahn Ok <ji...@gmail.com> on 2014/07/17 22:13:48 UTC

Where can I find the class in which the job configurations are set in a job jar file produced by pig?

Hello,

I am trying to find the class that holds the job configuration in a job jar
file produced by Pig.

In detail: when I run a Pig script, for example a PigMix query, it produces
a job jar file in the tmp directory. I saved that job jar file separately to
figure out what kind of MapReduce job Pig really produces, but I am having
trouble finding the class that contains the job configuration in it. So my
question is: where can I find the class in which the job configuration is
set in that job jar file? Or is there a better way to see the actual
MapReduce job produced by Pig?

Thank you in advance.

Best Regards,

Re: Where can I find the class in which the job configurations are set in a job jar file produced by pig?

Posted by Cheolsoo Park <pi...@gmail.com>.
Hi Ji Mahn,

Pig doesn't generate MapReduce jobs on the fly. Instead, it works as
follows. Pig has generic mapper and reducer classes. It compiles queries
into chunks of physical plans and replays them inside the generic mapper
and reducer. For example, load / group-by / store is translated into a
mapper that contains the load, a Hadoop shuffle, and a reducer that
contains the store; this is called an MR plan. You can take a look at the
explain <http://pig.apache.org/docs/r0.13.0/test.html#explain> output to
see how Pig generates an MR plan for your queries.
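
To make the idea of replaying plan chunks inside a generic mapper and
reducer more concrete, here is a toy Python sketch. It is not Pig's code:
every name in it (load, store, generic_mapper, generic_reducer, shuffle,
run_job) is hypothetical, and the shuffle is simulated in memory rather
than done by Hadoop. It only illustrates that the mapper and reducer stay
the same while the plan they are handed changes per query.

```python
from collections import defaultdict

# Hypothetical operators standing in for pieces of a physical plan.
def load(lines):
    """Parse 'user,amount' records and emit (key, value) pairs."""
    for line in lines:
        user, amount = line.split(",")
        yield user, int(amount)

def store(grouped):
    """Emit one output record per group key."""
    for key, values in grouped:
        yield key, tuple(values)

def generic_mapper(map_plan, raw_input):
    """Generic mapper: replay the map-side plan chunk over the input."""
    stream = raw_input
    for op in map_plan:
        stream = op(stream)
    return list(stream)

def shuffle(pairs):
    """In-memory stand-in for the Hadoop shuffle: group values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def generic_reducer(reduce_plan, grouped):
    """Generic reducer: replay the reduce-side plan chunk over the groups."""
    stream = grouped
    for op in reduce_plan:
        stream = op(stream)
    return list(stream)

def run_job(map_plan, reduce_plan, raw_input):
    """One simulated MR job: mapper, then shuffle, then reducer."""
    mapped = generic_mapper(map_plan, raw_input)
    return generic_reducer(reduce_plan, shuffle(mapped))

# load / group-by / store: load runs in the mapper, the group-by is
# realized by the shuffle, and store runs in the reducer.
result = run_job([load], [store], ["alice,3", "bob,5", "alice,4"])
print(result)
```

A different query would reuse generic_mapper and generic_reducer unchanged
and only pass different operator lists, which is why no query-specific
class shows up in this model.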

Thanks,
Cheolsoo



On Thu, Jul 17, 2014 at 1:13 PM, Ji Mahn Ok <ji...@gmail.com> wrote:

> Hello,
>
> I am trying to find the class which contains the job configuration part in
> a job file produced by pig.
>
> In detail, for example, I run a PigMix query. As you know, when I run pig
> script, it produces job jar file in tmp directory. I stored that job jar
> file separately to figure out what kind of MapReduce job is really produced
> by pig. But it is hard for me to find the class which has the job
> configuration part from the job jar file. What I want to ask is this: where
> can I find the class in which the job configurations are set in that job
> jar file? Or, is there any other better way to see the real MapReduce job
> produced by pig?
>
> Thank you  in advance.
>
> Best Regards,
>
