Posted to user@crunch.apache.org by Sandy Ryza <sa...@cloudera.com> on 2013/06/07 10:54:11 UTC

output records counter

Hey All,

Does Crunch not use the normal MR channels for outputting stuff?  I'm
noticing that when I look at a job's Counters, the output records are
always 0, even when I know data has been written.

thanks
-Sandy

Re: output records counter

Posted by Sandy Ryza <sa...@cloudera.com>.
Ok, makes sense, thanks Gabriel


On Fri, Jun 7, 2013 at 2:20 AM, Gabriel Reid <ga...@gmail.com> wrote:

> Hi Sandy,
>
> Crunch uses something similar to Hadoop's MultipleOutputFormat to allow
> writing multiple outputs in multiple formats from the same job. This leads
> to different counters being used for output, as there can be multiple
> outputs (and therefore multiple counters) from a single job.
>
> The main implementation class of this is o.a.c.io.CrunchOutputs, and the
> counters that contain the actual output count are published in the counter
> group with the name of that class, and the counter name of out<d>, where
> <d> is the index of the output for the job (i.e. starting from 0).
>
> - Gabriel

Re: output records counter

Posted by Gabriel Reid <ga...@gmail.com>.
Hi Sandy,

Crunch uses something similar to Hadoop's MultipleOutputFormat to allow
writing multiple outputs in multiple formats from the same job. This leads
to different counters being used for output, as there can be multiple
outputs (and therefore multiple counters) from a single job.

The main implementation class of this is o.a.c.io.CrunchOutputs, and the
counters that contain the actual output count are published in the counter
group with the name of that class, and the counter name of out<d>, where
<d> is the index of the output for the job (i.e. starting from 0).
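As a rough illustration of the naming scheme described above, here is a minimal sketch in plain Java. It assumes that o.a.c.io.CrunchOutputs expands to org.apache.crunch.io.CrunchOutputs; the helper class CrunchOutputCounters, its methods, and the sample values are hypothetical and not part of Crunch. It sums the out<d> counters from a counter dump held as a map of group name to counter name to value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; it is not part of Crunch or Hadoop.
final class CrunchOutputCounters {

    // Assumption: "o.a.c.io.CrunchOutputs" abbreviates this fully qualified class name,
    // which is used as the counter group name.
    static final String GROUP = "org.apache.crunch.io.CrunchOutputs";

    // Counter name for the output with the given index: out0, out1, ...
    static String counterName(int outputIndex) {
        if (outputIndex < 0) {
            throw new IllegalArgumentException("output index must be >= 0");
        }
        return "out" + outputIndex;
    }

    // Sums every out<d> counter in the Crunch group of a counter dump
    // (group name -> counter name -> value). Returns 0 if the group is absent.
    static long totalOutputRecords(Map<String, Map<String, Long>> counters) {
        Map<String, Long> group = counters.get(GROUP);
        if (group == null) {
            return 0L;
        }
        long total = 0L;
        for (Map.Entry<String, Long> entry : group.entrySet()) {
            if (entry.getKey().matches("out\\d+")) {
                total += entry.getValue();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up sample values standing in for a real job's counters.
        Map<String, Long> group = new LinkedHashMap<>();
        group.put(counterName(0), 1200L);
        group.put(counterName(1), 34L);

        Map<String, Map<String, Long>> counters = new LinkedHashMap<>();
        counters.put(GROUP, group);

        System.out.println(totalOutputRecords(counters)); // prints 1234
    }
}
```

Against a real job, the same group and counter names could presumably be passed to Hadoop's own API, along the lines of job.getCounters().findCounter(group, name).getValue(), instead of building a map by hand.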

- Gabriel