Posted to user@crunch.apache.org by Yong-gang Cao <yc...@siftscience.com> on 2019/03/15 22:58:28 UTC

How can I output Hadoop counters into the log from Crunch job?

Hi,
  We found that our Crunch job writes no counters to the log. We tried to log them with the following code, but the counters emitted by the InputFormat classes and the default MapReduce counters were still missing. How can I see the counters that a normal MapReduce job would produce?

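import java.util.List;
import java.util.Map;
import java.util.Set;

import org.apache.crunch.Pipeline;
import org.apache.crunch.PipelineResult;
import org.apache.crunch.PipelineResult.StageResult;

// Runs the pipeline, then logs every counter each stage reports, grouped by counter group.
// `log` is assumed to be a logger field of the enclosing class.
public static PipelineResult printCounters(Pipeline pipeline) {
    PipelineResult result = pipeline.run();
    List<StageResult> results = result.getStageResults();
    for (StageResult partial : results) {
        String stageId = partial.getStageName();
        log.info(String.format("counters for %s :", stageId));
        for (Map.Entry<String, Set<String>> entry : partial.getCounterNames().entrySet()) {
            String groupName = entry.getKey();
            log.info(String.format("    %s :", groupName));
            for (String counterName : entry.getValue()) {
                long value = partial.getCounterValue(groupName, counterName);
                log.info(String.format("        %s : %d", counterName, value));
            }
        }
    }
    return result;
}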

I expect to see counters like the following, including all counters emitted by the input format class, not only those emitted by Crunch:
Map-Reduce Framework
                Map input records=1234567890
                Map output records=1234567
                Map output bytes=1234598760

Thanks!

Re: How can I output Hadoop counters into the log from Crunch job?

Posted by Josh Wills <jo...@gmail.com>.
Hrm; does calling the getCounters() method on the StageResult not return
_all_ of the counters, including the framework ones?
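
(Illustration only, not part of the original reply.) If getCounters() does return the full Hadoop Counters object, its groups could be copied into a nested map and printed in the usual job-summary layout. The sketch below assumes that copy has already been made. The class name CounterLogFormat is hypothetical, the sample values are the ones from the question, and the Hadoop-specific loop appears only as a comment so that the example runs without Hadoop on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: renders counters the way a MapReduce job summary lists them. */
final class CounterLogFormat {

    /** Renders group -> (counter -> value) as one group header followed by name=value lines. */
    static String render(Map<String, Map<String, Long>> groups) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Map<String, Long>> group : groups.entrySet()) {
            sb.append(group.getKey()).append('\n');
            for (Map.Entry<String, Long> counter : group.getValue().entrySet()) {
                sb.append("    ")
                  .append(counter.getKey()).append('=')
                  .append(counter.getValue()).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // With Hadoop available, the map could instead be filled from a
        // org.apache.hadoop.mapreduce.Counters instance (assumed to be named `counters`):
        //   for (CounterGroup g : counters)
        //       for (Counter c : g)
        //           groups.computeIfAbsent(g.getDisplayName(), k -> new LinkedHashMap<>())
        //                 .put(c.getDisplayName(), c.getValue());
        Map<String, Map<String, Long>> groups = new LinkedHashMap<>();
        Map<String, Long> framework = new LinkedHashMap<>();
        framework.put("Map input records", 1234567890L);
        framework.put("Map output records", 1234567L);
        groups.put("Map-Reduce Framework", framework);
        System.out.print(render(groups));
    }
}
```

Insertion-ordered maps are used so the log keeps the order in which the groups and counters were added.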

When we have long-running indexing jobs at Slack, we extract references to
the running Job instances and periodically poll them while the job is
running to track and update the counters, b/c we ran into issues where we
would lose the counters from the job's application master while Crunch was
performing its post-run cleanup operations. Have you tried something like
that?
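
(Illustration only, not the code used at Slack.) The polling idea above can be sketched as a small helper that reads the counters at a fixed interval and remembers the last read that succeeded, so the values survive even if the job later becomes unreachable. Everything here is an assumption made for the example: the class name CounterSnapshotPoller, the flat counter-name-to-value map, and the Callable that stands in for whatever actually reads the counters from a running Job.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical helper: keeps the last counter snapshot that was read successfully. */
final class CounterSnapshotPoller implements AutoCloseable {
    private final Callable<Map<String, Long>> source;
    private final AtomicReference<Map<String, Long>> latest =
            new AtomicReference<>(Collections.<String, Long>emptyMap());
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "counter-poller");
                t.setDaemon(true); // do not keep the JVM alive just for polling
                return t;
            });

    CounterSnapshotPoller(Callable<Map<String, Long>> source) {
        this.source = source;
    }

    /** Reads the counters once; a failed or null read leaves the previous snapshot in place. */
    void pollOnce() {
        try {
            Map<String, Long> current = source.call();
            if (current != null) {
                latest.set(Collections.unmodifiableMap(new LinkedHashMap<String, Long>(current)));
            }
        } catch (Exception e) {
            // The job may already be finished or cleaned up; keep the last good values.
            if (e instanceof InterruptedException) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Starts polling immediately, then again after each period elapses. */
    void start(long period, TimeUnit unit) {
        scheduler.scheduleWithFixedDelay(this::pollOnce, 0, period, unit);
    }

    /** Returns the most recent successful snapshot (empty before the first successful read). */
    Map<String, Long> latest() {
        return latest.get();
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

In a real pipeline the Callable would wrap a call such as getCounters() on the running Hadoop Job and flatten the result into the map; a caller would invoke start(30, TimeUnit.SECONDS) once the job is running, log latest() after the pipeline finishes, and then call close(). How the Job references are obtained from Crunch is outside this sketch.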
