Posted to issues@tez.apache.org by "Alexander Pivovarov (JIRA)" <ji...@apache.org> on 2014/10/04 03:45:34 UTC

[jira] [Comment Edited] (TEZ-1344) Combiner counters reported by Tez look wrong

    [ https://issues.apache.org/jira/browse/TEZ-1344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158858#comment-14158858 ] 

Alexander Pivovarov edited comment on TEZ-1344 at 10/4/14 1:45 AM:
-------------------------------------------------------------------

MR API programs (e.g. org.apache.tez.mapreduce.examples.MapredWordCount) run with yarn-tez always return Counters: 0.
{code}
hadoop jar tez-tests/target/tez-tests-0.6.0-SNAPSHOT.jar  wordcount -D mapreduce.framework.name=yarn-tez in out
14/10/03 18:22:59 INFO mapreduce.Job:  map 100% reduce 100%
14/10/03 18:22:59 INFO mapreduce.Job: Job job_1412382361327_0008 completed successfully
14/10/03 18:22:59 INFO mapreduce.Job: Counters: 0
{code}

A Tez API program (e.g. org.apache.tez.examples.WordCount), modified as Jeff suggested, returns:
{code}
$ hadoop jar tez-examples/target/tez-examples-0.6.0-SNAPSHOT.jar wordcount in out
...
	org.apache.tez.common.counters.TaskCounter
		REDUCE_INPUT_GROUPS=35518
		REDUCE_INPUT_RECORDS=284742
		COMBINE_INPUT_RECORDS=0
{code}

The comment in the org.apache.tez.common.counters.TaskCounter code says:
{code}
 COMBINE_OUTPUT_RECORDS, // Not used at the moment.
{code}

I noticed that [~cheolsoo] mentioned the class
org.apache.hadoop.mapreduce.TaskCounter (defined in the Hadoop jars),

but the Tez API program returns counters from a different class (defined in the Tez jars):
org.apache.tez.common.counters.TaskCounter

I'm confused.
What should I run on Tez, and how, to get the org.apache.hadoop.mapreduce.TaskCounter COMBINE_OUTPUT_RECORDS and COMBINE_INPUT_RECORDS counters?
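
To make the class mismatch concrete, here is a minimal sketch that parses a counter dump like the one pasted above into groups keyed by counter class name, and then checks which class the combiner counters were reported under. The function name `parse_counter_dump` is hypothetical and not part of Tez or Hadoop; the parsing rule (a line without `=` starts a group, a `NAME=VALUE` line belongs to the most recent group) is an assumption based only on the output shown in this comment.

```python
def parse_counter_dump(text):
    """Parse group-header and NAME=VALUE lines into {group: {name: value}}.

    Assumption: a non-empty line without '=' is a counter group (class name),
    and each NAME=VALUE line belongs to the most recent group header.
    """
    groups = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "...":
            continue  # skip blank lines and the elided part of the log
        if "=" in line:
            if current is None:
                continue  # counter line before any group header; ignore it
            name, _, value = line.partition("=")
            groups[current][name.strip()] = int(value.replace(",", ""))
        else:
            current = line
            groups.setdefault(current, {})
    return groups


# Counter lines copied from the Tez WordCount output quoted in this comment.
SAMPLE = """\
\torg.apache.tez.common.counters.TaskCounter
\t\tREDUCE_INPUT_GROUPS=35518
\t\tREDUCE_INPUT_RECORDS=284742
\t\tCOMBINE_INPUT_RECORDS=0
"""

counters = parse_counter_dump(SAMPLE)
tez_group = "org.apache.tez.common.counters.TaskCounter"
mr_group = "org.apache.hadoop.mapreduce.TaskCounter"

print(counters[tez_group]["COMBINE_INPUT_RECORDS"])  # value under the Tez class
print(mr_group in counters)  # whether the Hadoop class appears in this dump
```

For the pasted output, the only group present is the Tez one, which is the mismatch the question is about.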




> Combiner counters reported by Tez look wrong
> --------------------------------------------
>
>                 Key: TEZ-1344
>                 URL: https://issues.apache.org/jira/browse/TEZ-1344
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Cheolsoo Park
>            Priority: Minor
>
> Combiner input/output counters reported by a Tez job seem wrong:
> {code}
> org.apache.hadoop.mapreduce.TaskCounter:
> COMBINE_OUTPUT_RECORDS 35,977,263,353
> COMBINE_INPUT_RECORDS 1,000,529,333
> {code}
> As can be seen, combiner output records > input records?!
> The same counters from an MR job look as follows:
> {code}
> Map-Reduce Framework:
> Combine output records 1,000,316,600
> Combine input records 35,977,049,632
> {code}
> Somehow input and output are swapped?
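
The suspected swap in the issue description can be checked with simple arithmetic on the four numbers reported there. The sketch below assumes that an aggregating combiner does not emit more records than it consumes; this is typical for word-count style jobs but is not enforced by the API. The helper names `looks_swapped` and `relative_diff` are hypothetical.

```python
def looks_swapped(counters):
    """Flag a counter set whose combiner output exceeds its input.

    Assumption: the combiner aggregates, so output <= input is expected.
    """
    return counters["COMBINE_OUTPUT_RECORDS"] > counters["COMBINE_INPUT_RECORDS"]


def relative_diff(a, b):
    """Relative difference between two positive counts."""
    return abs(a - b) / max(a, b)


# Values copied from the issue description.
tez = {
    "COMBINE_OUTPUT_RECORDS": 35_977_263_353,
    "COMBINE_INPUT_RECORDS": 1_000_529_333,
}
mr = {
    "COMBINE_OUTPUT_RECORDS": 1_000_316_600,
    "COMBINE_INPUT_RECORDS": 35_977_049_632,
}

print(looks_swapped(tez))  # Tez: output > input
print(looks_swapped(mr))   # MR: output <= input

# Compare Tez "output" with MR "input", and Tez "input" with MR "output".
cross_out_in = relative_diff(tez["COMBINE_OUTPUT_RECORDS"], mr["COMBINE_INPUT_RECORDS"])
cross_in_out = relative_diff(tez["COMBINE_INPUT_RECORDS"], mr["COMBINE_OUTPUT_RECORDS"])
print(cross_out_in < 1e-3 and cross_in_out < 1e-3)
```

With these numbers the Tez values match the MR values to within 0.1% once input and output are exchanged, which is consistent with the counters being swapped rather than miscounted.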


