Posted to issues@tez.apache.org by "Alexander Pivovarov (JIRA)" <ji...@apache.org> on 2014/10/04 03:45:33 UTC
[jira] [Commented] (TEZ-1344) Combiner counters reported by Tez look wrong
[ https://issues.apache.org/jira/browse/TEZ-1344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158858#comment-14158858 ]
Alexander Pivovarov commented on TEZ-1344:
------------------------------------------
An MR API program (e.g. org.apache.tez.mapreduce.examples.MapredWordCount) run with yarn-tez always reports Counters: 0.
{code}
hadoop jar tez-tests/target/tez-tests-0.6.0-SNAPSHOT.jar wordcount -D mapreduce.framework.name=yarn-tez in out
{code}
A Tez API program (e.g. org.apache.tez.examples.WordCount), modified as Jeff suggested, returns:
{code}
$ hadoop jar tez-examples/target/tez-examples-0.6.0-SNAPSHOT.jar wordcount in out
...
org.apache.tez.common.counters.TaskCounter
REDUCE_INPUT_GROUPS=35518
REDUCE_INPUT_RECORDS=284742
COMBINE_INPUT_RECORDS=0
{code}
A comment in the org.apache.tez.common.counters.TaskCounter code says:
{code}
COMBINE_OUTPUT_RECORDS, // Not used at the moment.
{code}
I noticed that [~cheolsoo] mentioned the class
org.apache.hadoop.mapreduce.TaskCounter (defined in the Hadoop jars),
but a Tez API program returns counters from a different class (defined in the Tez jars):
org.apache.tez.common.counters.TaskCounter
I'm confused.
What should I run on Tez, and how, to get the org.apache.hadoop.mapreduce.TaskCounter COMBINE_OUTPUT_RECORDS and COMBINE_INPUT_RECORDS counters?
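To make the question concrete, here is a minimal sketch of how I understand the lookup to behave. It is not Tez or Hadoop code: CounterGroupsSketch, set, find and fromWordCountRun are hypothetical names, and the assumption that a counter group is named after the enum's fully qualified class name is mine. The values are the ones from the WordCount run above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative model only; these are NOT the real Tez or Hadoop counter classes.
final class CounterGroupsSketch {

    // group name (assumed: the enum's fully qualified class name) -> counter name -> value
    private final Map<String, Map<String, Long>> groups = new LinkedHashMap<>();

    void set(String group, String counter, long value) {
        groups.computeIfAbsent(group, g -> new LinkedHashMap<>()).put(counter, value);
    }

    // Returns null when the group or the counter was never reported.
    Long find(String group, String counter) {
        Map<String, Long> g = groups.get(group);
        return g == null ? null : g.get(counter);
    }

    // Counters as printed by the Tez WordCount run quoted in this comment.
    static CounterGroupsSketch fromWordCountRun() {
        CounterGroupsSketch c = new CounterGroupsSketch();
        String tezGroup = "org.apache.tez.common.counters.TaskCounter";
        c.set(tezGroup, "REDUCE_INPUT_GROUPS", 35518L);
        c.set(tezGroup, "REDUCE_INPUT_RECORDS", 284742L);
        c.set(tezGroup, "COMBINE_INPUT_RECORDS", 0L);
        return c;
    }

    public static void main(String[] args) {
        CounterGroupsSketch c = fromWordCountRun();
        // Same counter name, two different group names.
        System.out.println(c.find("org.apache.tez.common.counters.TaskCounter", "COMBINE_INPUT_RECORDS"));
        System.out.println(c.find("org.apache.hadoop.mapreduce.TaskCounter", "COMBINE_INPUT_RECORDS"));
    }
}
```

In this model the first lookup finds the reported value and the second finds nothing, because the two TaskCounter enums share constant names but not a group name. If the real lookup works this way, a report keyed by the Hadoop class name would need something to populate that group explicitly.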
> Combiner counters reported by Tez look wrong
> --------------------------------------------
>
> Key: TEZ-1344
> URL: https://issues.apache.org/jira/browse/TEZ-1344
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Cheolsoo Park
> Priority: Minor
>
> Combiner input/output counters reported by a Tez job seem wrong:
> {code}
> org.apache.hadoop.mapreduce.TaskCounter:
> COMBINE_OUTPUT_RECORDS 35,977,263,353
> COMBINE_INPUT_RECORDS 1,000,529,333
> {code}
> As can be seen, combiner output records > input records?!
> The same counters from an MR job look as follows:
> {code}
> Map-Reduce Framework:
> Combine output records 1,000,316,600
> Combine input records 35,977,049,632
> {code}
> Somehow input and output are swapped?