Posted to common-user@hadoop.apache.org by ru...@rosa.com on 2008/04/09 10:35:04 UTC

Counters giving double values

Hello,

I tried to track down the problem with wrong counter values, but didn't find 
any information or similar cases. Maybe it is a feature I don't understand. 
The problem is that in a local installation (LocalJobRunner) I sometimes 
(seemingly at random) get wrong counter values, as the log() function of the 
Counters shows in the log:

20080409044008268 INFO [DefaultQuartzScheduler_Worker-0] JobClient - Job 
complete: job_local_84
20080409044008268 INFO [DefaultQuartzScheduler_Worker-0] JobClient - 
Counters: 21
20080409044008268 INFO [DefaultQuartzScheduler_Worker-0] JobClient - 
Map-Reduce Framework
20080409044008269 INFO [DefaultQuartzScheduler_Worker-0] JobClient - Map 
input records=43917
20080409044008269 INFO [DefaultQuartzScheduler_Worker-0] JobClient - Map 
output records=43917
20080409044008269 INFO [DefaultQuartzScheduler_Worker-0] JobClient - Map 
input bytes=2116832
20080409044008269 INFO [DefaultQuartzScheduler_Worker-0] JobClient - Map 
output bytes=9310354
20080409044008269 INFO [DefaultQuartzScheduler_Worker-0] JobClient - 
Combine input records=0
20080409044008269 INFO [DefaultQuartzScheduler_Worker-0] JobClient - 
Combine output records=0
20080409044008269 INFO [DefaultQuartzScheduler_Worker-0] JobClient - 
Reduce input groups=83736
20080409044008269 INFO [DefaultQuartzScheduler_Worker-0] JobClient - 
Reduce input records=87834
20080409044008269 INFO [DefaultQuartzScheduler_Worker-0] JobClient - 
Reduce output records=86640

What I don't understand is why, or how it comes about, that the number of 
map output records differs from the number of reduce input records.
Maybe somebody has a simple explanation for this effect.

I would expect the two numbers to be the same. As you can see, the 
reduce-side number is (always) exactly double what I expect it to be. 
And when I check the output file, the number of records matches the 
map counters, not the doubled values.
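For reference, the invariant I am assuming can be sketched without Hadoop at all; this toy in-memory shuffle (plain Java, nothing from the Hadoop API) is only meant to show which counters should match:

```java
import java.util.*;

// Toy in-memory map/shuffle/reduce (plain Java, not Hadoop) illustrating the
// counter invariant in question: with no combiner, every map output record
// reaches the reduce side exactly once, so the two counters must be equal.
public class CounterInvariant {
    // Returns {map output records, reduce input groups, reduce input records}.
    public static long[] run(List<String> lines) {
        long mapOut = 0;
        Map<String, List<Integer>> shuffle = new TreeMap<>();
        for (String line : lines)
            for (String token : line.split("\\s+")) {   // "map": one record per token
                shuffle.computeIfAbsent(token, k -> new ArrayList<>()).add(1);
                mapOut++;
            }
        long groups = shuffle.size();
        long reduceIn = 0;
        for (List<Integer> values : shuffle.values())   // "reduce" sees grouped values
            reduceIn += values.size();
        return new long[] { mapOut, groups, reduceIn };
    }

    public static void main(String[] args) {
        long[] c = run(Arrays.asList("a b a", "c a"));
        System.out.println("map out=" + c[0] + " groups=" + c[1] + " reduce in=" + c[2]);
        // prints: map out=5 groups=3 reduce in=5
    }
}
```

With no combiner, the only way "Reduce input records" can be double "Map output records" is if records are delivered, or counted, twice.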

thanks,

ud





RE: Counters giving double values

Posted by Runping Qi <ru...@yahoo-inc.com>.

Here is a related jira:
https://issues.apache.org/jira/browse/HADOOP-3126




RE: Counters giving double values

Posted by Devaraj Das <dd...@yahoo-inc.com>.
Thanks! I will take a look at the JIRA issue 3267.







RE: Counters giving double values

Posted by ru...@rosa.com.
Hi Devaraj,

I researched the counter topic further, with some success.
For one, I can now reproduce it with a test.

I am waiting for the password for my JIRA account to get started there; 
somehow I didn't get the password after registration, so I sent a mail to 
Owen.
I am not familiar with the proceedings on JIRA / ASF, so if you see 
mistakes please guide me. Thanks. 

If you are interested, try out the attached testcase. I figured it is some 
timing issue within the LocalJobRunner (in my case; I don't know about 
distributed running yet) and I will try to provide a patch.
Can I now submit this testcase through JIRA? Do I need to change something 
on it?
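To make the suspicion concrete, here is a plain-Java illustration; this is NOT the actual LocalJobRunner code, just the kind of failure I have in mind: if the runner merges the same task counter snapshot twice by incrementing, every affected counter comes out exactly doubled, while treating a snapshot as an absolute value would be idempotent:

```java
// Illustration only (not LocalJobRunner internals): merging a re-delivered
// task counter snapshot by *incrementing* double-counts it, while treating
// each snapshot as an absolute value is idempotent.
public class DoubleMerge {
    public static long mergeByIncrement(long[] snapshots) {
        long total = 0;
        for (long s : snapshots) total += s;   // re-delivered snapshot double-counts
        return total;
    }

    public static long mergeBySet(long[] snapshots) {
        long total = 0;
        for (long s : snapshots) total = s;    // idempotent: last snapshot wins
        return total;
    }

    public static void main(String[] args) {
        long[] delivered = {43917, 43917};     // the same final snapshot seen twice
        System.out.println(mergeByIncrement(delivered)); // prints 87834 (the doubled value)
        System.out.println(mergeBySet(delivered));       // prints 43917 (the real value)
    }
}
```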

kind regards,

ud










RE: Counters giving double values

Posted by Devaraj Das <dd...@yahoo-inc.com>.
Please file a JIRA for the counter updates part. It would be excellent if you
could also attach a testcase that reproduces the problem (maybe a stripped-down
version of your app or something). 



RE: Counters giving double values

Posted by ru...@rosa.com.
Hadoop 0.16.2
(and as I remember, I had the same issue with 0.16.0).

Yes, the final data output at the end IS CORRECT; 
only the counter values are wrong.

I didn't try to run it in a distributed environment yet, only locally.









RE: Counters giving double values

Posted by Devaraj Das <dd...@yahoo-inc.com>.
Also, in those cases where you see wrong counter values, did you validate
the final (reduce) output for correctness? (I am just trying to see whether
the problem is with the Counter updates.)



RE: Counters giving double values

Posted by Devaraj Das <dd...@yahoo-inc.com>.
Thanks for the detailed answer. Which Hadoop version are you on? If you are
confident that it is not a problem with your app, please raise a JIRA.







RE: Counters giving double values

Posted by ru...@rosa.com.
Thanks so far.

Key and value are custom implementations:

key implements WritableComparable
value extends VersionedWritable

By the way, the only problem I encounter is that the Counter values are 
wrong. If I check the records in the MapFile (re-read it) which is written 
as the output of the mapred job, the number of records is correct and is 
half of the reported counter value. 
The same applies to the results of the operations carried out in reduce(). 
Everything is correct except the counter values.

The whole thing happens only sometimes.


Key serializing / deserializing; I guess you want to see this part of the 
code:

        public int language;
        public String term;

        public void readFields(DataInput in) throws IOException {
                language = in.readInt();
                term = Text.readString(in);
        }

        public void write(DataOutput out) throws IOException {
                out.writeInt(language);
                Text.writeString(out, term); 
        }
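For completeness, the same read/write pattern can be round-trip checked without Hadoop; in this sketch java.io's writeUTF/readUTF stand in for Text.writeString/readString (the wire format differs, but the symmetry being exercised is the same):

```java
import java.io.*;

// Self-contained round-trip check of the key's write()/readFields() pattern.
// writeUTF/readUTF stand in for Hadoop's Text.writeString/readString here;
// the wire format differs, but the read/write symmetry being tested is the same.
public class TermKey {
    public int language;
    public String term;

    public void readFields(DataInput in) throws IOException {
        language = in.readInt();
        term = in.readUTF();
    }

    public void write(DataOutput out) throws IOException {
        out.writeInt(language);
        out.writeUTF(term);
    }

    // Serialize to a byte buffer, then deserialize into a fresh key.
    public static TermKey roundTrip(TermKey k) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            k.write(new DataOutputStream(buf));
            TermKey copy = new TermKey();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);  // byte-array streams never really throw
        }
    }

    public static void main(String[] args) {
        TermKey k = new TermKey();
        k.language = 7;
        k.term = "hadoop";
        TermKey copy = roundTrip(k);
        System.out.println(copy.language + " " + copy.term); // prints: 7 hadoop
    }
}
```

Since write() and readFields() mirror each other, a serialized key deserializes to an equal key, so the serialization itself should not be able to double record counts.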



I ran a MapReduce job sequentially over a defined set of data. The 
following data represents statistics of the outcome (a specific counter 
value). (Horizontally you see the outcomes of the different runs on the 
identical data.) 
What you can see here is that the job behaves differently in every run, 
and that there are some data sets where the wrong (doubled) counter values 
tend to show up (sometimes). 


  run 1   run 2   run 3   run 4   run 5   run 6
1606742 1606742 1606742  803371 1606742  803371
 770238  770238  770238  770238  770238  770238
 743497  743497  743497  743497  743497  743497
 788210  788210  788210  788210  788210  788210
 765884  765884  765884  765884  765884  765884
 784747  784747  784747  784747  784747  784747
 760986  760986  760986  760986  760986  760986
 746474  746474  746474  746474  746474  746474
 742191  742191  742191  742191  742191  742191
 716320  716320  716320  716320  716320  716320
 696199  696199  696199  696199  696199  696199
 732110  732110  732110  732110  732110  732110
 739052  739052  739052  739052  739052  739052
 756332  756332  756332  756332  756332  756332
 744607  744607  744607  744607  744607  744607
 748204  748204  748204  748204  748204  748204
 736882  736882  736882  736882  736882  736882
 700750  700750  700750  700750  700750  700750
 259806  259806  259806  259806  259806  259806
1279252  639626 1279252 1279252 1279252 1279252
1270780 1270780 1270780 1270780 1270780 1270780
 524832  524832  524832  524832  524832  524832
 520642  520642  520642  520642  520642  520642
 534722  534722  534722  534722  534722  534722
 561428 1122856 1122856  561428  561428 1122856
 535732  535732  535732  535732  535732  535732
 525191  525191  525191  525191  525191  525191
 515451  515451  515451  515451  515451  515451
 531535  531535  531535  531535  531535  531535
 502990  502990  502990  502990  502990  502990
 515943  515943  515943  515943  515943  515943
 523106  523106  523106  523106  523106  523106
 510348  510348  510348  510348  510348  510348
 535886  535886  535886  535886  535886  535886
 522792  522792  522792  522792  522792  522792
 535406  535406  535406  535406  535406  535406
 523322 1046644  523322  523322  523322  523322
 546711  546711  546711  546711  546711  546711
 565917  565917 1131834 1131834  565917 1131834
 564474 1128948 1128948 1128948  564474 1128948
1148090 1148090 1148090 1148090 1148090 1148090
 565539 1131078 1131078  565539  565539 1131078
 567117 1134234 1134234  567117 1134234 1134234
 549599  549599 1099198 1099198  549599  549599
 574316 1148632 1148632 1148632 1148632 1148632
 538347  538347  538347  538347  538347  538347
1165756 1165756 1165756 1165756 1165756 1165756
 533576  533576  533576  533576  533576  533576
 513303  513303  513303  513303  513303  513303
 564471 1128942 1128942  564471  564471 1128942
 550233  550233 1100466  550233  550233  550233
 543544  543544  543544  543544  543544  543544
 539658  539658  539658  539658  539658  539658
 559072  559072 1118144  559072  559072  559072
 544966  544966  544966  544966  544966  544966
 532842  532842  532842  532842  532842  532842
 544734  544734  544734  544734  544734  544734
 540757  540757  540757  540757  540757  540757
 542141  542141  542141  542141  542141  542141
 530072  530072  530072  530072  530072  530072
 538829  538829  538829  538829  538829  538829
 541023  541023  541023  541023  541023  541023
 555735  555735  555735  555735  555735  555735
1193444 1193444 1193444 1193444 1193444 1193444
 571797  571797 1143594 1143594  571797 1143594
 550118  550118  550118  550118  550118  550118
 566764  566764 1133528  566764  566764  566764
 555628  555628  555628 1111256  555628  555628
 560112  560112  560112  560112  560112  560112
 558385  558385  558385 1116770  558385  558385
 540215  540215  540215  540215  540215  540215
 550617  550617  550617  550617  550617  550617


Since nobody reported a similar problem, I guess the problem must be in my 
code... but I cannot imagine what I am doing wrong, since the mapred job runs, 
once configured, inside the Hadoop job runner.



 






"Devaraj Das" <dd...@yahoo-inc.com>, 04/16/2008 01:18 PM, to core-user@hadoop.apache.org:

Input group refers to the number of <key, {set-of-values}> that the 
reducer
gets. So if your map output looked like 
<K, 1>
<K, 1>
To the reducer, the number of input groups would be 1 - <K, {1,1}> and the
number of records would be 2.

The fact that the number of reduce input records doubled beats me ... 
Could
you please let us know the map output key/value types and if the types are
not the hadoop built-in types, pls also paste the code for
serializing/deserializing them.

> -----Original Message-----
> From: rude@rosa.com [mailto:rude@rosa.com] 
> Sent: Wednesday, April 16, 2008 12:41 PM
> To: core-user@hadoop.apache.org
> Subject: Re: Counters giving double values
> 
> Hello, 
> 
> please provide me some help by giving me answers for the following
> questions:
> 
> Is my assumption right that, within a mapred job, the 
> number of "Map output records" should/must match the number of 
> "Reduce input records" when no combiner is used in between?
> 
> Can I screw up the code in a way that makes "Reduce input records" 
> twice the value of "Map output records"? And if so, what 
> could be my mistake?
> 
> What is an "input group"?
> 
> Any input is appreciated,
> 
> ud
> 
> 
> 
> 
> 
> 
> 
> 
> rude@rosa.com
> 04/10/2008 05:54 PM
> Please respond to
> core-user@hadoop.apache.org
> 
> 
> To
> core-user@hadoop.apache.org
> cc
> 
> Subject
> Re: Counters giving double values
> 
> 
> 
> 
> 
> 
> Hello list readers,
> 
> i'm still looking for an explanation.
> 
> Maybe I should put it this way: I run a job, and at the end I 
> need to know 
> how many instances of a specific type were written into an 
> output file. 
> I wanted to rely on the counters, but maybe this is not a good idea 
> anyway. Or what is the outcome if speculative execution is used?
> 
> How can I get the number of records contained in a MapFile, 
> without 
> needing to read it?
> 
> thank you for some input,
> 
> ud
> 
> 
> 
> 
> 
> 
> 



RE: Counters giving double values

Posted by Devaraj Das <dd...@yahoo-inc.com>.
Input group refers to the number of <key, {set-of-values}> pairs that the reducer
gets. So if your map output looked like 
<K, 1>
<K, 1>
then to the reducer the number of input groups would be 1 (<K, {1,1}>) and the
number of records would be 2.

The fact that the number of reduce input records doubled beats me ... Could
you please let us know the map output key/value types, and if the types are
not the Hadoop built-in types, please also paste the code for
serializing/deserializing them.
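As a plain-Java sketch of that definition (not Hadoop code), grouping map output records by key gives the two counters directly:

```java
import java.util.*;

// Devaraj's example made concrete: two map output records with the same key
// form ONE reduce input group that contains TWO reduce input records.
public class GroupsVsRecords {
    // Groups map output records (keys[i] -> values[i]) the way the shuffle does;
    // returns {reduce input groups, reduce input records}.
    public static long[] count(String[] keys, int[] values) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (int i = 0; i < keys.length; i++)
            groups.computeIfAbsent(keys[i], k -> new ArrayList<>()).add(values[i]);
        long records = 0;
        for (List<Integer> v : groups.values()) records += v.size();
        return new long[] { groups.size(), records };
    }

    public static void main(String[] args) {
        // map output: <K, 1>, <K, 1>
        long[] r = count(new String[] { "K", "K" }, new int[] { 1, 1 });
        System.out.println(r[0] + " input group(s), " + r[1] + " input record(s)");
        // prints: 1 input group(s), 2 input record(s)
    }
}
```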



Re: Counters giving double values

Posted by ru...@rosa.com.
Hello, 

could you please help me by answering the following questions:

Is my assumption right that, within a mapred job, the number of "Map
output records" should/must match the number of "Reduce input records"
when no combiner is used in between?

Could I have written the code in a way that makes "Reduce input records"
twice the value of "Map output records"? And if so, what could be my
mistake?

What is an "input group"?

Any input is appreciated,

ud
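For what it's worth, one hypothetical way to get exactly doubled reduce input
records while the map output counter stays correct is an asymmetric write/read
pair in a custom Writable: write() emits two fields, but readFields() consumes
only one, so the reduce side re-frames every written record as two. The sketch
below (class name `FramingDemo` is made up; it is not the actual job code)
simulates only that framing arithmetic with plain Java streams:

```java
import java.io.*;

public class FramingDemo {
  /** Writes n records of two ints each, then "deserializes" them reading
      only one int per record -- the asymmetric write/read pair a buggy
      custom Writable can have. Returns the record count on the read side. */
  static int recordsSeenByReader(int n) throws IOException {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(buf);
    for (int i = 0; i < n; i++) {  // the "map" side: n output records
      out.writeInt(i);             // field 1
      out.writeInt(i * 10);        // field 2
    }
    DataInputStream in = new DataInputStream(
        new ByteArrayInputStream(buf.toByteArray()));
    int records = 0;
    while (in.available() > 0) {   // the "reduce" side re-frames the stream
      in.readInt();                // buggy readFields(): consumes field 1 only
      records++;
    }
    return records;
  }

  public static void main(String[] args) throws IOException {
    System.out.println(recordsSeenByReader(43917));  // prints 87834
  }
}
```

Note that 43917 written records come back as 87834 read records: the write
side counts 43917 (the "Map output records" in the log) while the read side
counts exactly double (the "Reduce input records"), matching the pattern above.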



 





Re: Counters giving double values

Posted by ru...@rosa.com.
Hello list readers,

i'm still looking for an explanation.

maybe I should put it this way: I run a job, and at the end I need to
know how many instances of a specific type were written into an output
file. I wanted to rely on the counters, but maybe this is not a good
idea anyway. Also, what is the outcome if speculative execution is used?

how can I get the number of records contained in a MapFile, without
having to read the whole file?

thank you for some input,

ud
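As far as I know, the MapFile format itself does not store a record count, so
one workaround is to write a tiny sidecar file holding the count when the
MapFile is closed, and read that instead of scanning the data. A minimal
sketch, assuming a hypothetical `_record.count` filename and using local
java.nio rather than the Hadoop FileSystem API:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;

public class RecordCountSidecar {
  /** Writes the record count into a sidecar file next to the
      (hypothetical) MapFile directory. Call this when closing the writer. */
  static void writeCount(Path mapFileDir, long count) throws IOException {
    Files.createDirectories(mapFileDir);
    Files.write(mapFileDir.resolve("_record.count"),
        Long.toString(count).getBytes(StandardCharsets.UTF_8));
  }

  /** Reads the count back without touching the data or index files. */
  static long readCount(Path mapFileDir) throws IOException {
    byte[] raw = Files.readAllBytes(mapFileDir.resolve("_record.count"));
    return Long.parseLong(new String(raw, StandardCharsets.UTF_8).trim());
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("mapfile-demo");
    writeCount(dir, 43917L);
    System.out.println(readCount(dir));  // prints 43917
  }
}
```

The same idea works on HDFS by swapping the java.nio calls for the Hadoop
FileSystem API; the count then also serves as a cross-check against the
job counters.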






