Posted to hdfs-user@hadoop.apache.org by paco <pa...@gmail.com> on 2015/10/04 10:38:43 UTC

Combiner and KeyComposite

I am doing a secondary sort in Hadoop 2.6.0, following this tutorial:
https://vangjee.wordpress.com/2012/03/20/secondary-sorting-aka-sorting-values-in-hadoops-mapreduce-programming-paradigm/

I have exactly the same code, but to improve performance I have decided 
to add a combiner. I have made two modifications:

Main file:

job.setCombinerClass(CombinerK.class);

Combiner file:

public class CombinerK extends Reducer<KeyWritable, KeyWritable, KeyWritable, KeyWritable> {

    public void reduce(KeyWritable key, Iterator<KeyWritable> values, Context context) throws IOException, InterruptedException {

        Iterator<KeyWritable> it = values;

        System.err.println("combiner " + key);

        KeyWritable first_value = it.next();
        System.err.println("va: " + first_value);

        while (it.hasNext()) {
            sum += it.next().getSs();
        }
        first_value.setS(sum);
        context.write(key, first_value);
    }
}

However, it does not seem to run, because I cannot find any log file 
that contains the word "combiner". When I looked at the counters after 
the run, I saw:

    Combine input records=4040000
    Combine output records=4040000

The combiner appears to be executed, but it seems to be called once for 
each key, which would explain why the number of input records equals 
the number of output records.
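
A possible explanation, offered as an assumption and not confirmed anywhere in this thread: in the new API, Hadoop's Reducer.reduce takes an Iterable of values, not an Iterator. A method declared with Iterator is an overload, not an override, so the framework would keep calling the inherited default reduce, which writes every value through unchanged. That would match both symptoms: no "combiner" line in the logs, and equal input and output record counts. The sketch below reproduces the effect in plain Java. MiniReducer, IteratorCombiner and IterableCombiner are hypothetical stand-ins written for this illustration, not Hadoop classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class CombinerOverloadSketch {

    // Hypothetical stand-in for Hadoop's Reducer: the driver only ever calls
    // reduce(key, Iterable, out), and the default passes every value through.
    public static class MiniReducer {
        protected void reduce(String key, Iterable<Integer> values, List<String> out) {
            for (Integer v : values) {
                out.add(key + "=" + v);
            }
        }

        // Plays the role of the framework invoking the combiner for one key.
        public final List<String> run(String key, List<Integer> values) {
            List<String> out = new ArrayList<>();
            reduce(key, values, out);
            return out;
        }
    }

    // Mirrors the posted signature: Iterator instead of Iterable.
    // This only overloads reduce, so run() never reaches it.
    public static class IteratorCombiner extends MiniReducer {
        protected void reduce(String key, Iterator<Integer> values, List<String> out) {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next();
            }
            out.add(key + "=" + sum);
        }
    }

    // Same logic with the Iterable signature: a real override.
    public static class IterableCombiner extends MiniReducer {
        @Override
        protected void reduce(String key, Iterable<Integer> values, List<String> out) {
            int sum = 0;
            for (Integer v : values) {
                sum += v;
            }
            out.add(key + "=" + sum);
        }
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3);
        // Three records in, three records out: the default reduce ran.
        System.out.println(new IteratorCombiner().run("k", values)); // [k=1, k=2, k=3]
        // Three records in, one record out: the override ran.
        System.out.println(new IterableCombiner().run("k", values)); // [k=6]
    }
}
```

Annotating the combiner's reduce method with @Override makes the compiler reject the Iterator version, which is a cheap way to catch this kind of signature mismatch.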


Re: Combiner and KeyComposite

Posted by paco <pa...@gmail.com>.
Yes, I have only a single node. I am checking under 
/usr/local/hadoop/logs/userlogs/*

To check that I am looking in the right place, I ran another example 
(which does not use KeyComposite), and I can find the word "combiner" 
in its log files.

Thanks.

On 04/10/15 10:42, ☼ R Nair (रविशंकर नायर) wrote:
>
> Are you checking the logs in the correct place?


Re: Combiner and KeyComposite

Posted by "☼ R Nair (रविशंकर नायर)" <ra...@gmail.com>.
Are you checking the logs in the correct place?

