Posted to common-user@hadoop.apache.org by Farhan Husain <ru...@gmail.com> on 2009/04/01 02:58:06 UTC

A bizarre problem in reduce method

Hello All,

I am facing some problems with a reduce method I have written which I cannot
understand. Here is the method:

    @Override
    public void reduce(Text key, Iterator<Text> values,
            OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        String sValues = "";
        int iCount = 0;
        String sValue;
        while (values.hasNext()) {
            sValue = values.next().toString();
            iCount++;
            sValues += "\t" + sValue;

        }
        sValues += "\t" + iCount;
        //if (iCount == 2)
            output.collect(key, new Text(sValues));
    }

The output of the code is like the following:

D0U0:GraduateStudent0                lehigh:GraduateStudent    1    1    1
D0U0:GraduateStudent1                lehigh:GraduateStudent    1    1    1
D0U0:GraduateStudent10                lehigh:GraduateStudent    1    1    1
D0U0:GraduateStudent100                lehigh:GraduateStudent    1    1    1
D0U0:GraduateStudent101                lehigh:GraduateStudent    1
D0U0:GraduateCourse0    1    2    1
D0U0:GraduateStudent102                lehigh:GraduateStudent    1    1    1
D0U0:GraduateStudent103                lehigh:GraduateStudent    1    1    1
D0U0:GraduateStudent104                lehigh:GraduateStudent    1    1    1
D0U0:GraduateStudent105                lehigh:GraduateStudent    1    1    1

The problem is there cannot be so many 1's in the output value. The output
which I expect should be like this:

D0U0:GraduateStudent0                lehigh:GraduateStudent    1
D0U0:GraduateStudent1                lehigh:GraduateStudent    1
D0U0:GraduateStudent10                lehigh:GraduateStudent    1
D0U0:GraduateStudent100                lehigh:GraduateStudent    1
D0U0:GraduateStudent101                lehigh:GraduateStudent
D0U0:GraduateCourse0    2
D0U0:GraduateStudent102                lehigh:GraduateStudent    1
D0U0:GraduateStudent103                lehigh:GraduateStudent    1
D0U0:GraduateStudent104                lehigh:GraduateStudent    1
D0U0:GraduateStudent105                lehigh:GraduateStudent    1

If I do not append the iCount variable to sValues string, I get the
following output:

D0U0:GraduateStudent0                lehigh:GraduateStudent
D0U0:GraduateStudent1                lehigh:GraduateStudent
D0U0:GraduateStudent10                lehigh:GraduateStudent
D0U0:GraduateStudent100                lehigh:GraduateStudent
D0U0:GraduateStudent101                lehigh:GraduateStudent
D0U0:GraduateCourse0
D0U0:GraduateStudent102                lehigh:GraduateStudent
D0U0:GraduateStudent103                lehigh:GraduateStudent
D0U0:GraduateStudent104                lehigh:GraduateStudent
D0U0:GraduateStudent105                lehigh:GraduateStudent

This confirms that there are no 1's after each of those values (which I
already know from the input data). I do not know why the output is
distorted like that when I append iCount to sValues (as in the code
above). Can anyone help in this regard?

Now comes the second problem which is equally perplexing. Actually, the
reduce method which I want to run is like the following:

    @Override
    public void reduce(Text key, Iterator<Text> values,
            OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        String sValues = "";
        int iCount = 0;
        String sValue;
        while (values.hasNext()) {
            sValue = values.next().toString();
            iCount++;
            sValues += "\t" + sValue;

        }
        sValues += "\t" + iCount;
        if (iCount == 2)
            output.collect(key, new Text(sValues));
    }

I want to emit output only if "values" contains exactly two elements. Looking
at the output above, you can see that there is at least one key/values pair
whose values have exactly two elements. But when I run the code I get an
empty output file. Can anyone solve this?

I have tried many versions of the code (e.g. using StringBuffer instead of
String, using flags instead of integer count) but nothing works. Are these
problems due to bugs in Hadoop? Please let me know any kind of solution you
can think of.

Thanks,

-- 
Mohammad Farhan Husain
Research Assistant
Department of Computer Science
Erik Jonsson School of Engineering and Computer Science
University of Texas at Dallas

Re: A bizarre problem in reduce method

Posted by Farhan Husain <ru...@gmail.com>.
Thanks Rasit for your suggestion. Actually, I should have let the group know
earlier that I solved the problem, and it had nothing to do with the reduce
method itself. I was also using my reducer class as the combiner, which is
not appropriate in this case. I removed the combiner and everything works
fine now. I think the Map/Reduce tutorial on Hadoop's website should say more
about the combiner. In the word count example the reducer can double as a
combiner, but that is not true of every problem. This should be highlighted
a little more in the tutorial.
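
To make the combiner effect concrete, here is a minimal plain-Java sketch (no
Hadoop classes involved) that applies the same "join the values, then append
their count" logic twice: once standing in for the combiner and once for the
reducer. The class name CombinerDemo and the method concatWithCount are made
up for this illustration, and the sketch assumes the combiner runs exactly
once per key. Hadoop may actually run a combiner zero, one, or several times,
which would be consistent with the three trailing 1's in the output above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the reducer logic; not part of any Hadoop API.
class CombinerDemo {

    // Mirrors the body of the reduce method from the original post:
    // tab-join all values, then append how many values were seen.
    static String concatWithCount(List<String> values) {
        StringBuilder sb = new StringBuilder();
        int count = 0;
        for (String v : values) {
            count++;
            sb.append('\t').append(v);
        }
        sb.append('\t').append(count);
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> mapOutput =
                Arrays.asList("lehigh:GraduateStudent", "D0U0:GraduateCourse0");

        // Reducer only: it sees both map outputs and appends a count of 2.
        String reducerOnly = concatWithCount(mapOutput);

        // Combiner first: its single output string becomes the reducer's only
        // input value, so the reducer appends its own count of 1 after the 2.
        String combined = concatWithCount(mapOutput);
        String withCombiner = concatWithCount(Collections.singletonList(combined));

        System.out.println("reducer only :" + reducerOnly);
        System.out.println("with combiner:" + withCombiner);
    }
}
```

The logic is safe as a combiner only if feeding its output back into itself
gives the same result as running it once on the raw values. Summing counts in
word count has that property; appending a count to a concatenated string does
not.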

On Thu, Apr 2, 2009 at 8:50 AM, Rasit OZDAS <ra...@gmail.com> wrote:

> Hi, Husain,
>
> 1. You can use a boolean control in your code.
>       boolean hasAlreadyOned = false;
>        int iCount = 0;
>       String sValue;
>       while (values.hasNext()) {
>           sValue = values.next().toString();
>           iCount++;
>            if (sValue.equals("1"))
>                 hasAlreadyOned = true;
>
>           if (!hasAlreadyOned)
>                 sValues += "\t" + sValue;
>       }
>       ...
>
> 2. You're actually controlling for 3 elements, not 2. You should use  if
> (iCount == 1)
>
> --
> M. Raşit ÖZDAŞ
>



-- 
Mohammad Farhan Husain
Research Assistant
Department of Computer Science
Erik Jonsson School of Engineering and Computer Science
University of Texas at Dallas

Re: A bizarre problem in reduce method

Posted by Rasit OZDAS <ra...@gmail.com>.
Hi, Husain,

1. You can use a boolean control in your code.
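       String sValues = "";             // declared as in your original method
       boolean hasAlreadyOned = false;  // set once a "1" value has been seen
       int iCount = 0;
       String sValue;
       while (values.hasNext()) {
           sValue = values.next().toString();
           iCount++;
           if (sValue.equals("1"))
               hasAlreadyOned = true;

           // Stop appending values once a "1" has been seen.
           if (!hasAlreadyOned)
               sValues += "\t" + sValue;
       }
       ...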

2. You're actually checking for 3 elements, not 2. You should use
if (iCount == 1)
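
For the second problem, the empty output file is also consistent with the
combiner explanation given elsewhere in this thread. The plain-Java sketch
below (no Hadoop classes) applies the "emit only when exactly two values were
seen" logic first as a combiner and then as the reducer. The names
FilterAsCombinerDemo, emitIfTwo and combineThenReduce are invented for this
illustration, and it assumes the combiner runs once and sees both values of
the key together.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for the filtering reducer; not part of any Hadoop API.
class FilterAsCombinerDemo {

    // Mirrors the second reduce method: build the tab-joined string with the
    // count appended, but emit it only when exactly two values were seen.
    static Optional<String> emitIfTwo(List<String> values) {
        StringBuilder sb = new StringBuilder();
        int count = 0;
        for (String v : values) {
            count++;
            sb.append('\t').append(v);
        }
        sb.append('\t').append(count);
        if (count == 2) {
            return Optional.of(sb.toString());
        }
        return Optional.empty();
    }

    // Runs the logic once as a combiner, then again as the reducer on
    // whatever the combiner emitted for this key.
    static Optional<String> combineThenReduce(List<String> mapOutput) {
        Optional<String> combined = emitIfTwo(mapOutput);
        if (!combined.isPresent()) {
            // The combiner dropped the key, so the reducer never sees it.
            return Optional.empty();
        }
        // The reducer now receives a single, already merged value.
        return emitIfTwo(Collections.singletonList(combined.get()));
    }

    public static void main(String[] args) {
        List<String> twoValues =
                Arrays.asList("lehigh:GraduateStudent", "D0U0:GraduateCourse0");
        System.out.println("reducer only : " + emitIfTwo(twoValues).isPresent());         // prints true
        System.out.println("with combiner: " + combineThenReduce(twoValues).isPresent()); // prints false
    }
}
```

Under these assumptions a key with two values passes the combiner but arrives
at the reducer as one merged value and is dropped there, while a key with one
value is dropped by the combiner itself. If the two values reached separate
combiner invocations instead, each invocation would see a single value and
drop it, so nothing is emitted in that case either.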

-- 
M. Raşit ÖZDAŞ