Posted to mapreduce-dev@hadoop.apache.org by Pedro Costa <ps...@gmail.com> on 2010/08/09 10:56:37 UTC

How to read map outputs?

Hi,

1 - I would like to programmatically compare the map output and the
reduce input to check that they are equal in MR. To do this, I compute
a hash of the output generated by the map and a hash of the input on
the reduce side, and compare the two. The problem is that I'm hashing
the whole file rather than each key/value pair, and as a result the
hash produced on the map side is different from the hash produced on
the reduce side.

On the map side, I'm hashing the map output file, and on the
reduce side, I'm hashing the reduce input file.

I don't quite understand why the hashes are different. Is there
any reason why they should differ?
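
The difference between hashing the whole file and hashing each
key/value pair can be sketched as follows. This is plain Java, not
Hadoop code: the class PairDigest and its methods are hypothetical
names, keys and values are assumed to be already available as strings,
and SHA-256 is only one possible choice of hash. Each pair is hashed
on its own and the per-pair hashes are added byte by byte, so the
result does not depend on the order of the pairs or on how the file
that holds them is laid out.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hadoop: an order-independent
// digest computed over key/value pairs instead of over a whole file.
class PairDigest {

    // Hashes one pair. Lengths are written first so that
    // <"ab","c"> and <"a","bc"> do not produce the same hash.
    static byte[] hashPair(byte[] key, byte[] value) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(ByteBuffer.allocate(4).putInt(key.length).array());
            md.update(key);
            md.update(ByteBuffer.allocate(4).putInt(value.length).array());
            md.update(value);
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Sums the per-pair hashes byte-wise (mod 256). Addition is
    // commutative, so the order of the pairs does not matter.
    static byte[] digest(List<String[]> pairs) {
        byte[] acc = new byte[32]; // SHA-256 produces 32 bytes
        for (String[] kv : pairs) {
            byte[] h = hashPair(kv[0].getBytes(StandardCharsets.UTF_8),
                                kv[1].getBytes(StandardCharsets.UTF_8));
            for (int i = 0; i < acc.length; i++) {
                acc[i] += h[i];
            }
        }
        return acc;
    }

    public static void main(String[] args) {
        byte[] mapSide = digest(Arrays.asList(
                new String[] {"A", "1"}, new String[] {"B", "2"}));
        byte[] reduceSide = digest(Arrays.asList(
                new String[] {"B", "2"}, new String[] {"A", "1"}));
        System.out.println(Arrays.equals(mapSide, reduceSide));
    }
}
```

Since a reducer normally receives only its own partition of each map
output, a comparison of this kind would probably have to be made per
partition rather than per map output file.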



2 - A possible solution would be to hash each key/value pair instead.
For that, I have to create a method that allows me to read the
key/value pairs of any possible map output. I would like to create
a generic method that can read the map outputs produced on the map
side and print them out, but I can't find any good example to build
on. I'm having difficulty working out how to read the map output,
whether it is written to a file or held in memory on the map side.

Can you give me an example of how to read a key/value pair that
is stored on disk?
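
To show the kind of reader meant here, below is a minimal sketch in
plain Java that writes key/value pairs to a file and reads them back
one pair at a time. Everything in it is an assumption made for
illustration: the class RecordFile is a hypothetical name, and the
layout (a 4-byte length followed by the bytes, for the key and then
for the value) is a made-up format. Hadoop's intermediate map output
files use their own internal framing, which this sketch does not
attempt to reproduce.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical record file, NOT Hadoop's on-disk format:
// [int keyLen][key bytes][int valueLen][value bytes] repeated.
class RecordFile {

    static void write(Path file, List<String[]> pairs) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            for (String[] kv : pairs) {
                writeField(out, kv[0]); // key
                writeField(out, kv[1]); // value
            }
        }
    }

    private static void writeField(DataOutputStream out, String s)
            throws IOException {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    // Reads pairs until the end of the file is reached.
    static List<String[]> read(Path file) throws IOException {
        List<String[]> pairs = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            while (true) {
                int keyLen;
                try {
                    keyLen = in.readInt();
                } catch (EOFException endOfFile) {
                    break; // no more records
                }
                byte[] key = new byte[keyLen];
                in.readFully(key);
                byte[] value = new byte[in.readInt()];
                in.readFully(value);
                pairs.add(new String[] {
                        new String(key, StandardCharsets.UTF_8),
                        new String(value, StandardCharsets.UTF_8)});
            }
        }
        return pairs;
    }

    // Convenience for trying it out: write to a temp file, read back.
    static List<String[]> roundTrip(List<String[]> pairs) {
        try {
            Path tmp = Files.createTempFile("pairs", ".bin");
            try {
                write(tmp, pairs);
                return read(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<String[]> pairs = new ArrayList<>();
        pairs.add(new String[] {"A", "1"});
        pairs.add(new String[] {"B", "2"});
        for (String[] kv : roundTrip(pairs)) {
            System.out.println("<" + kv[0] + ", " + kv[1] + ">");
        }
    }
}
```

A generic method for real map outputs would need the same loop shape,
but with the actual file format and the job's key and value classes
in place of the string fields assumed here.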


3 - MR uses the Segment class during the sort phase. Does a Segment
correspond to a key/value pair in a map output?
For example, if the mapper produces the following map output file:
<A, 1>
<B, 2>

So, does this map output contain 2 segments?

Thanks,

-- 
Pedro