Posted to mapreduce-user@hadoop.apache.org by Geoffry Roberts <ge...@gmail.com> on 2011/05/06 19:55:44 UTC

Multiple Outputs Not Being Written to File

All,

I am attempting to take a large file and split it up into a series of
smaller files.  I want the smaller files to be named based on values taken
from the large file.  I am using
org.apache.hadoop.mapreduce.lib.output.MultipleOutputs to do this.

The job runs without error and produces a set of files as expected, and each
file is named as expected.  But most of the files are empty.  Apparently, no
data was written to them.  The fact that a file was created at all should
confirm that there was data coming in from the mapper.  My reducer counts the
values as it iterates through them, then logs the count.  I am seeing
reasonable counts in my logs.  The number of lines in an output file should
equal the count.  I have counts but no lines.

What could be causing this?

My Mapper:
    protected void map(LongWritable key, Text value, Context ctx)
            throws IOException, InterruptedException {
        String[] ss = value.toString().split(",");
        String locale = ss[F.DEPARTURE_LOCALE];
        ctx.write(new Text(locale), value);
    }

My Reducer:
    private MultipleOutputs<Text, Text> mos;

    @Override
    protected void setup(Context ctx)
            throws IOException, InterruptedException {
        mos = new MultipleOutputs<Text, Text>(ctx);
    }

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context ctx)
            throws IOException, InterruptedException {
        int k = 0;
        /*
         * The key at this point can have blanks and slashes. Let us get rid
         * of both.
         */
        String blankless = key.toString().replace(' ', '+');
        String path = blankless.replace("/", "");
        try {
            for (Text value : values) {
                k++;
                String[] ss = value.toString().split(F.DELIMITER);
                String id = ss[F.ID];
                String[] sslessid = Arrays.copyOfRange(ss, 1, ss.length);
                String line = UT.array2String(sslessid);

                // An output file is being created.
                mos.write(new Text(id), new Text(line), path);
            }
        } catch (NullPointerException e) {
            LOG.error("<br/>" + "blankless=" + blankless);
            LOG.error("<br/>" + "values=" + values.toString());
        }

        // In my logs, I see reasonable counts even when the output file is empty.
        LOG.info("<br/>key=" + path + " count=" + k);
    }
-- 
Geoffry Roberts

Re: Multiple Outputs Not Being Written to File

Posted by Joey Echeverria <jo...@cloudera.com>.
You need to add a call to MultipleOutputs.close() in your reducer's cleanup:

 @Override
 protected void cleanup(Context ctx) throws IOException, InterruptedException {
   mos.close();
   ...
 }
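
As a rough illustration of why the files can exist and still be empty, here is a plain-Java analogy that does not use Hadoop at all. It assumes, without having checked the MultipleOutputs source, that its record writers buffer output in a similar way to a java.io BufferedWriter. The class name BufferedCloseDemo and the method writeAndMeasure are made up for this sketch.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; an analogy only, not Hadoop code.
public class BufferedCloseDemo {

    /**
     * Writes a few short records through a buffered writer and returns
     * { file size before close(), file size after close() }.
     */
    static long[] writeAndMeasure(Path file, int lines) throws IOException {
        BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8);
        long before;
        try {
            for (int i = 0; i < lines; i++) {
                out.write("id" + i + "\tsome,record,fields");
                out.newLine();
            }
            // The file already exists, but for a small amount of data the
            // records are still sitting in the in-memory buffer.
            before = Files.size(file);
        } finally {
            // close() flushes the buffer to disk; this plays the role of
            // mos.close() in the reducer's cleanup().
            out.close();
        }
        long after = Files.size(file);
        return new long[] { before, after };
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mos-demo", ".txt");
        long[] sizes = writeAndMeasure(file, 3);
        System.out.println("bytes before close: " + sizes[0]);
        System.out.println("bytes after close:  " + sizes[1]);
        Files.delete(file);
    }
}
```

With only a handful of short lines, the size before close() is zero and the size after is not. If the same kind of buffering applies in your job, it would also fit with only "most" of the files being empty: outputs large enough to overflow a buffer could get partially written even without close().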

-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434