Posted to common-user@hadoop.apache.org by howard chen <ho...@gmail.com> on 2006/12/10 15:19:35 UTC

How can I get the current file name in the Map function of WC example?

For example, in the Word Count example:

  public void map(WritableComparable key, Writable value,
                  OutputCollector output,
                  Reporter reporter) throws IOException {
    String line = ((Text) value).toString();
    StringTokenizer itr = new StringTokenizer(line);
    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      output.collect(word, one);
    }
  }


How can I get the name of the file that the value belongs to?

Thanks.

Re: How can I get the current file name in the Map function of WC example?

Posted by Arif Iqbal <it...@gmail.com>.
I would also like similar functionality and was wondering whether it's possible.

On 12/10/06, howard chen <ho...@gmail.com> wrote:
>
> On 12/11/06, Owen O'Malley <ow...@yahoo-inc.com> wrote:
> >
> > On Dec 10, 2006, at 6:19 AM, howard chen wrote:
> >
> > > How can I get the name of the file that the value belongs to?
> >
> > Yes, it is set as a property in the JobConf. Look at the wiki page:
> >
> > http://wiki.apache.org/lucene-hadoop/TaskExecutionEnvironment
> >
> > under "localized properties".
> >
> > -- Owen
> >
>
> Thanks!
>
> I have another problem...
>
> in the reduce function of WC example
>
> public void reduce(WritableComparable key, Iterator values,
>         OutputCollector output,
>         Reporter reporter) throws IOException {
>       int sum = 0;
>       while (values.hasNext()) {
>         sum += ((IntWritable) values.next()).get();
>       }
>       output.collect(key, new IntWritable(sum));
>     }
>
>
> Rather than writing to part-00000..., is it possible to write each key to a
> separate file (filename = key) whose content is the count?
>
> Thanks.
>

Re: How can I get the current file name in the Map function of WC example?

Posted by Owen O'Malley <ow...@yahoo-inc.com>.
> in the reduce function of WC example
>
> public void reduce(WritableComparable key, Iterator values,
>        OutputCollector output,
>        Reporter reporter) throws IOException {
>      int sum = 0;
>      while (values.hasNext()) {
>        sum += ((IntWritable) values.next()).get();
>      }
>      output.collect(key, new IntWritable(sum));
>    }
>
>
> Rather than writing to part-00000..., is it possible to write each key to a
> separate file (filename = key) whose content is the count?

That is possible by just creating a new file in each invocation of reduce.

void reduce(WritableComparable key, Iterator values,
            OutputCollector output, Reporter reporter) throws IOException {
   ... compute sum of values ...
   Path outFile = new Path(conf.getOutputDirectory(), key.toString());
   OutputStream out = outFile.getFileSystem(conf).create(outFile);
   ... write sum to out ...
   out.close();
}

You should also turn off speculative execution or use the phased file system.

-- Owen
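
The one-file-per-key idea above can be tried out without a cluster. The following is only a minimal sketch in plain Java: it uses the local file system (java.nio.file) as a stand-in for Hadoop's FileSystem, and the class name PerKeyFileReducer and its constructor are hypothetical, not part of Hadoop. It assumes every key is a valid file name.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical stand-in for a reducer; not a Hadoop class.
class PerKeyFileReducer {
    private final Path outputDir;

    // Creates the output directory if it does not exist yet.
    PerKeyFileReducer(Path outputDir) throws IOException {
        this.outputDir = Files.createDirectories(outputDir);
    }

    // Sums the counts for one key and writes the total to <outputDir>/<key>.
    // Assumption: the key contains no path separators or other characters
    // that are illegal in a file name.
    void reduce(String key, Iterator<Integer> values) throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next();
        }
        Path outFile = outputDir.resolve(key);
        // try-with-resources closes the file even if the write fails.
        try (Writer out = Files.newBufferedWriter(outFile, StandardCharsets.UTF_8)) {
            out.write(Integer.toString(sum));
        }
    }
}

class PerKeyFileReducerDemo {
    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("wc-out");
        PerKeyFileReducer reducer = new PerKeyFileReducer(dir);
        reducer.reduce("hello", Arrays.asList(1, 1, 1).iterator());
        System.out.println("wrote " + dir.resolve("hello"));
    }
}
```

In a real job the same reduce call may run more than once when speculative execution is on, so two attempts could write the same file at the same time; that is presumably why the advice above is to turn it off.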

Re: How can I get the current file name in the Map function of WC example?

Posted by howard chen <ho...@gmail.com>.
On 12/11/06, Owen O'Malley <ow...@yahoo-inc.com> wrote:
>
> On Dec 10, 2006, at 6:19 AM, howard chen wrote:
>
> > How can I get the name of the file that the value belongs to?
>
> Yes, it is set as a property in the JobConf. Look at the wiki page:
>
> http://wiki.apache.org/lucene-hadoop/TaskExecutionEnvironment
>
> under "localized properties".
>
> -- Owen
>

Thanks!

I have another problem...

in the reduce function of WC example

  public void reduce(WritableComparable key, Iterator values,
                     OutputCollector output,
                     Reporter reporter) throws IOException {
    int sum = 0;
    while (values.hasNext()) {
      sum += ((IntWritable) values.next()).get();
    }
    output.collect(key, new IntWritable(sum));
  }


Rather than writing to part-00000..., is it possible to write each key to a
separate file (filename = key) whose content is the count?

Thanks.

Re: How can I get the current file name in the Map function of WC example?

Posted by Owen O'Malley <ow...@yahoo-inc.com>.
On Dec 10, 2006, at 6:19 AM, howard chen wrote:

> How can I get the name of the file that the value belongs to?

Yes, it is set as a property in the JobConf. Look at the wiki page:

http://wiki.apache.org/lucene-hadoop/TaskExecutionEnvironment

under "localized properties".

-- Owen
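
For readers who want to see the pattern, here is a minimal sketch in plain Java of reading the input file name from the job configuration once per task and then using it in map. It is an illustration under assumptions, not Hadoop code: java.util.Properties stands in for JobConf, the class FileNameAwareMapper is hypothetical, and "map.input.file" is taken to be the relevant entry among the localized properties. In a Hadoop mapper of that era the equivalent would be to read the property from the JobConf passed to configure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.StringTokenizer;

// Hypothetical stand-in for a mapper; Properties plays the role of JobConf.
class FileNameAwareMapper {
    // Assumed name of the localized property holding the current input file.
    static final String INPUT_FILE_KEY = "map.input.file";

    private String inputFile = "unknown";

    // Called once per task before any map call; remembers the input file name.
    void configure(Properties job) {
        inputFile = job.getProperty(INPUT_FILE_KEY, "unknown");
    }

    // Emits "fileName:word" for every token so each record carries its source file.
    void map(String line, List<String> output) {
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            output.add(inputFile + ":" + itr.nextToken());
        }
    }
}

class FileNameAwareMapperDemo {
    public static void main(String[] args) {
        Properties job = new Properties();
        // Hypothetical path, set here by hand; the framework would set it per task.
        job.setProperty(FileNameAwareMapper.INPUT_FILE_KEY, "/data/in/a.txt");

        FileNameAwareMapper mapper = new FileNameAwareMapper();
        mapper.configure(job);

        List<String> out = new ArrayList<>();
        mapper.map("hello world hello", out);
        System.out.println(out);
    }
}
```

Reading the property in configure rather than in map keeps the lookup out of the per-record path, since the input file does not change within a single map task.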