Posted to common-user@hadoop.apache.org by xinfan meng <mx...@gmail.com> on 2008/06/09 04:00:30 UTC

compute document frequency with hadoop-streaming

In Hadoop streaming, the mapper reads its input from stdin. If we want to
compute the document frequency of words, the simplest way is to output words
as keys and file names as values. How, then, can we get the name of the input
file passed to this MapReduce job? Thanks.

-- 
Best Wishes
Meng Xinfan(蒙新泛)
Institute of Computational Linguistics
Department of Computer Science & Technology
School of Electronic Engineering & Computer Science
Peking University
Beijing, 100871
China

Re: compute document frequency with hadoop-streaming

Posted by Miles Osborne <mi...@inf.ed.ac.uk>.
Well, you could have one document per line, with the filename as another
tab-separated field on the same line, e.g.:

name\tdocument\n
name\tdocument\n

etc
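
To make the layout above concrete, here is a minimal sketch of a streaming
mapper and reducer for document frequency. It assumes every input line really
is name<TAB>document and that the framework sorts mapper output by key before
the reducer sees it. The script name df.py and the helper names map_lines and
reduce_pairs are made up for illustration.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Yield (word, name) pairs from 'name<TAB>document' lines."""
    for line in lines:
        name, sep, document = line.rstrip("\n").partition("\t")
        if not sep:
            continue  # no tab on this line, so it is not a valid record
        # Emit each distinct word once per document.
        for word in sorted(set(document.split())):
            yield word, name


def reduce_pairs(pairs):
    """Yield (word, document frequency) from pairs already sorted by word."""
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        # Count distinct document names seen for this word.
        yield word, len({name for _, name in group})


def main(argv):
    # Hypothetical usage: "df.py map" as the mapper, "df.py reduce" as the reducer.
    if argv[1:] == ["map"]:
        for word, name in map_lines(sys.stdin):
            print(f"{word}\t{name}")
    elif argv[1:] == ["reduce"]:
        pairs = (
            tuple(line.rstrip("\n").split("\t", 1))
            for line in sys.stdin
            if "\t" in line
        )
        for word, count in reduce_pairs(pairs):
            print(f"{word}\t{count}")


if __name__ == "__main__":
    main(sys.argv)
```

This can be tried locally, without a cluster, as
cat docs.tsv | python df.py map | sort | python df.py reduce
where docs.tsv is a placeholder for a file in the format above. Under
streaming, the same two commands would be given as the -mapper and -reducer
options.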

Miles
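
If reformatting the input is not convenient, another option may be to read the
file name from the environment. Streaming exports job configuration properties
to the mapper process as environment variables, with the dots replaced by
underscores, so the input file property should be visible as map_input_file
(or mapreduce_map_input_file on later Hadoop versions). Which of the two is set
depends on the version, so the sketch below checks both and treats that as an
assumption to verify on your cluster. The helper names are made up for
illustration.

```python
import os
import sys


def input_file_name(environ=os.environ):
    """Return the base name of the current input file, or None if unknown."""
    # Assumption: one of these variables is set by the streaming framework.
    for key in ("mapreduce_map_input_file", "map_input_file"):
        if key in environ:
            return os.path.basename(environ[key])
    return None


def map_lines(lines, name):
    """Yield (word, name) for each distinct word on each input line."""
    for line in lines:
        for word in sorted(set(line.split())):
            yield word, name


def main():
    name = input_file_name()
    if name is None:
        # Not running under streaming, or the variable has a different name.
        print("input file variable not set", file=sys.stderr)
        return
    for word, file_name in map_lines(sys.stdin, name):
        print(f"{word}\t{file_name}")


if __name__ == "__main__":
    main()
```

The same word can be emitted more than once for one file here, because the
mapper only removes repeats within a line. The reducer therefore has to count
distinct file names per word rather than the number of values.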




-- 
The University of Edinburgh is a charitable body, registered in Scotland,
with registration number SC005336.