Posted to common-user@hadoop.apache.org by xinfan meng <mx...@gmail.com> on 2008/06/09 04:00:30 UTC
compute document frequency with hadoop-streaming
In Hadoop Streaming, the mapper reads its input from stdin. If we want to
compute the document frequency of words, the simplest way is to output words
as keys and file names as values. How, then, can the mapper get the name of
the input file passed to this MapReduce job? Thanks.
--
Best Wishes
Meng Xinfan(蒙新泛)
Institute of Computational Linguistics
Department of Computer Science & Technology
School of Electronic Engineering & Computer Science
Peking University
Beijing, 100871
China
Re: compute document frequency with hadoop-streaming
Posted by Miles Osborne <mi...@inf.ed.ac.uk>.
Well, you could have one document per line, with the filename as an extra
field on the same line, e.g.:
name\tdocument\n
name\tdocument\n
etc
Miles
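A minimal sketch of how the suggestion above might look as a streaming job in Python, assuming the input has already been rewritten as one "name<TAB>document" record per line. The script name df.py, the map/reduce argument, and the function names are hypothetical and only for illustration. The mapper emits each distinct word of a document once, keyed by word with the document name as the value; the reducer counts the distinct names seen for each word.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Yield (word, name) once per distinct word in each 'name<TAB>document' line."""
    for line in lines:
        line = line.rstrip("\n")
        if "\t" not in line:
            continue  # skip lines that do not follow the assumed layout
        name, document = line.split("\t", 1)
        for word in sorted(set(document.split())):
            yield word, name


def parse_pairs(lines):
    """Turn 'word<TAB>name' lines (the mapper's output) back into pairs."""
    for line in lines:
        parts = line.rstrip("\n").split("\t", 1)
        if len(parts) == 2:
            yield parts[0], parts[1]


def reduce_pairs(pairs):
    """Yield (word, document frequency) from pairs already sorted by word."""
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        # Count distinct document names, in case a name repeats in the input.
        yield word, len({name for _, name in group})


if __name__ == "__main__":
    # Hypothetical convention: "df.py map" is the mapper, "df.py reduce" the reducer.
    mode = sys.argv[1] if len(sys.argv) > 1 else ""
    if mode == "map":
        for word, name in map_lines(sys.stdin):
            print(f"{word}\t{name}")
    elif mode == "reduce":
        for word, count in reduce_pairs(parse_pairs(sys.stdin)):
            print(f"{word}\t{count}")
```

The same script could then be given to the streaming job twice, through the -mapper and -reducer options. The reducer relies on the framework delivering the mapper output sorted by key, which is why it can group consecutive lines by word.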