Posted to hdfs-user@hadoop.apache.org by pa...@yahoo.com on 2013/03/29 13:15:31 UTC
Million docs and word count scenario
If there are 1 million docs in an enterprise and we need to perform a word count computation on all the docs, what is the first step? Is it to extract all the text of all the docs into a single file and then put that into HDFS, or to put each one separately into HDFS?
Thanks
Sent from BlackBerry® on Airtel
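For reference, the word count computation the question refers to can be sketched in the Hadoop Streaming style: a mapper emits a `word<TAB>1` line per token, and a reducer sums the counts for each word, relying on the framework to sort the mapper output by key in between. This is a minimal local sketch, not code from the thread; the tokenizer (lowercase letters, digits, apostrophes) and the `run_locally` helper are assumptions for illustration.

```python
"""Word count in the Hadoop Streaming style (local sketch).

mapper and reducer exchange tab-separated "word<TAB>count" lines. Under
Streaming the framework sorts mapper output by key before the reduce phase;
run_locally (a hypothetical helper) simulates that with sorted().
"""
import re
from itertools import groupby
from typing import Iterable, Iterator, List

# Assumption: a deliberately simple tokenizer (lowercase words, digits, apostrophes).
WORD_RE = re.compile(r"[a-z0-9']+")


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit "word\t1" for every token on every input line."""
    for line in lines:
        for word in WORD_RE.findall(line.lower()):
            yield f"{word}\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts per word; input must already be sorted by word."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(lines: Iterable[str]) -> List[str]:
    """Simulate map -> shuffle/sort -> reduce in a single process."""
    return list(reducer(sorted(mapper(lines))))
```

Under Hadoop Streaming, `mapper` and `reducer` would each be wrapped in a small script that reads `sys.stdin` and prints every yielded line. Locally, `run_locally(["the cat", "The dog the end"])` returns `['cat\t1', 'dog\t1', 'end\t1', 'the\t3']`.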
Re: Million docs and word count scenario
Posted by Ling Kun <lk...@gmail.com>.
Maybe HAR (Hadoop Archives) is a choice.
http://hadoop.apache.org/docs/r1.1.2/hadoop_archives.html
Ling Kun
On Friday, March 29, 2013, Ted Dunning wrote:
> Putting each document into a separate file is not likely to be a great
> thing to do.
>
> On the other hand, putting them all into one file may not be what you want
> either.
>
> It is probably best to find a middle ground and create files each with
> many documents and each a few gigabytes in size.
>
>
> On Fri, Mar 29, 2013 at 1:15 PM, <pa...@yahoo.com> wrote:
>
>> If there are 1 million docs in an enterprise and we need to perform a word
>> count computation on all the docs, what is the first step? Is it to
>> extract all the text of all the docs into a single file and then put that
>> into HDFS, or to put each one separately into HDFS?
>> Thanks
>>
>> Sent from BlackBerry® on Airtel
>
>
>
--
http://www.lingcc.com
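Ling Kun's HAR suggestion can be sketched as follows. The command syntax (`hadoop archive -archiveName NAME -p PARENT SRC... DEST`) and the `har://` URI form follow the Hadoop Archives guide linked above; the helper names `har_command` and `har_uri` and all paths are hypothetical, and the functions only build the command and URI rather than running anything. As far as we know, a HAR mainly relieves NameNode memory (many small files become a few large ones); a MapReduce job reading through `har://` still sees each archived document as a separate input file, so it does not by itself cut the number of map tasks.

```python
"""Build a `hadoop archive` command line and the har:// URI to read it back.

Sketch only: helper names and paths are hypothetical, and nothing is executed.
"""
import posixpath
from typing import List


def har_command(archive_name: str, parent: str, sources: List[str], dest: str) -> List[str]:
    """Return argv for: hadoop archive -archiveName NAME -p PARENT SRC... DEST.

    Sources are given relative to `parent`; the archive is created in `dest`.
    """
    if not archive_name.endswith(".har"):
        # The archives guide asks for a name with a .har extension.
        raise ValueError("archive name must end in .har")
    return ["hadoop", "archive", "-archiveName", archive_name, "-p", parent, *sources, dest]


def har_uri(dest: str, archive_name: str) -> str:
    """URI of the archive on the default filesystem, usable as a job input path."""
    return "har://" + posixpath.join(dest, archive_name)
```

For example, `har_command("docs.har", "/user/hadoop", ["docs"], "/user/archives")` yields the argv for `hadoop archive -archiveName docs.har -p /user/hadoop docs /user/archives`, which could be passed to `subprocess.run(..., check=True)` on a machine with `hadoop` on its PATH; the job input would then be `har_uri("/user/archives", "docs.har")`, i.e. `har:///user/archives/docs.har`.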
Re: Million docs and word count scenario
Posted by Ted Dunning <td...@maprtech.com>.
Putting each document into a separate file is not likely to be a great
thing to do.
On the other hand, putting them all into one file may not be what you want
either.
It is probably best to find a middle ground and create files each with many
documents and each a few gigabytes in size.
On Fri, Mar 29, 2013 at 1:15 PM, <pa...@yahoo.com> wrote:
> If there are 1 million docs in an enterprise and we need to perform a word
> count computation on all the docs, what is the first step? Is it to
> extract all the text of all the docs into a single file and then put that
> into HDFS, or to put each one separately into HDFS?
> Thanks
>
> Sent from BlackBerry® on Airtel
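Ted's middle-ground suggestion can be sketched as a small packing step run before loading into HDFS: write many documents into each output file, one document per line as `doc_id<TAB>text`, and start a new file once the current one reaches a size target. This is an illustrative sketch, not code from the thread. It assumes the text has already been extracted from the documents and that document ids contain no tabs or newlines; `pack_documents` is a hypothetical name, and its 2 GiB default for `target_bytes` is only a stand-in for "a few gigabytes". One record per line keeps the files readable by a line-oriented job such as word count, and since a large uncompressed text file is split (roughly one split per HDFS block) across many map tasks, fewer large files need not cost parallelism.

```python
"""Pack many small documents into a few large text files, one document per line."""
import os
from typing import BinaryIO, Iterable, List, Optional, Tuple


def pack_documents(docs: Iterable[Tuple[str, str]], out_dir: str,
                   target_bytes: int = 2 * 1024 ** 3) -> List[str]:
    """Write (doc_id, text) pairs as "doc_id<TAB>text" lines into part files.

    A new part file is started once the current one has reached target_bytes,
    so a file may exceed the target by at most one document. Returns the paths
    written, in order.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths: List[str] = []
    out: Optional[BinaryIO] = None
    written = 0
    try:
        for doc_id, text in docs:
            # Collapse all whitespace (including tabs and newlines) to single
            # spaces so each document stays on exactly one line.
            flat = " ".join(text.split())
            line = f"{doc_id}\t{flat}\n".encode("utf-8")
            if out is None or written >= target_bytes:
                if out is not None:
                    out.close()
                path = os.path.join(out_dir, f"part-{len(paths):05d}.txt")
                out = open(path, "wb")
                paths.append(path)
                written = 0
            out.write(line)
            written += len(line)
    finally:
        if out is not None:
            out.close()
    return paths
```

The resulting directory could then be copied into HDFS in one go with `hadoop fs -put`, and the word count job pointed at it as its input path.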