Posted to hdfs-user@hadoop.apache.org by navaz <na...@gmail.com> on 2014/07/08 01:22:48 UTC

Huge text file for Hadoop Mapreduce

Hi

I am running the basic word count MapReduce job. I downloaded a file,
Gettysburg.txt, which is 1,486 bytes. I have 3 datanodes and the
replication factor is set to 3. The data is copied to all 3 datanodes,
but only one map task runs; all the other nodes are idle. I think this
is because I have only one block of data, so only a single task runs.
I would like to download a bigger file, say 1 GB, to test shuffle
performance over the network. Could you please suggest where I can
download a huge text file?

Thanks & Regards

Abdul Navaz


Re: Huge text file for Hadoop Mapreduce

Posted by Stanley Shi <ss...@gopivotal.com>.
You can get the Wikipedia data from its website; it's pretty big.
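
For example, one way to stream a dump down from Java. The exact dump URL
below is an assumption on my part -- browse dumps.wikimedia.org for the
current listing:

import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class FetchWikiDump {
    public static void main(String[] args) throws Exception {
        // Assumed dump URL -- check dumps.wikimedia.org for a current one.
        URL dump = new URL(
            "https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2");
        // Stream the download straight to a local file instead of
        // buffering the whole thing in memory.
        try (InputStream in = dump.openStream()) {
            Files.copy(in, Paths.get("enwiki-latest-pages-articles.xml.bz2"),
                StandardCopyOption.REPLACE_EXISTING);
        }
    }
}

Then push it into HDFS with hadoop fs -put. As a bonus, bzip2 is one of
the splittable codecs in Hadoop, so the compressed dump can still fan
out across multiple mappers.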

Regards,
*Stanley Shi,*



On Tue, Jul 8, 2014 at 1:35 PM, Du Lam <de...@gmail.com> wrote:

> Configuration conf = getConf();
> conf.setLong("mapreduce.input.fileinputformat.split.maxsize",10000000);
>
> // You can set this to a small value (in bytes) to ensure your file
> // splits across multiple mappers, provided the input is not in a
> // non-splittable format like .snappy.
>
>
> On Tue, Jul 8, 2014 at 7:32 AM, Adaryl "Bob" Wakefield, MBA <
> adaryl.wakefield@hotmail.com> wrote:
>
>>   http://www.cs.cmu.edu/~./enron/
>>
>> Not sure of the uncompressed size, but pretty sure it’s over a gig.
>>
>> B.
>>
>>  *From:* navaz <na...@gmail.com>
>> *Sent:* Monday, July 07, 2014 6:22 PM
>> *To:* user@hadoop.apache.org
>> *Subject:* Huge text file for Hadoop Mapreduce
>>
>>
>> Hi
>>
>>
>>
>> I am running the basic word count MapReduce job. I downloaded a file,
>> Gettysburg.txt, which is 1,486 bytes. I have 3 datanodes and the
>> replication factor is set to 3. The data is copied to all 3 datanodes,
>> but only one map task runs; all the other nodes are idle. I think this
>> is because I have only one block of data, so only a single task runs.
>> I would like to download a bigger file, say 1 GB, to test shuffle
>> performance over the network. Could you please suggest where I can
>> download a huge text file?
>>
>>
>>
>>
>>
>> Thanks & Regards
>>
>>
>>
>> Abdul Navaz
>>
>>
>>
>
>

Re: Huge text file for Hadoop Mapreduce

Posted by Du Lam <de...@gmail.com>.
Configuration conf = getConf();
conf.setLong("mapreduce.input.fileinputformat.split.maxsize",10000000);

// You can set this to a small value (in bytes) to ensure your file
// splits across multiple mappers, provided the input is not in a
// non-splittable format like .snappy.
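
For context, here is a minimal driver sketch showing where that setting
fits. The class name and paths are placeholders, and the mapper/reducer
are assumed to be the ones from the standard WordCount tutorial:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCountDriver extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        // Cap each input split at ~10 MB so even a single 1 GB file
        // fans out across roughly 100 mappers.
        conf.setLong("mapreduce.input.fileinputformat.split.maxsize", 10000000L);

        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);
        // Placeholder mapper/reducer from the standard tutorial:
        // job.setMapperClass(TokenizerMapper.class);
        // job.setReducerClass(IntSumReducer.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new WordCountDriver(), args));
    }
}

Equivalently, FileInputFormat.setMaxInputSplitSize(job, 10000000L) sets
the same property for you.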


On Tue, Jul 8, 2014 at 7:32 AM, Adaryl "Bob" Wakefield, MBA <
adaryl.wakefield@hotmail.com> wrote:

>   http://www.cs.cmu.edu/~./enron/
>
> Not sure of the uncompressed size, but pretty sure it’s over a gig.
>
> B.
>
>  *From:* navaz <na...@gmail.com>
> *Sent:* Monday, July 07, 2014 6:22 PM
> *To:* user@hadoop.apache.org
> *Subject:* Huge text file for Hadoop Mapreduce
>
>
> Hi
>
>
>
> I am running the basic word count MapReduce job. I downloaded a file,
> Gettysburg.txt, which is 1,486 bytes. I have 3 datanodes and the
> replication factor is set to 3. The data is copied to all 3 datanodes,
> but only one map task runs; all the other nodes are idle. I think this
> is because I have only one block of data, so only a single task runs.
> I would like to download a bigger file, say 1 GB, to test shuffle
> performance over the network. Could you please suggest where I can
> download a huge text file?
>
>
>
>
>
> Thanks & Regards
>
>
>
> Abdul Navaz
>
>
>


Re: Huge text file for Hadoop Mapreduce

Posted by "Adaryl \"Bob\" Wakefield, MBA" <ad...@hotmail.com>.
http://www.cs.cmu.edu/~./enron/

Not sure of the uncompressed size, but pretty sure it’s over a gig.
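
Alternatively, if any bulk text will do, you can skip the download and
generate a file yourself. A rough sketch (file name and target size are
arbitrary) that repeats one line until it reaches about 1 GB:

import java.io.BufferedWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class MakeBigTextFile {
    public static void main(String[] args) throws Exception {
        String line = "Four score and seven years ago our fathers brought"
                + " forth on this continent a new nation\n";
        long target = 1L << 30; // ~1 GiB
        long written = 0;
        // Keep appending the line until the file hits the target size.
        try (BufferedWriter out = Files.newBufferedWriter(
                Paths.get("big.txt"), StandardCharsets.UTF_8)) {
            while (written < target) {
                out.write(line);
                written += line.length();
            }
        }
    }
}

Then hadoop fs -put big.txt into your input directory and run the job
against it.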

B.

From: navaz 
Sent: Monday, July 07, 2014 6:22 PM
To: user@hadoop.apache.org 
Subject: Huge text file for Hadoop Mapreduce

Hi

I am running the basic word count MapReduce job. I downloaded a file,
Gettysburg.txt, which is 1,486 bytes. I have 3 datanodes and the
replication factor is set to 3. The data is copied to all 3 datanodes,
but only one map task runs; all the other nodes are idle. I think this
is because I have only one block of data, so only a single task runs.
I would like to download a bigger file, say 1 GB, to test shuffle
performance over the network. Could you please suggest where I can
download a huge text file?

Thanks & Regards

Abdul Navaz
