Posted to mapreduce-user@hadoop.apache.org by praveenesh kumar <pr...@gmail.com> on 2014/03/25 14:38:22 UTC

Re: Hadoop Takes 6GB Memory to run one mapper

Can you try storing your file as bytes instead of Strings? I can't think of
any reason why this should require a 6 GB heap.
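
As a very rough sanity check (assuming a 64-bit JVM with compressed oops):
each small String costs on the order of 50-60 bytes once you count the String
object plus its backing char[] (2 bytes per character), each ConcurrentHashMap
entry adds roughly another 32-40 bytes, and the table array sized for 11
million entries is about 64 MB on its own. That works out to roughly 150-200
bytes per key/value pair, or around 1.5-2 GB for 11 million pairs, which
matches what you see standalone; the jump to 6 GB inside Hadoop is the
surprising part.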
Can you explain your use case? That might help in suggesting some
alternatives, if you are interested.
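
For illustration, the "bytes instead of String" idea could look roughly like
the untested sketch below. It assumes the same tab-separated key/value layout
as your snippet; the helper name, the plain HashMap, and the UTF-8 encoding
are just choices made for the example.

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.GZIPInputStream;

// Store each value as its raw UTF-8 bytes: one small byte[] per value instead
// of a String plus a char[] holding 2 bytes per character.
static Map<String, byte[]> loadLookup(String path) throws IOException {
    Map<String, byte[]> map = new HashMap<String, byte[]>(11000000);
    BufferedReader reader = new BufferedReader(new InputStreamReader(
            new GZIPInputStream(new FileInputStream(path)), StandardCharsets.UTF_8));
    try {
        String line;
        while ((line = reader.readLine()) != null) {
            int tab = line.indexOf('\t');
            map.put(line.substring(0, tab),
                    line.substring(tab + 1).getBytes(StandardCharsets.UTF_8));
        }
    } finally {
        reader.close();
    }
    return map;
}

On lookup you can decode with new String(map.get(key), StandardCharsets.UTF_8),
or skip decoding entirely if the raw bytes are all the mapper needs.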

Regards
Prav


On Tue, Mar 25, 2014 at 7:31 AM, Nivrutti Shinde
<ni...@gmail.com> wrote:

> Yes, it is in the setup() method. I am just reading the file, which is
> stored on HDFS.
>
>
> On Tuesday, 25 March 2014 12:01:08 UTC+5:30, Praveenesh Kumar wrote:
>
>> And I am guessing you are not doing this inside the map() method, right?
>> It's in the setup() method?
>>
>>
>>> On Tue, Mar 25, 2014 at 6:05 AM, Nivrutti Shinde <ni...@gmail.com> wrote:
>>
>>> // Lookup table built once per mapper: ~11 million tab-separated key/value pairs.
>>> private Map<String, String> mapData = new ConcurrentHashMap<String, String>(11000000);
>>>
>>> // 'fis' is the stream being read; when the file lives on HDFS this would be
>>> // the FSDataInputStream returned by FileSystem.open(path) instead.
>>> FileInputStream fis = new FileInputStream(file);
>>> GZIPInputStream gzipInputStream = new GZIPInputStream(fis);
>>> BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(gzipInputStream));
>>> String line = null;
>>> while ((line = bufferedReader.readLine()) != null) {
>>>     String[] data = line.split("\t");
>>>     mapData.put(data[0], data[1]);
>>> }
>>> bufferedReader.close();
>>>
>>> On Monday, 24 March 2014 19:17:12 UTC+5:30, Praveenesh Kumar wrote:
>>>
>>>> Can you please share your code snippet? I just want to see how you are
>>>> loading your file into the mapper.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Mar 24, 2014 at 1:15 PM, Nivrutti Shinde <ni...@gmail.com> wrote:
>>>>
>>>>> Thanks for your replies,
>>>>>
>>>>> Harsh,
>>>>>
>>>>> I tried THashMap but ended up with the same issue.
>>>>>
>>>>> David,
>>>>>
>>>>> I tried the map-side join and the cascading approach, but both take a
>>>>> lot of time.
>>>>>
>>>>> On Friday, 21 March 2014 12:03:28 UTC+5:30, Nivrutti Shinde wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have a use case where I am loading a 200 MB file with 11 million
>>>>>> records (each record has a length of 12) into a map, so that while
>>>>>> running the Hadoop job I can quickly look up the value for the key of
>>>>>> each input record in the mapper.
>>>>>>
>>>>>> It is such a small file, but to load the data into the map I have to
>>>>>> allocate a 6 GB heap. When I run a small standalone application that
>>>>>> loads the same file, it needs about 2 GB of memory.
>>>>>>
>>>>>> I don't understand why Hadoop needs 6 GB to load the data into memory.
>>>>>> The job runs fine after that, but the number of mappers I can run is
>>>>>> only 2. I need to get this down to 2-3 GB so I can run at least 8-9
>>>>>> mappers per node.
>>>>>>
>>>>>> I have created a gzip version of the file (now only 17 MB) and kept it
>>>>>> on HDFS. I am using the HDFS API to read the file and load the data
>>>>>> into the map. The block size is 128 MB, on Cloudera Hadoop.
>>>>>>
>>>>>> Any help, or alternate approaches that load the data into memory with a
>>>>>> minimal heap, would be appreciated, so that I can run many mappers with
>>>>>> 2-3 GB allocated to each.
>>>>>>
>>>>>> Thanks
>>>>>>

RE: Hadoop Takes 6GB Memory to run one mapper

Posted by John Lilley <jo...@redpoint.net>.
This discussion may also be relevant to your question:
http://stackoverflow.com/questions/21005643/container-is-running-beyond-memory-limits
Do you actually need to specify -Xmx6000m for the Java heap, or could it be one of the other issues discussed there?
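
If it really is the container limit rather than the heap that forces the 6 GB setting, one common pairing (MRv2/YARN property names; the values below are only illustrative) is to size the container and keep -Xmx some way below it, either per job or via the equivalent mapred-site.xml entries:

import org.apache.hadoop.conf.Configuration;

Configuration conf = new Configuration();
// YARN container size for each map task, in MB
conf.set("mapreduce.map.memory.mb", "3072");
// JVM heap for the map task, kept roughly 15-20% below the container limit
conf.set("mapreduce.map.java.opts", "-Xmx2560m");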

John

From: John Lilley [mailto:john.lilley@redpoint.net]
Sent: Thursday, March 27, 2014 6:52 AM
To: user@hadoop.apache.org
Subject: RE: Hadoop Takes 6GB Memory to run one mapper

Could you have a pmem-vs-vmem issue as in:
http://stackoverflow.com/questions/8017500/specifying-memory-limits-with-hadoop
john
From: praveenesh kumar [mailto:praveenesh@gmail.com]
Sent: Tuesday, March 25, 2014 7:38 AM
To: user@hadoop.apache.org
Subject: Re: Hadoop Takes 6GB Memory to run one mapper

Can you try storing your file as bytes instead of Strings? I can't think of any reason why this should require a 6 GB heap.
Can you explain your use case? That might help in suggesting some alternatives, if you are interested.
Regards
Prav

On Tue, Mar 25, 2014 at 7:31 AM, Nivrutti Shinde <ni...@gmail.com> wrote:
Yes, it is in the setup() method. I am just reading the file, which is stored on HDFS.

On Tuesday, 25 March 2014 12:01:08 UTC+5:30, Praveenesh Kumar wrote:
And I am guessing you are not doing this inside the map() method, right? It's in the setup() method?

On Tue, Mar 25, 2014 at 6:05 AM, Nivrutti Shinde <ni...@gmail.com> wrote:
// Lookup table built once per mapper: ~11 million tab-separated key/value pairs.
private Map<String, String> mapData = new ConcurrentHashMap<String, String>(11000000);
// 'fis' is the stream being read; when the file lives on HDFS this would be
// the FSDataInputStream returned by FileSystem.open(path) instead.
FileInputStream fis = new FileInputStream(file);
GZIPInputStream gzipInputStream = new GZIPInputStream(fis);
BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(gzipInputStream));
String line = null;
while ((line = bufferedReader.readLine()) != null) {
    String[] data = line.split("\t");
    mapData.put(data[0], data[1]);
}
bufferedReader.close();

On Monday, 24 March 2014 19:17:12 UTC+5:30, Praveenesh Kumar wrote:
Can you please share your code snippet? I just want to see how you are loading your file into the mapper.
On Mon, Mar 24, 2014 at 1:15 PM, Nivrutti Shinde <ni...@gmail.com> wrote:
Thanks for your replies,
Harsh,
I tried THashMap but ended up with the same issue.
David,
I tried the map-side join and the cascading approach, but both take a lot of time.

On Friday, 21 March 2014 12:03:28 UTC+5:30, Nivrutti Shinde wrote:
Hi,

I have a use case where I am loading a 200 MB file with 11 million records (each record has a length of 12) into a map, so that while running the Hadoop job I can quickly look up the value for the key of each input record in the mapper.

It is such a small file, but to load the data into the map I have to allocate a 6 GB heap. When I run a small standalone application that loads the same file, it needs about 2 GB of memory.

I don't understand why Hadoop needs 6 GB to load the data into memory. The job runs fine after that, but I can only run 2 mappers. I need to get this down to 2-3 GB so I can run at least 8-9 mappers per node.

I have created a gzip version of the file (now only 17 MB) and kept it on HDFS. I am using the HDFS API to read the file and load the data into the map. The block size is 128 MB, on Cloudera Hadoop.

Any help, or alternate approaches that load the data into memory with a minimal heap, would be appreciated, so that I can run many mappers with 2-3 GB allocated to each.
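
One further low-overhead direction, shown below purely as an illustration (it assumes the keys can be mapped to numeric IDs that fit in a long): keep the whole table in sorted primitive arrays and binary-search it, which removes the per-entry HashMap and String overhead entirely.

import java.util.Arrays;

// Hypothetical compact lookup table: keys in a sorted long[], values as raw bytes.
// 11 million long keys take about 88 MB; small byte[] values add a few hundred MB
// more, which should fit comfortably in a 1-2 GB mapper heap.
final class CompactLookup {
    private final long[] keys;      // must be sorted in ascending order
    private final byte[][] values;  // values[i] corresponds to keys[i]

    CompactLookup(long[] sortedKeys, byte[][] valuesInSameOrder) {
        this.keys = sortedKeys;
        this.values = valuesInSameOrder;
    }

    byte[] get(long key) {
        int i = Arrays.binarySearch(keys, key);
        return i >= 0 ? values[i] : null;
    }
}

Building it means reading the file once into parallel arrays and sorting them together (for example by sorting an array of indices); if the keys are not numeric, a single sorted byte[] blob with an int[] of offsets gives the same effect.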



RE: Hadoop Takes 6GB Memory to run one mapper

Posted by John Lilley <jo...@redpoint.net>.
Could you have a pmem-vs-vmem issue as in:
http://stackoverflow.com/questions/8017500/specifying-memory-limits-with-hadoop
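
If this is a YARN cluster, the limit that usually fires here is the NodeManager's virtual-memory check, which is configured cluster-side in yarn-site.xml. A sketch with the stock values, only for orientation (whether these knobs apply depends on whether you are on MRv1 or YARN):

<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>2.1</value>   <!-- virtual memory allowed per MB of physical memory; 2.1 is the default -->
</property>
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>true</value>  <!-- the default; set to false to disable the virtual-memory check -->
</property>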
john
From: praveenesh kumar [mailto:praveenesh@gmail.com]
Sent: Tuesday, March 25, 2014 7:38 AM
To: user@hadoop.apache.org
Subject: Re: Hadoop Takes 6GB Memory to run one mapper

Can you try storing your file as bytes instead of Strings? I can't think of any reason why this should require a 6 GB heap.
Can you explain your use case? That might help in suggesting some alternatives, if you are interested.
Regards
Prav

On Tue, Mar 25, 2014 at 7:31 AM, Nivrutti Shinde <ni...@gmail.com> wrote:
Yes, it is in the setup() method. I am just reading the file, which is stored on HDFS.

On Tuesday, 25 March 2014 12:01:08 UTC+5:30, Praveenesh Kumar wrote:
And I am guessing you are not doing this inside the map() method, right? It's in the setup() method?

On Tue, Mar 25, 2014 at 6:05 AM, Nivrutti Shinde <ni...@gmail.com> wrote:
// Lookup table built once per mapper: ~11 million tab-separated key/value pairs.
private Map<String, String> mapData = new ConcurrentHashMap<String, String>(11000000);
// 'fis' is the stream being read; when the file lives on HDFS this would be
// the FSDataInputStream returned by FileSystem.open(path) instead.
FileInputStream fis = new FileInputStream(file);
GZIPInputStream gzipInputStream = new GZIPInputStream(fis);
BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(gzipInputStream));
String line = null;
while ((line = bufferedReader.readLine()) != null) {
    String[] data = line.split("\t");
    mapData.put(data[0], data[1]);
}
bufferedReader.close();

On Monday, 24 March 2014 19:17:12 UTC+5:30, Praveenesh Kumar wrote:
Can you please share your code snippet? I just want to see how you are loading your file into the mapper.
On Mon, Mar 24, 2014 at 1:15 PM, Nivrutti Shinde <ni...@gmail.com> wrote:
Thanks for your replies,
Harsh,
I tried THashMap but ended up with the same issue.
David,
I tried the map-side join and the cascading approach, but both take a lot of time.

On Friday, 21 March 2014 12:03:28 UTC+5:30, Nivrutti Shinde wrote:
Hi,

I have a use case where I am loading a 200 MB file with 11 million records (each record has a length of 12) into a map, so that while running the Hadoop job I can quickly look up the value for the key of each input record in the mapper.

It is such a small file, but to load the data into the map I have to allocate a 6 GB heap. When I run a small standalone application that loads the same file, it needs about 2 GB of memory.

I don't understand why Hadoop needs 6 GB to load the data into memory. The job runs fine after that, but I can only run 2 mappers. I need to get this down to 2-3 GB so I can run at least 8-9 mappers per node.

I have created a gzip version of the file (now only 17 MB) and kept it on HDFS. I am using the HDFS API to read the file and load the data into the map. The block size is 128 MB, on Cloudera Hadoop.

Any help, or alternate approaches that load the data into memory with a minimal heap, would be appreciated, so that I can run many mappers with 2-3 GB allocated to each.


