Posted to user@mahout.apache.org by Mark <st...@gmail.com> on 2010/11/11 17:23:34 UTC
Java heap space error on PFPGrowth
I am trying to run PFPGrowth but I keep receiving this Java heap space
error at the end of the first step/beginning of second step.
I am using the following parameters: .... -method mapreduce -regex [\\t] -s 5 -g 55000
Output:
......
10/11/11 08:12:56 INFO mapred.JobClient: map 100% reduce 85%
10/11/11 08:12:59 INFO mapred.JobClient: map 100% reduce 90%
10/11/11 08:13:02 INFO mapred.JobClient: map 100% reduce 94%
10/11/11 08:13:09 INFO mapred.JobClient: map 100% reduce 100%
10/11/11 08:13:11 INFO mapred.JobClient: Job complete: job_201011101701_0005
10/11/11 08:13:11 INFO mapred.JobClient: Counters: 17
10/11/11 08:13:11 INFO mapred.JobClient: Job Counters
10/11/11 08:13:11 INFO mapred.JobClient: Launched reduce tasks=1
10/11/11 08:13:11 INFO mapred.JobClient: Launched map tasks=8
10/11/11 08:13:11 INFO mapred.JobClient: Data-local map tasks=8
10/11/11 08:13:11 INFO mapred.JobClient: FileSystemCounters
10/11/11 08:13:11 INFO mapred.JobClient: FILE_BYTES_READ=146083205
10/11/11 08:13:11 INFO mapred.JobClient: HDFS_BYTES_READ=411751517
10/11/11 08:13:11 INFO mapred.JobClient: FILE_BYTES_WRITTEN=177276794
10/11/11 08:13:11 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=82352630
10/11/11 08:13:11 INFO mapred.JobClient: Map-Reduce Framework
10/11/11 08:13:11 INFO mapred.JobClient: Reduce input groups=3146378
10/11/11 08:13:11 INFO mapred.JobClient: Combine output records=30759042
10/11/11 08:13:11 INFO mapred.JobClient: Map input records=6049220
10/11/11 08:13:11 INFO mapred.JobClient: Reduce shuffle bytes=26239336
10/11/11 08:13:11 INFO mapred.JobClient: Reduce output records=3146378
10/11/11 08:13:11 INFO mapred.JobClient: Spilled Records=54248354
10/11/11 08:13:11 INFO mapred.JobClient: Map output bytes=743485927
10/11/11 08:13:11 INFO mapred.JobClient: Combine input records=63744687
10/11/11 08:13:11 INFO mapred.JobClient: Map output records=41469874
10/11/11 08:13:11 INFO mapred.JobClient: Reduce input records=8484229
10/11/11 08:13:26 INFO pfpgrowth.PFPGrowth: No of Features: 1087215
10/11/11 08:13:40 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
10/11/11 08:13:40 INFO input.FileInputFormat: Total input paths to process : 1
10/11/11 08:13:44 INFO mapred.JobClient: Running job: job_201011101701_0006
10/11/11 08:13:45 INFO mapred.JobClient: map 0% reduce 0%
10/11/11 08:14:16 INFO mapred.JobClient: Task Id : attempt_201011101701_0006_m_000000_0, Status : FAILED
Error: Java heap space
....
Is there anything I can do to alleviate this problem?
FYI: I am running a 4-node cluster with 12GB of RAM in each machine.
Thanks
RE: Java heap space error on PFPGrowth
Posted by pr...@nokia.com.
Glad that it worked for you.
I don't think HADOOP_HEAPSIZE matters much here, since it is not used for the actual job execution (correct me if I am wrong). I have it set to 2G, but only because I have plenty of memory on my system.
Praveen
-----Original Message-----
From: ext Mark [mailto:static.void.dev@gmail.com]
Sent: Thursday, November 11, 2010 12:10 PM
To: user@mahout.apache.org
Subject: Re: Java heap space error on PFPGrowth
That did it. Thanks.
What do you have set for your HADOOP_HEAPSIZE in hadoop-env.sh?
On 11/11/10 8:28 AM, praveen.peddi@nokia.com wrote:
> Hi Mark,
> I ran into the same error and found that I needed to add the following Hadoop parameter to mapred-site.xml (Hadoop 0.20.2). You can try less memory than 4GB.
>
> <property>
> <name>mapred.child.java.opts</name>
> <value>-Xmx4096m</value>
> <description>Java options (here, the maximum heap size) for each child task JVM</description>
> </property>
>
> Hope this solves your issue.
>
> Praveen
Re: Java heap space error on PFPGrowth
Posted by Mark <st...@gmail.com>.
That did it. Thanks.
What do you have set for your HADOOP_HEAPSIZE in hadoop-env.sh?
On 11/11/10 8:28 AM, praveen.peddi@nokia.com wrote:
> Hi Mark,
> I ran into the same error and found that I needed to add the following Hadoop parameter to mapred-site.xml (Hadoop 0.20.2). You can try less memory than 4GB.
>
> <property>
> <name>mapred.child.java.opts</name>
> <value>-Xmx4096m</value>
> <description>Java options (here, the maximum heap size) for each child task JVM</description>
> </property>
>
> Hope this solves your issue.
>
> Praveen
RE: Java heap space error on PFPGrowth
Posted by pr...@nokia.com.
Hi Mark,
I ran into the same error and found that I needed to add the following Hadoop parameter to mapred-site.xml (Hadoop 0.20.2). You can try less memory than 4GB.
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx4096m</value>
<description>Java options (here, the maximum heap size) for each child task JVM</description>
</property>
Hope this solves your issue.
Praveen
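A rough way to pick a value smaller than 4GB is to budget the heap against the RAM on each node. The sketch below is only an estimate under stated assumptions: the thread does not say how many task slots each node runs, so it assumes the Hadoop 0.20 defaults of 2 map and 2 reduce slots per TaskTracker, and the 2048 MB set aside for the OS and Hadoop daemons is an arbitrary illustrative figure. The function names are hypothetical, and -Xmx limits only the Java heap, so actual process memory will be somewhat higher.

```python
def worst_case_heap_mb(child_heap_mb, map_slots, reduce_slots):
    """Total heap requested on one node if every task slot is busy."""
    return child_heap_mb * (map_slots + reduce_slots)


def max_child_heap_mb(node_ram_mb, map_slots, reduce_slots, reserved_mb):
    """Largest -Xmx value (in MB) that keeps all busy slots within node RAM."""
    usable_mb = node_ram_mb - reserved_mb
    return usable_mb // (map_slots + reduce_slots)


# From the thread: 12GB of RAM per node.
NODE_RAM_MB = 12 * 1024
# Assumptions (not stated in the thread): default slot counts, 2 GB reserved.
MAP_SLOTS = 2
REDUCE_SLOTS = 2
RESERVED_MB = 2048

worst = worst_case_heap_mb(4096, MAP_SLOTS, REDUCE_SLOTS)
suggested = max_child_heap_mb(NODE_RAM_MB, MAP_SLOTS, REDUCE_SLOTS, RESERVED_MB)

print("worst case with -Xmx4096m: %d MB of %d MB" % (worst, NODE_RAM_MB))
print("upper bound per child task: -Xmx%dm" % suggested)
```

Under these assumptions, four busy slots at -Xmx4096m could ask for more heap than a 12GB node has, which is one reason to try a smaller value first and raise it only if the heap error returns.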