Posted to dev@mahout.apache.org by "Vipul Pandey (JIRA)" <ji...@apache.org> on 2011/03/22 21:02:05 UTC
[jira] [Created] (MAHOUT-632) PFPGrowth : Exceeded max jobconf size
PFPGrowth : Exceeded max jobconf size
--------------------------------------
Key: MAHOUT-632
URL: https://issues.apache.org/jira/browse/MAHOUT-632
Project: Mahout
Issue Type: Bug
Components: Frequent Itemset/Association Rule Mining
Affects Versions: 0.4
Reporter: Vipul Pandey
I'm getting this error right after startParallelCounting finishes:
11/03/21 19:06:40 INFO mapred.JobClient: Map output records=164272900
11/03/21 19:06:40 INFO mapred.JobClient: SPLIT_RAW_BYTES=2860
11/03/21 19:06:40 INFO mapred.JobClient: Reduce input records=67087840
11/03/21 19:07:02 INFO pfpgrowth.PFPGrowth: No of Features: 1788471
11/03/21 19:07:09 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
11/03/21 19:07:12 INFO input.FileInputFormat: Total input paths to process : 20
11/03/21 19:07:17 INFO mapred.JobClient: Cleaning up the staging area hdfs://nccc001:54310/mnt/analytics/data/hadoop/tmp/mapred/staging/isapps/.staging/job_201103101218_0287
Exception in thread "main" org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.io.IOException: Exceeded max jobconf size: 72276915 limit: 52428800
at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3759)
at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1416)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1412)
Quoting Robin: "I guess we just hit the limit of storing flist in the conf. Moving it to the distributed cache should fix this."
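The numbers in the exception can be sanity-checked with a little arithmetic. The limit of 52428800 bytes is exactly 50 MiB, while the submitted conf was 72276915 bytes (about 68.9 MiB), an overshoot of 19848115 bytes. Dividing the conf size by the 1788471 features logged by PFPGrowth gives roughly 40 bytes per feature. That figure is only an upper bound, since the conf also carries other properties besides the f-list. The class and method names below are hypothetical; only the constants come from the log.

```java
// Re-derives the figures in the MAHOUT-632 exception message.
// Constants are copied from the log; the class and method names are hypothetical.
public final class JobConfSizeCheck {
    static final long BYTES_PER_MIB = 1024L * 1024L;

    // "Exceeded max jobconf size: 72276915 limit: 52428800"
    static final long CONF_SIZE = 72_276_915L;
    static final long LIMIT = 52_428_800L;
    // "No of Features: 1788471"
    static final long FEATURES = 1_788_471L;

    static double toMiB(long bytes) {
        return (double) bytes / BYTES_PER_MIB;
    }

    // Attributes the whole conf to the f-list, so this over-counts:
    // the conf also holds the job's other properties.
    static double bytesPerFeature(long confSize, long features) {
        return (double) confSize / features;
    }

    public static void main(String[] args) {
        System.out.printf("limit:     %d bytes = %.2f MiB%n", LIMIT, toMiB(LIMIT));
        System.out.printf("submitted: %d bytes = %.2f MiB%n", CONF_SIZE, toMiB(CONF_SIZE));
        System.out.printf("overshoot: %d bytes%n", CONF_SIZE - LIMIT);
        System.out.printf("at most %.1f conf bytes per feature%n",
                bytesPerFeature(CONF_SIZE, FEATURES));
    }
}
```

Because anything stored in the conf travels with every job submission, a per-feature cost of this order grows linearly with the f-list, which is why a larger input eventually crosses the fixed limit.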
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (MAHOUT-632) PFPGrowth : Exceeded max jobconf size
Posted by "Robin Anil (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robin Anil reassigned MAHOUT-632:
---------------------------------
Assignee: Robin Anil
[jira] [Updated] (MAHOUT-632) PFPGrowth : Exceeded max jobconf size
Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen updated MAHOUT-632:
-----------------------------
Resolution: Fixed
Status: Resolved (was: Patch Available)
I updated Robin's patch after applying. Looks reasonable and tests pass. Committed.
[jira] [Commented] (MAHOUT-632) PFPGrowth : Exceeded max jobconf size
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053863#comment-13053863 ]
Hudson commented on MAHOUT-632:
-------------------------------
Integrated in Mahout-Quality #900 (See [https://builds.apache.org/job/Mahout-Quality/900/])
[jira] [Updated] (MAHOUT-632) PFPGrowth : Exceeded max jobconf size
Posted by "Robin Anil (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robin Anil updated MAHOUT-632:
------------------------------
Attachment: MAHOUT-632.patch
Attaching patch. This will read and write the frequency list to the Distributed Cache.
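The idea behind the patch can be illustrated with a minimal plain-Java sketch: the driver writes the frequency list (f-list) to a side file and puts only that file's path into the configuration, and each task reads the list back from the file. This is an analogue under stated assumptions, not the attached patch: a `Map<String, String>` stands in for Hadoop's `Configuration`, a local temp file stands in for a file shipped through the distributed cache, and the class name, the key `pfp.flist.path`, and the tab-separated file format are all hypothetical. In the Hadoop 0.20-era API, the shipping and lookup steps would correspond to `DistributedCache.addCacheFile(URI, Configuration)` on the driver and `DistributedCache.getLocalCacheFiles(Configuration)` in the task.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java analogue of "keep the f-list out of the conf": only a file path
// goes into the stand-in configuration. All names here are hypothetical.
public final class FListSideFile {

    // Hypothetical conf key; a Map<String, String> stands in for Hadoop's Configuration.
    static final String FLIST_PATH_KEY = "pfp.flist.path";

    static Map.Entry<String, Long> entry(String feature, long support) {
        return new AbstractMap.SimpleImmutableEntry<>(feature, support);
    }

    // Driver side: write one "feature<TAB>support" line per f-list entry, in order,
    // then store only the file's location in the configuration.
    static void saveFList(List<Map.Entry<String, Long>> fList, Path file,
                          Map<String, String> conf) throws IOException {
        List<String> lines = new ArrayList<>(fList.size());
        for (Map.Entry<String, Long> e : fList) {
            lines.add(e.getKey() + "\t" + e.getValue());
        }
        Files.write(file, lines, StandardCharsets.UTF_8);
        conf.put(FLIST_PATH_KEY, file.toString());
    }

    // Task side: look up the file via the configuration and rebuild the f-list,
    // preserving the order in which it was written.
    static List<Map.Entry<String, Long>> readFList(Map<String, String> conf) throws IOException {
        Path file = Paths.get(conf.get(FLIST_PATH_KEY));
        List<Map.Entry<String, Long>> fList = new ArrayList<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            // lastIndexOf tolerates tabs inside feature names; line breaks in names are not supported.
            int tab = line.lastIndexOf('\t');
            fList.add(entry(line.substring(0, tab), Long.parseLong(line.substring(tab + 1))));
        }
        return fList;
    }

    public static void main(String[] args) throws IOException {
        List<Map.Entry<String, Long>> fList = new ArrayList<>();
        fList.add(entry("milk", 5L));
        fList.add(entry("bread", 3L));

        Map<String, String> conf = new HashMap<>();
        Path file = Files.createTempFile("flist", ".tsv");
        try {
            saveFList(fList, file, conf);
            System.out.println("round trip ok: " + readFList(conf).equals(fList));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

With this layout the conf entry stays a short path string no matter how many features the f-list holds, so the size of the submitted conf no longer scales with the number of features.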
[jira] [Updated] (MAHOUT-632) PFPGrowth : Exceeded max jobconf size
Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen updated MAHOUT-632:
-----------------------------
Affects Version/s: 0.5
Fix Version/s: 0.6
[jira] [Updated] (MAHOUT-632) PFPGrowth : Exceeded max jobconf size
Posted by "Robin Anil (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robin Anil updated MAHOUT-632:
------------------------------
Status: Patch Available (was: Open)