Posted to general@hadoop.apache.org by Wojciech Langiewicz <wl...@gmail.com> on 2011/03/15 10:55:26 UTC

java.io.IOException: Split metadata size exceeded 10000000

Hello,
I'm running into this problem with MapReduce jobs over about 10 TB of data 
(smaller jobs run fine):
2011-03-15 07:48:22,031 ERROR org.apache.hadoop.mapred.JobTracker: Job 
initialization failed:
java.io.IOException: Split metadata size exceeded 10000000. Aborting job 
job_201103141436_0058
         at 
org.apache.hadoop.mapreduce.split.SplitMetaInfoReader.readSplitMetaInfo(SplitMetaInfoReader.java:48)
         at 
org.apache.hadoop.mapred.JobInProgress.createSplits(JobInProgress.java:732)
         at 
org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:633)
         at 
org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:3965)
         at 
org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:79)
         at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
         at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
         at java.lang.Thread.run(Thread.java:619)

2011-03-15 07:48:22,031 INFO org.apache.hadoop.mapred.JobTracker: 
Failing job job_201103141436_0058

What settings should I change to run this job?
I'm using CDH3b3.
Thanks in advance for any answers.

--
Wojciech Langiewicz

RE: java.io.IOException: Split metadata size exceeded 10000000

Posted by "Rottinghuis, Joep" <jr...@ebay.com>.
I doubt this is a CDH3 issue.
We saw the same error with a large job on the 0.20-security branch.

There is a property (mapreduce.jobtracker.split.metainfo.maxsize) that can be used to override the default of 10^7 (10,000,000 bytes).
We found that passing this along with the job had no effect; it worked only when the property was set on the JobTracker node itself. Not sure if this is a feature or a bug.
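
For reference, a minimal sketch of the JobTracker-side change, assuming a stock CDH3/0.20-security layout (the config path and the value below are illustrative, not prescriptive). The split metainfo file grows with the number of input splits, so a 10 TB input at the usual 64-128 MB per split can produce enough entries to overrun the 10 MB default. The JobTracker has to be restarted to pick the new value up:

  <!-- mapred-site.xml on the JobTracker host,
       e.g. /etc/hadoop/conf/mapred-site.xml (path is an assumption) -->
  <property>
    <name>mapreduce.jobtracker.split.metainfo.maxsize</name>
    <!-- default is 10000000 bytes; a non-positive value such as -1
         disables the size check entirely -->
    <value>100000000</value>
  </property>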

Cheers,

Joep

-----Original Message-----
From: Harsh J [mailto:qwertymaniac@gmail.com] 
Sent: Tuesday, March 15, 2011 3:33 AM
To: CDH Users
Cc: wlangiewicz@gmail.com
Subject: Re: java.io.IOException: Split metadata size exceeded 10000000

Moving this discussion to the CDH users list at cdh-user [at]
cloudera.org since it could be a CDH-specific issue.

[Bcc: general]

On Tue, Mar 15, 2011 at 3:25 PM, Wojciech Langiewicz
<wl...@gmail.com> wrote:
> [...]



-- 
Harsh J
http://harshj.com

Re: java.io.IOException: Split metadata size exceeded 10000000

Posted by Harsh J <qw...@gmail.com>.
Moving this discussion to the CDH users list at cdh-user [at]
cloudera.org since it could be a CDH-specific issue.

[Bcc: general]

On Tue, Mar 15, 2011 at 3:25 PM, Wojciech Langiewicz
<wl...@gmail.com> wrote:
> [...]



-- 
Harsh J
http://harshj.com