Posted to user@hadoop.apache.org by Murtaza Doctor <mu...@gmail.com> on 2013/09/19 02:50:41 UTC

Issue: Max block location exceeded for split error when running hive

Folks,

Has anyone run into this issue before:
java.io.IOException: Max block location exceeded for split: Paths:
"/foo/bar...."
....
InputFormatClass: org.apache.hadoop.mapred.TextInputFormat
splitsize: 15 maxsize: 10
at org.apache.hadoop.mapreduce.split.JobSplitWriter.writeOldSplits(JobSplitWriter.java:162)
at org.apache.hadoop.mapreduce.split.JobSplitWriter.createSplitFiles(JobSplitWriter.java:87)
at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:501)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:471)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:366)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1269)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1266)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1266)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:606)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:601)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:601)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:586)
at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:447)

When we set the property as suggested, i.e. mapreduce.job.max.split.locations
to a value higher than the split size it failed on, the job runs successfully.
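In case it helps someone else, here is a minimal sketch of setting that
per-job override programmatically. This is illustrative only: the class
name is made up, and the value 30 is arbitrary, chosen just to be above
the failing splitsize of 15 from the trace.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class MaxSplitLocationsOverride {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Cap on distinct block locations recorded per input split;
        // the default is 10. 30 here is an illustrative value only.
        conf.setInt("mapreduce.job.max.split.locations", 30);
        Job job = Job.getInstance(conf, "max-split-locations-demo");
        // ... set mapper/reducer, input and output paths, then submit
        // the job as usual.
        System.out.println(job.getConfiguration()
                .get("mapreduce.job.max.split.locations"));
    }
}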

I am trying to dig up additional documentation on this, since the default
seems to be 10 and I am not sure how that limit was chosen. Additionally,
what is the recommended value, and what factors does it depend on?

We are running YARN; the actual query is Hive on CDH 4.3, with Hive version
0.10.

Any pointers in this direction will be helpful.

Regards,
md

Re: Issue: Max block location exceeded for split error when running hive

Posted by Harsh J <ha...@cloudera.com>.
Are you using a CombineFileInputFormat or similar input format then, perhaps?
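If so, that would explain it: a combined split's location list is the union
of the hosts holding every block the split covers, so the distinct-host
count can pass the default cap of 10 even at replication 3 once enough
blocks are packed into one split. A toy sketch of the arithmetic (the
datanode names are hypothetical):

import java.util.HashSet;
import java.util.Set;

public class CombinedSplitLocations {
    public static void main(String[] args) {
        // Five blocks, replication 3 each -- hypothetical host lists.
        String[][] blockHosts = {
            {"dn01", "dn02", "dn03"}, {"dn04", "dn05", "dn06"},
            {"dn07", "dn08", "dn09"}, {"dn02", "dn05", "dn10"},
            {"dn03", "dn08", "dn11"}
        };
        Set<String> locations = new HashSet<String>();
        for (String[] hosts : blockHosts) {
            for (String host : hosts) {
                locations.add(host); // union of hosts across blocks
            }
        }
        // Prints 11: already above the default cap of 10.
        System.out.println("distinct locations: " + locations.size());
    }
}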

On Thu, Sep 19, 2013 at 1:29 PM, Murtaza Doctor <mu...@gmail.com> wrote:
> We are using the default replication factor of 3.  When new files are put on
> HDFS we never override the replication factor. When there is more data
> involved it fails on a larger split size.
>
>
> On Wed, Sep 18, 2013 at 6:34 PM, Harsh J <ha...@cloudera.com> wrote:
>>
>> Do your input files carry a replication factor of 10+? That could be
>> one cause behind this.
>>
>> On Thu, Sep 19, 2013 at 6:20 AM, Murtaza Doctor <mu...@gmail.com>
>> wrote:
>> > Folks,
>> >
>> > Any one run into this issue before:
>> > java.io.IOException: Max block location exceeded for split: Paths:
>> > "/foo/bar...."
>> > ....
>> > InputFormatClass: org.apache.hadoop.mapred.TextInputFormat
>> > splitsize: 15 maxsize: 10
>> > at
>> >
>> > org.apache.hadoop.mapreduce.split.JobSplitWriter.writeOldSplits(JobSplitWriter.java:162)
>> > at
>> >
>> > org.apache.hadoop.mapreduce.split.JobSplitWriter.createSplitFiles(JobSplitWriter.java:87)
>> > at
>> >
>> > org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:501)
>> > at
>> >
>> > org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:471)
>> > at
>> >
>> > org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:366)
>> > at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1269)
>> > at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1266)
>> > at java.security.AccessController.doPrivileged(Native Method)
>> > at javax.security.auth.Subject.doAs(Subject.java:415)
>> > at
>> >
>> > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>> > at org.apache.hadoop.mapreduce.Job.submit(Job.java:1266)
>> > at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:606)
>> > at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:601)
>> > at java.security.AccessController.doPrivileged(Native Method)
>> > at javax.security.auth.Subject.doAs(Subject.java:415)
>> > at
>> >
>> > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>> > at
>> > org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:601)
>> > at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:586)
>> > at
>> > org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:447)
>> >
>> > When we set the property to something higher as suggested like:
>> > mapreduce.job.max.split.locations = more than on what it failed
>> > then the job runs successfully.
>> >
>> > I am trying to dig up additional documentation on this since the default
>> > seems to be 10, not sure how that limit was set.
>> > Additionally what is the recommended value and what factors does it
>> > depend
>> > on?
>> >
>> > We are running YARN, the actual query is Hive on CDH 4.3, with Hive
>> > version
>> > 0.10
>> >
>> > Any pointers in this direction will be helpful.
>> >
>> > Regards,
>> > md
>>
>>
>>
>> --
>> Harsh J
>
>



-- 
Harsh J

Re: Issue: Max block location exceeded for split error when running hive

Posted by Murtaza Doctor <mu...@gmail.com>.
We are using the default replication factor of 3. When new files are put
on HDFS we never override the replication factor. When there is more data
involved, it fails with a larger split size.
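For what it's worth, here is a minimal sketch of how we could double-check
the per-file replication of the input; the path and class name are
placeholders for our real input directory:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CheckInputReplication {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // "/foo/bar" stands in for the job's actual input directory.
        for (FileStatus status : fs.listStatus(new Path("/foo/bar"))) {
            System.out.println(status.getPath()
                    + " replication=" + status.getReplication());
        }
    }
}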


On Wed, Sep 18, 2013 at 6:34 PM, Harsh J <ha...@cloudera.com> wrote:

> Do your input files carry a replication factor of 10+? That could be
> one cause behind this.
>
> On Thu, Sep 19, 2013 at 6:20 AM, Murtaza Doctor <mu...@gmail.com>
> wrote:
> > Folks,
> >
> > Any one run into this issue before:
> > java.io.IOException: Max block location exceeded for split: Paths:
> > "/foo/bar...."
> > ....
> > InputFormatClass: org.apache.hadoop.mapred.TextInputFormat
> > splitsize: 15 maxsize: 10
> > at
> >
> org.apache.hadoop.mapreduce.split.JobSplitWriter.writeOldSplits(JobSplitWriter.java:162)
> > at
> >
> org.apache.hadoop.mapreduce.split.JobSplitWriter.createSplitFiles(JobSplitWriter.java:87)
> > at
> >
> org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:501)
> > at
> >
> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:471)
> > at
> >
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:366)
> > at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1269)
> > at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1266)
> > at java.security.AccessController.doPrivileged(Native Method)
> > at javax.security.auth.Subject.doAs(Subject.java:415)
> > at
> >
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
> > at org.apache.hadoop.mapreduce.Job.submit(Job.java:1266)
> > at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:606)
> > at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:601)
> > at java.security.AccessController.doPrivileged(Native Method)
> > at javax.security.auth.Subject.doAs(Subject.java:415)
> > at
> >
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
> > at
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:601)
> > at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:586)
> > at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:447)
> >
> > When we set the property to something higher as suggested like:
> > mapreduce.job.max.split.locations = more than on what it failed
> > then the job runs successfully.
> >
> > I am trying to dig up additional documentation on this since the default
> > seems to be 10, not sure how that limit was set.
> > Additionally what is the recommended value and what factors does it
> depend
> > on?
> >
> > We are running YARN, the actual query is Hive on CDH 4.3, with Hive
> version
> > 0.10
> >
> > Any pointers in this direction will be helpful.
> >
> > Regards,
> > md
>
>
>
> --
> Harsh J
>

Re: Issue: Max block location exceeded for split error when running hive

Posted by Matt Davies <ma...@mattdavies.net>.
Thanks Rahul. Our ops people have implemented the config change.

On Thursday, September 19, 2013, Rahul Jain wrote:

> Matt,
>
> It would be better for you to do an global config update: set *mapreduce.job.max.split.locations
> *to at least the number of datanodes in your cluster, either in
> hive-site.xml or mapred-site.xml. Either case, this is a sensible
> configuration update if you are going to use CombineFileInputFormat to read
> input data in hive.
>
> -Rahul
>
>
> On Thu, Sep 19, 2013 at 3:31 PM, Matt Davies <ma...@mattdavies.net> wrote:
>
> What are the ramifications of setting a hard coded value in our scripts
> and then changing parameters which influence the input data size. I.e. I
> want to run across 1 day worth of data, then a different day I want to run
> against 30 days?
>
>
>
>
> On Thu, Sep 19, 2013 at 3:11 PM, Rahul Jain <rj...@gmail.com> wrote:
>
> I am assuming you have looked at this already:
>
> https://issues.apache.org/jira/browse/MAPREDUCE-5186
>
> You do have a workaround here to increase *mapreduce.job.max.split.locations
> *value in hive configuration, or do we need more than that here ?
>
> -Rahul
>
>
> On Thu, Sep 19, 2013 at 11:00 AM, Murtaza Doctor <mu...@gmail.com>wrote:
>
> It used to throw a warning in 1.03 and now has become an IOException. I
> was more trying to figure out why it is exceeding the limit even though the
> replication factor is 3. Also Hive may use CombineInputSplit or some
> version of it, are we saying it will always exceed the limit of 10?
>
>
> On Thu, Sep 19, 2013 at 10:05 AM, Edward Capriolo <ed...@gmail.com>wrote:
>
> We have this job submit property buried in hive that defaults to 10. We
> should make that configurable.
>
>
> On Wed, Sep 18, 2013 at 9:34 PM, Harsh J <ha...@cloudera.com> wrote:
>
> Do your input files carry a replication factor of 10+? That could be
> one cause behind this.
>
> On Thu, Sep 19, 2013 at 6:20 AM, Murtaza Doctor <mu...@gmail.com>
> wrote:
> > Folks,
> >
> > Any one run into this issue before:
> > java.io.IOException: Max block location exceeded for split: Paths:
> > "/foo/bar...."
> > ....
> > InputFormatClass: org.apache.hadoop.mapred.TextInputFormat
> > splitsize: 15 maxsize: 10
> > at
> >
> org.apache.hadoop.mapreduce.split.JobSplitWriter.writeOldSplits(JobSplitWriter.java:162)
> > at
> >
> org.apache.hadoop.mapreduce.split.JobSplitWriter.createSplitFiles(JobSplitWriter.java:87)
> > at
> >
> org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:501)
> > at
> >
> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:471)
> > at
> >
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:366)
> > at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1269)
> > at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1266)
> > at java.security.AccessController.doPrivileged(Native Method)
> > at javax.security.auth.Subject.doAs(Subject.java:415)
> > at
> >
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
> > at org.apac
>
>

Re: Issue: Max block location exceeded for split error when running hive

Posted by Rahul Jain <rj...@gmail.com>.
Matt,

It would be better for you to do a global config update: set
mapreduce.job.max.split.locations to at least the number of datanodes in
your cluster, either in hive-site.xml or mapred-site.xml. In either case,
this is a sensible configuration update if you are going to use
CombineFileInputFormat to read input data in Hive.
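For illustration, the entry could look like this in mapred-site.xml or
hive-site.xml; the value 50 is only a stand-in for your cluster's datanode
count:

<property>
  <name>mapreduce.job.max.split.locations</name>
  <value>50</value>
  <description>Placeholder value; set to at least the number of
  datanodes in the cluster.</description>
</property>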

-Rahul


On Thu, Sep 19, 2013 at 3:31 PM, Matt Davies <ma...@mattdavies.net> wrote:

> What are the ramifications of setting a hard coded value in our scripts
> and then changing parameters which influence the input data size. I.e. I
> want to run across 1 day worth of data, then a different day I want to run
> against 30 days?
>
>
>
>
> On Thu, Sep 19, 2013 at 3:11 PM, Rahul Jain <rj...@gmail.com> wrote:
>
>> I am assuming you have looked at this already:
>>
>> https://issues.apache.org/jira/browse/MAPREDUCE-5186
>>
>> You do have a workaround here to increase *mapreduce.job.max.split.locations
>> *value in hive configuration, or do we need more than that here ?
>>
>> -Rahul
>>
>>
>> On Thu, Sep 19, 2013 at 11:00 AM, Murtaza Doctor <murtazadoctor@gmail.com
>> > wrote:
>>
>>> It used to throw a warning in 1.03 and now has become an IOException. I
>>> was more trying to figure out why it is exceeding the limit even though the
>>> replication factor is 3. Also Hive may use CombineInputSplit or some
>>> version of it, are we saying it will always exceed the limit of 10?
>>>
>>>
>>> On Thu, Sep 19, 2013 at 10:05 AM, Edward Capriolo <edlinuxguru@gmail.com
>>> > wrote:
>>>
>>>> We have this job submit property buried in hive that defaults to 10. We
>>>> should make that configurable.
>>>>
>>>>
>>>> On Wed, Sep 18, 2013 at 9:34 PM, Harsh J <ha...@cloudera.com> wrote:
>>>>
>>>>> Do your input files carry a replication factor of 10+? That could be
>>>>> one cause behind this.
>>>>>
>>>>> On Thu, Sep 19, 2013 at 6:20 AM, Murtaza Doctor <
>>>>> murtazadoctor@gmail.com> wrote:
>>>>> > Folks,
>>>>> >
>>>>> > Any one run into this issue before:
>>>>> > java.io.IOException: Max block location exceeded for split: Paths:
>>>>> > "/foo/bar...."
>>>>> > ....
>>>>> > InputFormatClass: org.apache.hadoop.mapred.TextInputFormat
>>>>> > splitsize: 15 maxsize: 10
>>>>> > at
>>>>> >
>>>>> org.apache.hadoop.mapreduce.split.JobSplitWriter.writeOldSplits(JobSplitWriter.java:162)
>>>>> > at
>>>>> >
>>>>> org.apache.hadoop.mapreduce.split.JobSplitWriter.createSplitFiles(JobSplitWriter.java:87)
>>>>> > at
>>>>> >
>>>>> org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:501)
>>>>> > at
>>>>> >
>>>>> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:471)
>>>>> > at
>>>>> >
>>>>> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:366)
>>>>> > at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1269)
>>>>> > at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1266)
>>>>> > at java.security.AccessController.doPrivileged(Native Method)
>>>>> > at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>> > at
>>>>> >
>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>>>>> > at org.apache.hadoop.mapreduce.Job.submit(Job.java:1266)
>>>>> > at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:606)
>>>>> > at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:601)
>>>>> > at java.security.AccessController.doPrivileged(Native Method)
>>>>> > at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>> > at
>>>>> >
>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>>>>> > at
>>>>> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:601)
>>>>> > at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:586)
>>>>> > at
>>>>> org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:447)
>>>>> >
>>>>> > When we set the property to something higher as suggested like:
>>>>> > mapreduce.job.max.split.locations = more than on what it failed
>>>>> > then the job runs successfully.
>>>>> >
>>>>> > I am trying to dig up additional documentation on this since the
>>>>> default
>>>>> > seems to be 10, not sure how that limit was set.
>>>>> > Additionally what is the recommended value and what factors does it
>>>>> depend
>>>>> > on?
>>>>> >
>>>>> > We are running YARN, the actual query is Hive on CDH 4.3, with Hive
>>>>> version
>>>>> > 0.10
>>>>> >
>>>>> > Any pointers in this direction will be helpful.
>>>>> >
>>>>> > Regards,
>>>>> > md
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Harsh J
>>>>>
>>>>
>>>>
>>>
>>
>

Re: Issue: Max block location exceeded for split error when running hive

Posted by Matt Davies <ma...@mattdavies.net>.
What are the ramifications of setting a hard-coded value in our scripts and
then changing parameters which influence the input data size? E.g., one day
I want to run across 1 day's worth of data, and a different day I want to
run against 30 days?




On Thu, Sep 19, 2013 at 3:11 PM, Rahul Jain <rj...@gmail.com> wrote:

> I am assuming you have looked at this already:
>
> https://issues.apache.org/jira/browse/MAPREDUCE-5186
>
> You do have a workaround here to increase *mapreduce.job.max.split.locations
> *value in hive configuration, or do we need more than that here ?
>
> -Rahul
>
>
> On Thu, Sep 19, 2013 at 11:00 AM, Murtaza Doctor <mu...@gmail.com>wrote:
>
>> It used to throw a warning in 1.03 and now has become an IOException. I
>> was more trying to figure out why it is exceeding the limit even though the
>> replication factor is 3. Also Hive may use CombineInputSplit or some
>> version of it, are we saying it will always exceed the limit of 10?
>>
>>
>> On Thu, Sep 19, 2013 at 10:05 AM, Edward Capriolo <ed...@gmail.com>wrote:
>>
>>> We have this job submit property buried in hive that defaults to 10. We
>>> should make that configurable.
>>>
>>>
>>> On Wed, Sep 18, 2013 at 9:34 PM, Harsh J <ha...@cloudera.com> wrote:
>>>
>>>> Do your input files carry a replication factor of 10+? That could be
>>>> one cause behind this.
>>>>
>>>> On Thu, Sep 19, 2013 at 6:20 AM, Murtaza Doctor <
>>>> murtazadoctor@gmail.com> wrote:
>>>> > Folks,
>>>> >
>>>> > Any one run into this issue before:
>>>> > java.io.IOException: Max block location exceeded for split: Paths:
>>>> > "/foo/bar...."
>>>> > ....
>>>> > InputFormatClass: org.apache.hadoop.mapred.TextInputFormat
>>>> > splitsize: 15 maxsize: 10
>>>> > at
>>>> >
>>>> org.apache.hadoop.mapreduce.split.JobSplitWriter.writeOldSplits(JobSplitWriter.java:162)
>>>> > at
>>>> >
>>>> org.apache.hadoop.mapreduce.split.JobSplitWriter.createSplitFiles(JobSplitWriter.java:87)
>>>> > at
>>>> >
>>>> org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:501)
>>>> > at
>>>> >
>>>> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:471)
>>>> > at
>>>> >
>>>> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:366)
>>>> > at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1269)
>>>> > at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1266)
>>>> > at java.security.AccessController.doPrivileged(Native Method)
>>>> > at javax.security.auth.Subject.doAs(Subject.java:415)
>>>> > at
>>>> >
>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>>>> > at org.apache.hadoop.mapreduce.Job.submit(Job.java:1266)
>>>> > at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:606)
>>>> > at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:601)
>>>> > at java.security.AccessController.doPrivileged(Native Method)
>>>> > at javax.security.auth.Subject.doAs(Subject.java:415)
>>>> > at
>>>> >
>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>>>> > at
>>>> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:601)
>>>> > at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:586)
>>>> > at
>>>> org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:447)
>>>> >
>>>> > When we set the property to something higher as suggested like:
>>>> > mapreduce.job.max.split.locations = more than on what it failed
>>>> > then the job runs successfully.
>>>> >
>>>> > I am trying to dig up additional documentation on this since the
>>>> default
>>>> > seems to be 10, not sure how that limit was set.
>>>> > Additionally what is the recommended value and what factors does it
>>>> depend
>>>> > on?
>>>> >
>>>> > We are running YARN, the actual query is Hive on CDH 4.3, with Hive
>>>> version
>>>> > 0.10
>>>> >
>>>> > Any pointers in this direction will be helpful.
>>>> >
>>>> > Regards,
>>>> > md
>>>>
>>>>
>>>>
>>>> --
>>>> Harsh J
>>>>
>>>
>>>
>>
>

Re: Issue: Max block location exceeded for split error when running hive

Posted by Matt Davies <ma...@mattdavies.net>.
What are the ramifications of setting a hard coded value in our scripts and
then changing parameters which influence the input data size. I.e. I want
to run across 1 day worth of data, then a different day I want to run
against 30 days?




On Thu, Sep 19, 2013 at 3:11 PM, Rahul Jain <rj...@gmail.com> wrote:

> I am assuming you have looked at this already:
>
> https://issues.apache.org/jira/browse/MAPREDUCE-5186
>
> You do have a workaround here to increase *mapreduce.job.max.split.locations
> *value in hive configuration, or do we need more than that here ?
>
> -Rahul
>
>
> On Thu, Sep 19, 2013 at 11:00 AM, Murtaza Doctor <mu...@gmail.com>wrote:
>
>> It used to throw a warning in 1.03 and now has become an IOException. I
>> was more trying to figure out why it is exceeding the limit even though the
>> replication factor is 3. Also Hive may use CombineInputSplit or some
>> version of it, are we saying it will always exceed the limit of 10?
>>
>>
>> On Thu, Sep 19, 2013 at 10:05 AM, Edward Capriolo <ed...@gmail.com>wrote:
>>
>>> We have this job submit property buried in hive that defaults to 10. We
>>> should make that configurable.
>>>
>>>
>>> On Wed, Sep 18, 2013 at 9:34 PM, Harsh J <ha...@cloudera.com> wrote:
>>>
>>>> Do your input files carry a replication factor of 10+? That could be
>>>> one cause behind this.
>>>>
>>>> On Thu, Sep 19, 2013 at 6:20 AM, Murtaza Doctor <
>>>> murtazadoctor@gmail.com> wrote:
>>>> > Folks,
>>>> >
>>>> > Any one run into this issue before:
>>>> > java.io.IOException: Max block location exceeded for split: Paths:
>>>> > "/foo/bar...."
>>>> > ....
>>>> > InputFormatClass: org.apache.hadoop.mapred.TextInputFormat
>>>> > splitsize: 15 maxsize: 10
>>>> > at
>>>> >
>>>> org.apache.hadoop.mapreduce.split.JobSplitWriter.writeOldSplits(JobSplitWriter.java:162)
>>>> > at
>>>> >
>>>> org.apache.hadoop.mapreduce.split.JobSplitWriter.createSplitFiles(JobSplitWriter.java:87)
>>>> > at
>>>> >
>>>> org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:501)
>>>> > at
>>>> >
>>>> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:471)
>>>> > at
>>>> >
>>>> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:366)
>>>> > at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1269)
>>>> > at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1266)
>>>> > at java.security.AccessController.doPrivileged(Native Method)
>>>> > at javax.security.auth.Subject.doAs(Subject.java:415)
>>>> > at
>>>> >
>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>>>> > at org.apache.hadoop.mapreduce.Job.submit(Job.java:1266)
>>>> > at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:606)
>>>> > at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:601)
>>>> > at java.security.AccessController.doPrivileged(Native Method)
>>>> > at javax.security.auth.Subject.doAs(Subject.java:415)
>>>> > at
>>>> >
>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>>>> > at
>>>> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:601)
>>>> > at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:586)
>>>> > at
>>>> org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:447)
>>>> >
>>>> > When we set the property to something higher as suggested like:
>>>> > mapreduce.job.max.split.locations = more than on what it failed
>>>> > then the job runs successfully.
>>>> >
>>>> > I am trying to dig up additional documentation on this since the
>>>> default
>>>> > seems to be 10, not sure how that limit was set.
>>>> > Additionally what is the recommended value and what factors does it
>>>> depend
>>>> > on?
>>>> >
>>>> > We are running YARN, the actual query is Hive on CDH 4.3, with Hive
>>>> version
>>>> > 0.10
>>>> >
>>>> > Any pointers in this direction will be helpful.
>>>> >
>>>> > Regards,
>>>> > md
>>>>
>>>>
>>>>
>>>> --
>>>> Harsh J
>>>>
>>>
>>>
>>
>

Re: Issue: Max block location exceeded for split error when running hive

Posted by Rahul Jain <rj...@gmail.com>.
I am assuming you have looked at this already:

https://issues.apache.org/jira/browse/MAPREDUCE-5186

You do have a workaround here: increase the *mapreduce.job.max.split.locations*
value in the Hive configuration. Or do we need more than that here?

-Rahul


On Thu, Sep 19, 2013 at 11:00 AM, Murtaza Doctor <mu...@gmail.com> wrote:

> It used to throw a warning in 1.03 and now has become an IOException. I
> was more trying to figure out why it is exceeding the limit even though the
> replication factor is 3. Also Hive may use CombineInputSplit or some
> version of it, are we saying it will always exceed the limit of 10?
>
>
> On Thu, Sep 19, 2013 at 10:05 AM, Edward Capriolo <ed...@gmail.com> wrote:
>
>> We have this job submit property buried in hive that defaults to 10. We
>> should make that configurable.
>>
>>
>> On Wed, Sep 18, 2013 at 9:34 PM, Harsh J <ha...@cloudera.com> wrote:
>>
>>> Do your input files carry a replication factor of 10+? That could be
>>> one cause behind this.
>>>
>>> On Thu, Sep 19, 2013 at 6:20 AM, Murtaza Doctor <mu...@gmail.com>
>>> wrote:
>>> > Folks,
>>> >
>>> > Any one run into this issue before:
>>> > java.io.IOException: Max block location exceeded for split: Paths:
>>> > "/foo/bar...."
>>> > ....
>>> > InputFormatClass: org.apache.hadoop.mapred.TextInputFormat
>>> > splitsize: 15 maxsize: 10
>>> > at
>>> >
>>> org.apache.hadoop.mapreduce.split.JobSplitWriter.writeOldSplits(JobSplitWriter.java:162)
>>> > at
>>> >
>>> org.apache.hadoop.mapreduce.split.JobSplitWriter.createSplitFiles(JobSplitWriter.java:87)
>>> > at
>>> >
>>> org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:501)
>>> > at
>>> >
>>> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:471)
>>> > at
>>> >
>>> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:366)
>>> > at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1269)
>>> > at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1266)
>>> > at java.security.AccessController.doPrivileged(Native Method)
>>> > at javax.security.auth.Subject.doAs(Subject.java:415)
>>> > at
>>> >
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>>> > at org.apache.hadoop.mapreduce.Job.submit(Job.java:1266)
>>> > at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:606)
>>> > at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:601)
>>> > at java.security.AccessController.doPrivileged(Native Method)
>>> > at javax.security.auth.Subject.doAs(Subject.java:415)
>>> > at
>>> >
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>>> > at
>>> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:601)
>>> > at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:586)
>>> > at
>>> org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:447)
>>> >
>>> > When we set the property to something higher as suggested like:
>>> > mapreduce.job.max.split.locations = more than on what it failed
>>> > then the job runs successfully.
>>> >
>>> > I am trying to dig up additional documentation on this since the
>>> default
>>> > seems to be 10, not sure how that limit was set.
>>> > Additionally what is the recommended value and what factors does it
>>> depend
>>> > on?
>>> >
>>> > We are running YARN, the actual query is Hive on CDH 4.3, with Hive
>>> version
>>> > 0.10
>>> >
>>> > Any pointers in this direction will be helpful.
>>> >
>>> > Regards,
>>> > md
>>>
>>>
>>>
>>> --
>>> Harsh J
>>>
>>
>>
>

Re: Issue: Max block location exceeded for split error when running hive

Posted by Murtaza Doctor <mu...@gmail.com>.
It used to throw a warning in 1.0.3 and has now become an IOException. I
was more trying to figure out why it is exceeding the limit even though
the replication factor is 3. Also, Hive may use CombineFileInputFormat or
some version of it; are we saying it will always exceed the limit of 10?
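
For what it's worth, a combined split can legitimately list far more
hosts than the replication factor, because its location list is the
union of the locations of every block it contains. A self-contained
sketch using the public CombineFileSplit class (all paths and host names
below are hypothetical; five files at replication 3 on mostly distinct
nodes yield 15 locations, matching the "splitsize: 15" in the error):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.lib.input.CombineFileSplit;

    public class CombinedSplitLocations {
      public static void main(String[] args) throws Exception {
        Path[] files = { new Path("/foo/a"), new Path("/foo/b"),
            new Path("/foo/c"), new Path("/foo/d"), new Path("/foo/e") };
        long[] starts  = { 0L, 0L, 0L, 0L, 0L };
        long[] lengths = { 1L, 1L, 1L, 1L, 1L };
        // Union of the replica hosts of all five files' blocks.
        String[] hosts = { "dn01", "dn02", "dn03", "dn04", "dn05",
            "dn06", "dn07", "dn08", "dn09", "dn10", "dn11", "dn12",
            "dn13", "dn14", "dn15" };
        CombineFileSplit split =
            new CombineFileSplit(files, starts, lengths, hosts);
        System.out.println(split.getLocations().length); // 15 > 10
      }
    }

So it will not always exceed 10, but it will whenever a combined split
draws blocks from more than 10 distinct nodes.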


On Thu, Sep 19, 2013 at 10:05 AM, Edward Capriolo <ed...@gmail.com> wrote:

> We have this job submit property buried in hive that defaults to 10. We
> should make that configurable.
>
>
> On Wed, Sep 18, 2013 at 9:34 PM, Harsh J <ha...@cloudera.com> wrote:
>
>> Do your input files carry a replication factor of 10+? That could be
>> one cause behind this.
>>
>> On Thu, Sep 19, 2013 at 6:20 AM, Murtaza Doctor <mu...@gmail.com>
>> wrote:
>> > Folks,
>> >
>> > Any one run into this issue before:
>> > java.io.IOException: Max block location exceeded for split: Paths:
>> > "/foo/bar...."
>> > ....
>> > InputFormatClass: org.apache.hadoop.mapred.TextInputFormat
>> > splitsize: 15 maxsize: 10
>> > at
>> >
>> org.apache.hadoop.mapreduce.split.JobSplitWriter.writeOldSplits(JobSplitWriter.java:162)
>> > at
>> >
>> org.apache.hadoop.mapreduce.split.JobSplitWriter.createSplitFiles(JobSplitWriter.java:87)
>> > at
>> >
>> org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:501)
>> > at
>> >
>> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:471)
>> > at
>> >
>> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:366)
>> > at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1269)
>> > at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1266)
>> > at java.security.AccessController.doPrivileged(Native Method)
>> > at javax.security.auth.Subject.doAs(Subject.java:415)
>> > at
>> >
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>> > at org.apache.hadoop.mapreduce.Job.submit(Job.java:1266)
>> > at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:606)
>> > at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:601)
>> > at java.security.AccessController.doPrivileged(Native Method)
>> > at javax.security.auth.Subject.doAs(Subject.java:415)
>> > at
>> >
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>> > at
>> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:601)
>> > at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:586)
>> > at
>> org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:447)
>> >
>> > When we set the property to something higher as suggested like:
>> > mapreduce.job.max.split.locations = more than on what it failed
>> > then the job runs successfully.
>> >
>> > I am trying to dig up additional documentation on this since the default
>> > seems to be 10, not sure how that limit was set.
>> > Additionally what is the recommended value and what factors does it
>> depend
>> > on?
>> >
>> > We are running YARN, the actual query is Hive on CDH 4.3, with Hive
>> version
>> > 0.10
>> >
>> > Any pointers in this direction will be helpful.
>> >
>> > Regards,
>> > md
>>
>>
>>
>> --
>> Harsh J
>>
>
>

Re: Issue: Max block location exceeded for split error when running hive

Posted by Edward Capriolo <ed...@gmail.com>.
We have this job-submit property buried in Hive that defaults to 10. We
should make that configurable.
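
For context, the check that trips at submission time is roughly the
following (a simplified sketch, not the actual source; the real
validation sits in JobSplitWriter, per the writeOldSplits frame in the
stack trace):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.InputSplit;

    public class SplitLocationCheck {
      // Each split's recorded locations are compared against
      // mapreduce.job.max.split.locations, which defaults to 10.
      static void check(Configuration conf, InputSplit split)
          throws IOException, InterruptedException {
        int max = conf.getInt("mapreduce.job.max.split.locations", 10);
        String[] locations = split.getLocations();
        if (locations.length > max) {
          throw new IOException("Max block location exceeded for split: "
              + split + " splitsize: " + locations.length
              + " maxsize: " + max);
        }
      }
    }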


On Wed, Sep 18, 2013 at 9:34 PM, Harsh J <ha...@cloudera.com> wrote:

> Do your input files carry a replication factor of 10+? That could be
> one cause behind this.
>
> On Thu, Sep 19, 2013 at 6:20 AM, Murtaza Doctor <mu...@gmail.com>
> wrote:
> > Folks,
> >
> > Any one run into this issue before:
> > java.io.IOException: Max block location exceeded for split: Paths:
> > "/foo/bar...."
> > ....
> > InputFormatClass: org.apache.hadoop.mapred.TextInputFormat
> > splitsize: 15 maxsize: 10
> > at
> >
> org.apache.hadoop.mapreduce.split.JobSplitWriter.writeOldSplits(JobSplitWriter.java:162)
> > at
> >
> org.apache.hadoop.mapreduce.split.JobSplitWriter.createSplitFiles(JobSplitWriter.java:87)
> > at
> >
> org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:501)
> > at
> >
> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:471)
> > at
> >
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:366)
> > at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1269)
> > at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1266)
> > at java.security.AccessController.doPrivileged(Native Method)
> > at javax.security.auth.Subject.doAs(Subject.java:415)
> > at
> >
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
> > at org.apache.hadoop.mapreduce.Job.submit(Job.java:1266)
> > at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:606)
> > at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:601)
> > at java.security.AccessController.doPrivileged(Native Method)
> > at javax.security.auth.Subject.doAs(Subject.java:415)
> > at
> >
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
> > at
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:601)
> > at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:586)
> > at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:447)
> >
> > When we set the property to something higher as suggested like:
> > mapreduce.job.max.split.locations = more than on what it failed
> > then the job runs successfully.
> >
> > I am trying to dig up additional documentation on this since the default
> > seems to be 10, not sure how that limit was set.
> > Additionally what is the recommended value and what factors does it
> depend
> > on?
> >
> > We are running YARN, the actual query is Hive on CDH 4.3, with Hive
> version
> > 0.10
> >
> > Any pointers in this direction will be helpful.
> >
> > Regards,
> > md
>
>
>
> --
> Harsh J
>

Re: Issue: Max block location exceeded for split error when running hive

Posted by Murtaza Doctor <mu...@gmail.com>.
We are using the default replication factor of 3. When new files are put
on HDFS we never override the replication factor. When more data is
involved, it fails on a larger split size.
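
To rule the replication factor in or out directly, the value HDFS
actually recorded for each input file can be read off FileStatus; a
minimal sketch (the input directory is illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class PrintReplication {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        for (FileStatus st : fs.listStatus(new Path("/foo/bar"))) {
          System.out.println(st.getPath() + " replication="
              + st.getReplication());
        }
      }
    }

If every file prints 3, the growth in location count points at split
combining rather than replication.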


On Wed, Sep 18, 2013 at 6:34 PM, Harsh J <ha...@cloudera.com> wrote:

> Do your input files carry a replication factor of 10+? That could be
> one cause behind this.
>
> On Thu, Sep 19, 2013 at 6:20 AM, Murtaza Doctor <mu...@gmail.com>
> wrote:
> > Folks,
> >
> > Any one run into this issue before:
> > java.io.IOException: Max block location exceeded for split: Paths:
> > "/foo/bar...."
> > ....
> > InputFormatClass: org.apache.hadoop.mapred.TextInputFormat
> > splitsize: 15 maxsize: 10
> > at
> >
> org.apache.hadoop.mapreduce.split.JobSplitWriter.writeOldSplits(JobSplitWriter.java:162)
> > at
> >
> org.apache.hadoop.mapreduce.split.JobSplitWriter.createSplitFiles(JobSplitWriter.java:87)
> > at
> >
> org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:501)
> > at
> >
> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:471)
> > at
> >
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:366)
> > at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1269)
> > at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1266)
> > at java.security.AccessController.doPrivileged(Native Method)
> > at javax.security.auth.Subject.doAs(Subject.java:415)
> > at
> >
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
> > at org.apache.hadoop.mapreduce.Job.submit(Job.java:1266)
> > at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:606)
> > at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:601)
> > at java.security.AccessController.doPrivileged(Native Method)
> > at javax.security.auth.Subject.doAs(Subject.java:415)
> > at
> >
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
> > at
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:601)
> > at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:586)
> > at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:447)
> >
> > When we set the property to something higher as suggested like:
> > mapreduce.job.max.split.locations = more than on what it failed
> > then the job runs successfully.
> >
> > I am trying to dig up additional documentation on this since the default
> > seems to be 10, not sure how that limit was set.
> > Additionally what is the recommended value and what factors does it
> depend
> > on?
> >
> > We are running YARN, the actual query is Hive on CDH 4.3, with Hive
> version
> > 0.10
> >
> > Any pointers in this direction will be helpful.
> >
> > Regards,
> > md
>
>
>
> --
> Harsh J
>

Re: Issue: Max block location exceeded for split error when running hive

Posted by Harsh J <ha...@cloudera.com>.
Do your input files carry a replication factor of 10+? That could be
one cause behind this.

On Thu, Sep 19, 2013 at 6:20 AM, Murtaza Doctor <mu...@gmail.com> wrote:
> Folks,
>
> Any one run into this issue before:
> java.io.IOException: Max block location exceeded for split: Paths:
> "/foo/bar...."
> ....
> InputFormatClass: org.apache.hadoop.mapred.TextInputFormat
> splitsize: 15 maxsize: 10
> at
> org.apache.hadoop.mapreduce.split.JobSplitWriter.writeOldSplits(JobSplitWriter.java:162)
> at
> org.apache.hadoop.mapreduce.split.JobSplitWriter.createSplitFiles(JobSplitWriter.java:87)
> at
> org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:501)
> at
> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:471)
> at
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:366)
> at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1269)
> at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1266)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:1266)
> at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:606)
> at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:601)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
> at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:601)
> at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:586)
> at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:447)
>
> When we set the property to something higher than the value it failed on,
> as suggested (mapreduce.job.max.split.locations = a value above the
> failing splitsize), the job runs successfully.
>
> I am trying to dig up additional documentation on this, since the default
> seems to be 10; I am not sure how that limit was chosen.
> Additionally, what is the recommended value, and what factors does it
> depend on?
>
> We are running YARN; the actual query is Hive on CDH 4.3, with Hive
> version 0.10.
>
> Any pointers in this direction will be helpful.
>
> Regards,
> md



-- 
Harsh J
