Posted to user@hive.apache.org by Gautam <ga...@gmail.com> on 2016/03/11 02:33:22 UTC

Tez job submissions failing when cluster is under provisioned..

Hello,

Ran into this today. We're seeing Tez jobs fail to submit when the cluster
is under high load. In particular, the split calculation falls over
when it sees a slot count < 0. The YARN fair-scheduler seems to be
reporting it this way, and Tez doesn't seem to handle it.

Vertex failed, vertexName=Map 1, vertexId=vertex_1457029908268_101939_1_00,
diagnostics=[Vertex vertex_1457029908268_101939_1_00 [Map 1] killed/failed
due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: upsight_clean_aggregate_data
initializer failed, vertex=vertex_1457029908268_101939_1_00 [Map 1],
java.lang.IllegalArgumentException: Illegal Capacity: -135
        at java.util.ArrayList.<init>(ArrayList.java:142)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:330)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:306)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:408)
        at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:129)
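
[Editor's sketch] The top frame of the trace is the java.util.ArrayList
constructor, which rejects a negative initial capacity. The snippet below
reproduces just that step in isolation. The class and method names are
hypothetical, and the value -135 is simply copied from the trace; nothing
here is taken from Hive or Hadoop source.

```java
import java.util.ArrayList;
import java.util.List;

public class NegativeCapacityDemo {
    // Hypothetical helper: tries to allocate a list sized by a split-count
    // hint. Returns the exception message if the constructor rejects the
    // hint, or null if the allocation succeeds.
    static String tryAllocate(int numSplitsHint) {
        try {
            List<String> splits = new ArrayList<>(numSplitsHint);
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Negative hint, as in the trace above.
        System.out.println(tryAllocate(-135));
        // A positive hint allocates normally.
        System.out.println(tryAllocate(10));
    }
}
```

So any caller that passes a negative split-count hint down to this
constructor will fail before a single split is computed.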




I did come across HIVE-12957, in which the fix patch seems to only report
the error better instead of doing anything about it.

Now comes my question: is this an expected failure case? Is there a bug
I should know about in YARN scheduling, or am I misunderstanding the issue?
It seems rather frivolous on Tez's part to give up when the cluster is
under high load instead of just falling back to some sane default and adding
tasks to the queue.
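
[Editor's sketch] One way to picture the "sane default" idea is to clamp
the split-count hint before it is used. This is only an illustration under
assumptions: that the hint is derived from scheduler headroom divided by
per-task memory and scaled by a "waves" factor is assumed here, not
confirmed in this thread, and every name and number below is hypothetical.
It is not the Hive or Tez implementation.

```java
public class SplitHintSketch {
    // Assumed derivation: slots = headroom / per-task memory. Java integer
    // division truncates toward zero, so negative headroom yields a
    // negative (or zero) slot count.
    static int availableSlots(int headroomMb, int taskMb) {
        return headroomMb / taskMb;
    }

    // Scales the slot count by a waves factor and falls back to a fixed
    // default whenever the result is not positive.
    static int splitHint(int availableSlots, float waves, int fallback) {
        int hint = (int) (availableSlots * waves);
        return hint > 0 ? hint : fallback;
    }

    public static void main(String[] args) {
        // Hypothetical numbers: 8 GB of negative headroom, 1 GB tasks.
        int slots = availableSlots(-8192, 1024);
        System.out.println("slots = " + slots);
        System.out.println("hint  = " + splitHint(slots, 1.5f, 1));
    }
}
```

Whether a fallback of this kind is appropriate depends on why the
scheduler reports negative headroom in the first place; the clamp only
avoids the crash, it does not correct the reported capacity.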


-Gautam.

Re: Tez job submissions failing when cluster is under provisioned..

Posted by Gautam <ga...@gmail.com>.

This one seems related

https://issues.apache.org/jira/browse/YARN-4538

Yet to ascertain if it actually fixes this issue.

On Thu, Mar 10, 2016 at 11:43 PM, Gopal Vijayaraghavan <go...@apache.org>
wrote:

>
> > This seems to be something YARN fair-scheduler reporting it this way..
> >although Tez doesn't seem to handle.
>
> Pepperdata?
>
>
> > I did come across HIVE-12957, in which the fix patch seems to only
> >report the error better instead of doing anything about it.
> ...
> > Now comes my question, is this in an expected failure case ? Is there a
> >bug I should know about in YARN scheduling or am I misunderstanding the
> >issue?
>
> YARN is reporting -ve head-room, this means your cluster is running with
> negative-capacity for some strange reason.
>
> That is a bug somewhere in the FairScheduler's internal state.
>
> The issue was never reproduced after we switched over to the
> CapacityScheduler, so it's in limbo.
>
> Cheers,
> Gopal
>
>
>


-- 
"If you really want something in this life, you have to work for it. Now,
quiet! They're about to announce the lottery numbers..."

Re: Tez job submissions failing when cluster is under provisioned..

Posted by Gopal Vijayaraghavan <go...@apache.org>.

> This seems to be something YARN fair-scheduler reporting it this way..
>although Tez doesn't seem to handle.

Pepperdata?

 
> I did come across HIVE-12957, in which the fix patch seems to only
>report the error better instead of doing anything about it.
...
> Now comes my question, is this in an expected failure case ? Is there a
>bug I should know about in YARN scheduling or am I misunderstanding the
>issue?

YARN is reporting negative headroom, which means your cluster is running
with negative capacity for some strange reason.

That is a bug somewhere in the FairScheduler's internal state.

The issue was never reproduced after we switched over to the
CapacityScheduler, so it's in limbo.

Cheers,
Gopal