Posted to dev@flink.apache.org by Stephan Ewen <se...@apache.org> on 2015/05/13 01:13:23 UTC

Flink on Tez Test stuck

I have observed that a Flink-on-Tez test job stalls in two cases on the
Travis CI server.

https://travis-ci.org/StephanEwen/incubator-flink/jobs/62302207

It looks like a shuffle fetch simply freezes instead of continuing. At first
glance, the stack traces suggest that this is actually a Tez issue rather
than a Flink issue (all threads are stuck in Tez methods), but one cannot be
sure.

Anyone observed something similar before?

Re: Flink on Tez Test stuck

Posted by Robert Metzger <rm...@apache.org>.
Tez has just announced the availability of version 0.6.1.
Maybe that version is more stable. I've filed a JIRA issue for upgrading the
version: https://issues.apache.org/jira/browse/FLINK-2064

On Sun, May 17, 2015 at 12:04 PM, Robert Metzger <rm...@apache.org>
wrote:

> I have also seen this failure multiple times now.
> This is another case of it:
> https://travis-ci.org/apache/flink/jobs/62767646
>
> I think the Tez community is currently voting on a new release. Maybe we
> should see if this one fixes the issue.
> Otherwise we should ask on their list.
>
> On Wed, May 13, 2015 at 9:35 AM, Aljoscha Krettek <al...@apache.org>
> wrote:
>
>> I think I saw it once, yes. But dismissed it as a fluke.
>>
>> On Wed, May 13, 2015 at 1:13 AM, Stephan Ewen <se...@apache.org> wrote:
>> > I have observed that a Flink-on-Tez test job stalls in two cases on the
>> > Travis CI server.
>> >
>> > https://travis-ci.org/StephanEwen/incubator-flink/jobs/62302207
>> >
>> > It looks like a shuffle fetch is simply not continuing, but freezing. The
>> > stack traces suggest at a first glance that this is actually a Tez issue,
>> > rather than a Flink issue (all threads stuck in Tez methods), but one
>> > cannot be sure.
>> >
>> > Anyone observed something similar before?
>>
>
>

Re: Flink on Tez Test stuck

Posted by Robert Metzger <rm...@apache.org>.
I have also seen this failure multiple times now.
This is another case of it: https://travis-ci.org/apache/flink/jobs/62767646

I think the Tez community is currently voting on a new release. Maybe we
should see if this one fixes the issue.
Otherwise we should ask on their list.

On Wed, May 13, 2015 at 9:35 AM, Aljoscha Krettek <al...@apache.org>
wrote:

> I think I saw it once, yes. But dismissed it as a fluke.
>
> On Wed, May 13, 2015 at 1:13 AM, Stephan Ewen <se...@apache.org> wrote:
> > I have observed that a Flink-on-Tez test job stalls in two cases on the
> > Travis CI server.
> >
> > https://travis-ci.org/StephanEwen/incubator-flink/jobs/62302207
> >
> > It looks like a shuffle fetch is simply not continuing, but freezing. The
> > stack traces suggest at a first glance that this is actually a Tez issue,
> > rather than a Flink issue (all threads stuck in Tez methods), but one
> > cannot be sure.
> >
> > Anyone observed something similar before?
>

Re: Flink on Tez Test stuck

Posted by Aljoscha Krettek <al...@apache.org>.
I think I saw it once, yes. But dismissed it as a fluke.

On Wed, May 13, 2015 at 1:13 AM, Stephan Ewen <se...@apache.org> wrote:
> I have observed that a Flink-on-Tez test job stalls in two cases on the
> Travis CI server.
>
> https://travis-ci.org/StephanEwen/incubator-flink/jobs/62302207
>
> It looks like a shuffle fetch is simply not continuing, but freezing. The
> stack traces suggest at a first glance that this is actually a Tez issue,
> rather than a Flink issue (all threads stuck in Tez methods), but one
> cannot be sure.
>
> Anyone observed something similar before?