Posted to dev@flink.apache.org by Kurt Young <yk...@gmail.com> on 2019/06/18 06:34:52 UTC

Something wrong with travis?

Hi dev,

I noticed that all the Travis tests triggered by pull requests are failing
with the same error:

"Cached flink dir /home/travis/flink_cache/xxxxx/flink does not exist.
Exiting build."

Does anyone have a clue what happened and how to fix this?

Best,
Kurt

Re: Something wrong with travis?

Posted by Chesnay Schepler <ch...@apache.org>.
There is nothing to report; we already know what the problem is but it 
cannot be fixed.

On 30/07/2019 08:46, Yun Tang wrote:
> I ran into this problem again at https://api.travis-ci.com/v3/job/220732163/log.txt. Is there anywhere we could ask for help contacting Travis, or are there any clues we could use to figure this out?
>
> Best
> Yun Tang
> ________________________________
> From: Yun Tang <my...@live.com>
> Sent: Monday, June 24, 2019 14:22
> To: dev@flink.apache.org <de...@flink.apache.org>; Kurt Young <yk...@gmail.com>
> Subject: Re: Something wrong with travis?
>
> Unfortunately, I ran into this problem again just now: https://api.travis-ci.org/v3/job/549534496/log.txt (build overview: https://travis-ci.org/apache/flink/builds/549534489). Non-committers, including me, have to close and reopen the PR or push another commit to re-trigger the PR check 🙁
>
> Best
> Yun Tang
> ________________________________
> From: Chesnay Schepler <ch...@apache.org>
> Sent: Wednesday, June 19, 2019 16:59
> To: dev@flink.apache.org; Kurt Young
> Subject: Re: Something wrong with travis?
>
> Recent builds are passing again.
>
> On 18/06/2019 08:34, Kurt Young wrote:
>> Hi dev,
>>
>> I noticed that all the travis tests triggered by pull request are failed
>> with the same error:
>>
>> "Cached flink dir /home/travis/flink_cache/xxxxx/flink does not exist.
>> Exiting build."
>>
>> Anyone have a clue on what happened and how to fix this?
>>
>> Best,
>> Kurt
>>


Re: Something wrong with travis?

Posted by Yun Tang <my...@live.com>.
I ran into this problem again at https://api.travis-ci.com/v3/job/220732163/log.txt. Is there anywhere we could ask for help contacting Travis, or are there any clues we could use to figure this out?

Best
Yun Tang
________________________________
From: Yun Tang <my...@live.com>
Sent: Monday, June 24, 2019 14:22
To: dev@flink.apache.org <de...@flink.apache.org>; Kurt Young <yk...@gmail.com>
Subject: Re: Something wrong with travis?

Unfortunately, I ran into this problem again just now: https://api.travis-ci.org/v3/job/549534496/log.txt (build overview: https://travis-ci.org/apache/flink/builds/549534489). Non-committers, including me, have to close and reopen the PR or push another commit to re-trigger the PR check 🙁

Best
Yun Tang
________________________________
From: Chesnay Schepler <ch...@apache.org>
Sent: Wednesday, June 19, 2019 16:59
To: dev@flink.apache.org; Kurt Young
Subject: Re: Something wrong with travis?

Recent builds are passing again.

On 18/06/2019 08:34, Kurt Young wrote:
> Hi dev,
>
> I noticed that all the travis tests triggered by pull request are failed
> with the same error:
>
> "Cached flink dir /home/travis/flink_cache/xxxxx/flink does not exist.
> Exiting build."
>
> Anyone have a clue on what happened and how to fix this?
>
> Best,
> Kurt
>


Re: Something wrong with travis?

Posted by Yun Tang <my...@live.com>.
Unfortunately, I ran into this problem again just now: https://api.travis-ci.org/v3/job/549534496/log.txt (build overview: https://travis-ci.org/apache/flink/builds/549534489). Non-committers, including me, have to close and reopen the PR or push another commit to re-trigger the PR check 🙁

Best
Yun Tang
________________________________
From: Chesnay Schepler <ch...@apache.org>
Sent: Wednesday, June 19, 2019 16:59
To: dev@flink.apache.org; Kurt Young
Subject: Re: Something wrong with travis?

Recent builds are passing again.

On 18/06/2019 08:34, Kurt Young wrote:
> Hi dev,
>
> I noticed that all the travis tests triggered by pull request are failed
> with the same error:
>
> "Cached flink dir /home/travis/flink_cache/xxxxx/flink does not exist.
> Exiting build."
>
> Anyone have a clue on what happened and how to fix this?
>
> Best,
> Kurt
>


Re: Something wrong with travis?

Posted by Chesnay Schepler <ch...@apache.org>.
Recent builds are passing again.

On 18/06/2019 08:34, Kurt Young wrote:
> Hi dev,
>
> I noticed that all the travis tests triggered by pull request are failed
> with the same error:
>
> "Cached flink dir /home/travis/flink_cache/xxxxx/flink does not exist.
> Exiting build."
>
> Anyone have a clue on what happened and how to fix this?
>
> Best,
> Kurt
>


Re: Something wrong with travis?

Posted by Biao Liu <mm...@gmail.com>.
It has been broken for more than 14 hours. Hopefully it recovers soon.

On Tue, Jun 18, 2019 at 3:21 PM, Jeff Zhang <zj...@gmail.com> wrote:

> If it is travis caching issue, we can file apache infra ticket and ask them
> to clean the cache.
>
>
>
> On Tue, Jun 18, 2019 at 3:18 PM, Chesnay Schepler <ch...@apache.org> wrote:
>
> > This is (hopefully a short-lived) hiccup on the Travis caching
> > infrastructure.
> >
> > There's nothing we can do to _fix_ it; if it persists we'll have to
> > rework our travis setup again to not rely on caching.
> >
> > On 18/06/2019 08:34, Kurt Young wrote:
> > > Hi dev,
> > >
> > > I noticed that all the travis tests triggered by pull request are failed
> > > with the same error:
> > >
> > > "Cached flink dir /home/travis/flink_cache/xxxxx/flink does not exist.
> > > Exiting build."
> > >
> > > Anyone have a clue on what happened and how to fix this?
> > >
> > > Best,
> > > Kurt
> > >
> >
> >
>
> --
> Best Regards
>
> Jeff Zhang
>

Re: Something wrong with travis?

Posted by Chesnay Schepler <ch...@apache.org>.
The compile stage was always passing.

The timeout makes no difference; it only affects how long we wait for
the download to complete.
We already had significantly more data in the cache a while ago (roughly
twice as much), so I am skeptical that the amount of cached data is the
problem.
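
To illustrate the point about the timeout, here is a minimal Python sketch.
It is not Flink's actual Travis tooling; `wait_for_path` and its parameters
are hypothetical names chosen for this example. It polls for a cached
directory until a deadline. If the directory never becomes visible, a larger
timeout only delays the same failure.

```python
import time
from pathlib import Path


def wait_for_path(path: Path, timeout_s: float, poll_s: float = 0.01) -> bool:
    """Poll until `path` exists or `timeout_s` seconds have elapsed.

    Returns True if the path appeared in time, False otherwise.
    (Hypothetical helper; the real cache download is done by Travis.)
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if path.exists():
            return True
        if time.monotonic() >= deadline:
            # The timeout bounds the wait; it cannot make missing data appear.
            return False
        time.sleep(poll_s)
```

Calling `wait_for_path(Path("/some/cache/flink"), timeout_s=1200)` on a path
that is never populated returns False after 1200 seconds instead of sooner,
which matches the observation that raising the timeout did not help.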

On 18/06/2019 12:47, jincheng sun wrote:
> Test result:
>   - The test for only compile state are succeeding (I deleted some old
> caches) cache size 1146.26M. See here
> https://travis-ci.org/sunjincheng121/flink/caches
> - timeout to 1200 test fail, get the same error, but I think maybe the
> storage problem, so I delete more old cache and restart the CI. See here
> https://travis-ci.org/apache/flink/builds/547136163
>
> So now it feels like the storage size of the cache is limited. If so we can
> add some cleanup logic for the old cache (I am not sure,some validation is
> needed)
>
> Best
> Jincheng
>
> On Tue, Jun 18, 2019 at 6:00 PM, jincheng sun <su...@gmail.com> wrote:
>
>> I agree with the explanation from @Chesnay Schepler <ch...@apache.org>.  this
>> should be a problem with the Travis infrastructure because recently we have
>> not big changed the logic of Travis inside Flink.
>> At present, most of the failures are after the compile is completed. The
>> cache size is only 7.7M, which means that the JARs are not successfully
>> uploaded.
>>
>> So here is a question:
>>   - Where can we check the cache storage to see if there is a problem with
>> the storage?
>>
>> In order to try to find out some reason for the CI issue,  I do the
>> follows test:
>>
>>   - I delete other test phases locally and test them - Test whether the
>> cache is uploaded normally during the compilation phase. See here
>> https://travis-ci.org/sunjincheng121/flink/builds/547155029
>>   - Increase Travis cache timeout to 1200 - Test the cache cannot be
>> downloaded due to cache is a timeout. (I think this test will have the same
>> result ) See here https://travis-ci.org/apache/flink/builds/547136163
>>
>> Will feedback here after testing.
>>
>> Best,
>> Jincheng
>>
>> On Tue, Jun 18, 2019 at 3:53 PM, Chesnay Schepler <ch...@apache.org> wrote:
>>
>>> The problem is not that bad stuff is in the cache (which is the only
>>> thing a cache cleaning solves), it is that the test stages don't
>>> download the correct one.
>>>
>>> Our compile stage uploads stuff in to the cache, and the subsequent test
>>> builds downloads it again.
>>>
>>> Whether the upload from the compile phase is visible to the test phase
>>> is basically a timing thing; it depends on the visibility guarantee that
>>> the backing infrastructure provides. So far it _usually_ worked, but
>>> these are naturally things that may change over time.
>>>
>>> On 18/06/2019 09:20, Jeff Zhang wrote:
>>>> If it is travis caching issue, we can file apache infra ticket and ask them
>>>> to clean the cache.
>>>>
>>>>
>>>>
>>>> On Tue, Jun 18, 2019 at 3:18 PM, Chesnay Schepler <ch...@apache.org> wrote:
>>>>
>>>>> This is (hopefully a short-lived) hiccup on the Travis caching
>>>>> infrastructure.
>>>>>
>>>>> There's nothing we can do to _fix_ it; if it persists we'll have to
>>>>> rework our travis setup again to not rely on caching.
>>>>>
>>>>> On 18/06/2019 08:34, Kurt Young wrote:
>>>>>> Hi dev,
>>>>>>
>>>>>> I noticed that all the travis tests triggered by pull request are failed
>>>>>> with the same error:
>>>>>>
>>>>>> "Cached flink dir /home/travis/flink_cache/xxxxx/flink does not exist.
>>>>>> Exiting build."
>>>>>>
>>>>>> Anyone have a clue on what happened and how to fix this?
>>>>>>
>>>>>> Best,
>>>>>> Kurt
>>>>>>
>>>


Re: Something wrong with travis?

Posted by jincheng sun <su...@gmail.com>.
Test results:
 - The compile-stage-only test succeeded (after I deleted some old
caches); the cache size is 1146.26M. See
https://travis-ci.org/sunjincheng121/flink/caches
 - The test with the timeout raised to 1200 failed with the same error. I
suspect a storage problem, so I deleted more old caches and restarted the CI. See
https://travis-ci.org/apache/flink/builds/547136163

So it now looks as if the storage size of the cache is limited. If so, we could
add some cleanup logic for old caches (I am not sure; this needs
validation).
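
As a rough illustration of what such cleanup logic might look like, here is a
minimal Python sketch. It assumes, hypothetically, that each cache entry is a
directory under a common root and that a size limit in bytes is known;
`prune_old_caches` and `dir_size` are names invented for this example, not
part of Travis or Flink. Whether cache size is actually the cause is not
confirmed in this thread.

```python
import shutil
from pathlib import Path
from typing import List


def dir_size(path: Path) -> int:
    """Total size in bytes of all regular files below `path`."""
    return sum(f.stat().st_size for f in path.rglob("*") if f.is_file())


def prune_old_caches(cache_root: Path, max_bytes: int) -> List[str]:
    """Delete the oldest cache directories until the total fits `max_bytes`.

    Returns the names of the removed directories, oldest first.
    """
    # Oldest entries (by modification time) come first.
    entries = sorted(
        (p for p in cache_root.iterdir() if p.is_dir()),
        key=lambda p: p.stat().st_mtime,
    )
    sizes = {p: dir_size(p) for p in entries}
    total = sum(sizes.values())

    removed = []
    for entry in entries:
        if total <= max_bytes:
            break
        shutil.rmtree(entry)
        total -= sizes[entry]
        removed.append(entry.name)
    return removed
```

Removing the oldest entries first keeps the most recent compile output, which
is the one the test stages need.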

Best
Jincheng

On Tue, Jun 18, 2019 at 6:00 PM, jincheng sun <su...@gmail.com> wrote:

> I agree with the explanation from @Chesnay Schepler <ch...@apache.org>.  this
> should be a problem with the Travis infrastructure because recently we have
> not big changed the logic of Travis inside Flink.
> At present, most of the failures are after the compile is completed. The
> cache size is only 7.7M, which means that the JARs are not successfully
> uploaded.
>
> So here is a question:
>  - Where can we check the cache storage to see if there is a problem with
> the storage?
>
> In order to try to find out some reason for the CI issue,  I do the
> follows test:
>
>  - I delete other test phases locally and test them - Test whether the
> cache is uploaded normally during the compilation phase. See here
> https://travis-ci.org/sunjincheng121/flink/builds/547155029
>  - Increase Travis cache timeout to 1200 - Test the cache cannot be
> downloaded due to cache is a timeout. (I think this test will have the same
> result ) See here https://travis-ci.org/apache/flink/builds/547136163
>
> Will feedback here after testing.
>
> Best,
> Jincheng
>
> On Tue, Jun 18, 2019 at 3:53 PM, Chesnay Schepler <ch...@apache.org> wrote:
>
>> The problem is not that bad stuff is in the cache (which is the only
>> thing a cache cleaning solves), it is that the test stages don't
>> download the correct one.
>>
>> Our compile stage uploads stuff in to the cache, and the subsequent test
>> builds downloads it again.
>>
>> Whether the upload from the compile phase is visible to the test phase
>> is basically a timing thing; it depends on the visibility guarantee that
>> the backing infrastructure provides. So far it _usually_ worked, but
>> these are naturally things that may change over time.
>>
>> On 18/06/2019 09:20, Jeff Zhang wrote:
>> > If it is travis caching issue, we can file apache infra ticket and ask them
>> > to clean the cache.
>> >
>> >
>> >
>> > On Tue, Jun 18, 2019 at 3:18 PM, Chesnay Schepler <ch...@apache.org> wrote:
>> >
>> >> This is (hopefully a short-lived) hiccup on the Travis caching
>> >> infrastructure.
>> >>
>> >> There's nothing we can do to _fix_ it; if it persists we'll have to
>> >> rework our travis setup again to not rely on caching.
>> >>
>> >> On 18/06/2019 08:34, Kurt Young wrote:
>> >>> Hi dev,
>> >>>
>> >>> I noticed that all the travis tests triggered by pull request are failed
>> >>> with the same error:
>> >>>
>> >>> "Cached flink dir /home/travis/flink_cache/xxxxx/flink does not exist.
>> >>> Exiting build."
>> >>>
>> >>> Anyone have a clue on what happened and how to fix this?
>> >>>
>> >>> Best,
>> >>> Kurt
>> >>>
>> >>
>>
>>

Re: Something wrong with travis?

Posted by jincheng sun <su...@gmail.com>.
I agree with the explanation from @Chesnay Schepler <ch...@apache.org>. This
should be a problem with the Travis infrastructure, because we have not made
any big changes to the Travis logic inside Flink recently.
At present, most of the failures occur after the compile stage has completed.
The cache size is only 7.7M, which means that the JARs were not uploaded
successfully.

So here is a question:
 - Where can we check the cache storage to see whether there is a problem with
it?

To try to find the cause of the CI issue, I am running the following
tests:

 - Delete the other test phases locally, to test whether the cache is
uploaded normally during the compilation phase. See
https://travis-ci.org/sunjincheng121/flink/builds/547155029
 - Increase the Travis cache timeout to 1200, to test whether the cache cannot
be downloaded because of a timeout. (I think this test will give the same
result.) See https://travis-ci.org/apache/flink/builds/547136163

I will report back here after testing.

Best,
Jincheng

On Tue, Jun 18, 2019 at 3:53 PM, Chesnay Schepler <ch...@apache.org> wrote:

> The problem is not that bad stuff is in the cache (which is the only
> thing a cache cleaning solves), it is that the test stages don't
> download the correct one.
>
> Our compile stage uploads stuff in to the cache, and the subsequent test
> builds downloads it again.
>
> Whether the upload from the compile phase is visible to the test phase
> is basically a timing thing; it depends on the visibility guarantee that
> the backing infrastructure provides. So far it _usually_ worked, but
> these are naturally things that may change over time.
>
> On 18/06/2019 09:20, Jeff Zhang wrote:
> > If it is travis caching issue, we can file apache infra ticket and ask them
> > to clean the cache.
> >
> >
> >
> > On Tue, Jun 18, 2019 at 3:18 PM, Chesnay Schepler <ch...@apache.org> wrote:
> >
> >> This is (hopefully a short-lived) hiccup on the Travis caching
> >> infrastructure.
> >>
> >> There's nothing we can do to _fix_ it; if it persists we'll have to
> >> rework our travis setup again to not rely on caching.
> >>
> >> On 18/06/2019 08:34, Kurt Young wrote:
> >>> Hi dev,
> >>>
> >>> I noticed that all the travis tests triggered by pull request are failed
> >>> with the same error:
> >>>
> >>> "Cached flink dir /home/travis/flink_cache/xxxxx/flink does not exist.
> >>> Exiting build."
> >>>
> >>> Anyone have a clue on what happened and how to fix this?
> >>>
> >>> Best,
> >>> Kurt
> >>>
> >>
>
>

Re: Something wrong with travis?

Posted by Chesnay Schepler <ch...@apache.org>.
The problem is not that bad stuff is in the cache (which is the only
thing a cache cleaning solves); it is that the test stages don't
download the correct cache.

Our compile stage uploads its build artifacts into the cache, and the
subsequent test builds download them again.

Whether the upload from the compile phase is visible to the test phase
is basically a timing thing; it depends on the visibility guarantee that
the backing infrastructure provides. So far it has _usually_ worked, but
these are naturally things that may change over time.
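
The handoff described above can be modelled with a minimal Python sketch. It
is not the actual Flink Travis scripts: the function names and the
`<cache_root>/<build_id>/flink` layout are assumptions for illustration,
loosely based on the path in the reported error message. The compile stage
writes into a cache directory, and each test stage refuses to run if it
cannot see that directory.

```python
from pathlib import Path


class CacheMissError(RuntimeError):
    """Raised when a test stage cannot see the compile stage's output."""


def cached_flink_dir(cache_root: Path, build_id: str) -> Path:
    # Hypothetical layout: <cache_root>/<build_id>/flink
    return cache_root / build_id / "flink"


def run_compile_stage(cache_root: Path, build_id: str) -> Path:
    # Stand-in for "build Flink and upload the result into the cache".
    target = cached_flink_dir(cache_root, build_id)
    target.mkdir(parents=True, exist_ok=True)
    (target / "placeholder.jar").write_bytes(b"placeholder")
    return target


def run_test_stage(cache_root: Path, build_id: str) -> Path:
    # Stand-in for "download the cache and run the tests against it".
    target = cached_flink_dir(cache_root, build_id)
    if not target.is_dir():
        # Same wording as the error reported in this thread.
        raise CacheMissError(
            f"Cached flink dir {target} does not exist. Exiting build."
        )
    return target
```

In this model a test stage fails exactly when it looks at the cache before
the compile stage's upload is visible to it, regardless of how much other
data the cache holds.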

On 18/06/2019 09:20, Jeff Zhang wrote:
> If it is travis caching issue, we can file apache infra ticket and ask them
> to clean the cache.
>
>
>
> On Tue, Jun 18, 2019 at 3:18 PM, Chesnay Schepler <ch...@apache.org> wrote:
>
>> This is (hopefully a short-lived) hiccup on the Travis caching
>> infrastructure.
>>
>> There's nothing we can do to _fix_ it; if it persists we'll have to
>> rework our travis setup again to not rely on caching.
>>
>> On 18/06/2019 08:34, Kurt Young wrote:
>>> Hi dev,
>>>
>>> I noticed that all the travis tests triggered by pull request are failed
>>> with the same error:
>>>
>>> "Cached flink dir /home/travis/flink_cache/xxxxx/flink does not exist.
>>> Exiting build."
>>>
>>> Anyone have a clue on what happened and how to fix this?
>>>
>>> Best,
>>> Kurt
>>>
>>


Re: Something wrong with travis?

Posted by Jeff Zhang <zj...@gmail.com>.
If it is a Travis caching issue, we can file an Apache Infra ticket and ask them
to clean the cache.



On Tue, Jun 18, 2019 at 3:18 PM, Chesnay Schepler <ch...@apache.org> wrote:

> This is (hopefully a short-lived) hiccup on the Travis caching
> infrastructure.
>
> There's nothing we can do to _fix_ it; if it persists we'll have to
> rework our travis setup again to not rely on caching.
>
> On 18/06/2019 08:34, Kurt Young wrote:
> > Hi dev,
> >
> > I noticed that all the travis tests triggered by pull request are failed
> > with the same error:
> >
> > "Cached flink dir /home/travis/flink_cache/xxxxx/flink does not exist.
> > Exiting build."
> >
> > Anyone have a clue on what happened and how to fix this?
> >
> > Best,
> > Kurt
> >
>
>

-- 
Best Regards

Jeff Zhang

Re: Something wrong with travis?

Posted by Chesnay Schepler <ch...@apache.org>.
This is a (hopefully short-lived) hiccup on the Travis caching
infrastructure.

There's nothing we can do to _fix_ it; if it persists we'll have to 
rework our travis setup again to not rely on caching.

On 18/06/2019 08:34, Kurt Young wrote:
> Hi dev,
>
> I noticed that all the travis tests triggered by pull request are failed
> with the same error:
>
> "Cached flink dir /home/travis/flink_cache/xxxxx/flink does not exist.
> Exiting build."
>
> Anyone have a clue on what happened and how to fix this?
>
> Best,
> Kurt
>