Posted to dev@spark.apache.org by Sean Owen <sr...@gmail.com> on 2018/07/15 20:51:15 UTC

Cleaning Spark releases from mirrors, and the flakiness of HiveExternalCatalogVersionsSuite

Yesterday I cleaned out old Spark releases from the mirror system -- we're
supposed to keep only the latest release from each active branch on the
mirrors. (All releases remain available from the Apache archive site.)

Having done so, I quickly realized that the HiveExternalCatalogVersionsSuite
relies on the versions it downloads being available from mirrors. It has
been flaky, as mirrors are sometimes unreliable. I think it will now fail
for all versions except 2.3.1, 2.2.2, and 2.1.3.

Because we do need to clean those releases out of the mirrors soon anyway,
and because mirrors are sometimes flaky, I propose adding logic to the test
to fall back to downloading from the Apache archive site.
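
Roughly, the idea is something like this -- a minimal Scala sketch with an
illustrative helper name and a hard-coded package name, not the actual
suite code:

    import java.io.File
    import java.net.URL
    import scala.sys.process._

    // Hypothetical helper: try a dynamic mirror first, then fall back to
    // archive.apache.org if the mirror fails or no longer hosts the release.
    def downloadSpark(version: String, targetDir: File): File = {
      val filename = s"spark-$version-bin-hadoop2.7.tgz"
      val mirror =
        s"https://www.apache.org/dyn/closer.lua/spark/spark-$version/$filename?action=download"
      val archive =
        s"https://archive.apache.org/dist/spark/spark-$version/$filename"
      val target = new File(targetDir, filename)
      val ok = Seq(mirror, archive).exists { url =>
        try { (new URL(url) #> target).! == 0 && target.length() > 0 }
        catch { case _: Exception => false }
      }
      require(ok, s"Could not download Spark $version from mirror or archive")
      target
    }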

... and I'll do that right away to unblock HiveExternalCatalogVersionsSuite
runs. I think it needs to be backported to other branches as they will
still be testing against potentially non-current Spark releases.

Sean

Re: Cleaning Spark releases from mirrors, and the flakiness of HiveExternalCatalogVersionsSuite

Posted by Mark Hamstra <ma...@clearstorydata.com>.
Yeah, I was mostly thinking that, if the normal Spark PR tests were set up
to check the sigs (every time? some of the time?), then this could serve as
an automatic check that nothing funny has been done to the archives. There
shouldn't be any difference between the cache and the archive; but if there
ever is, then we may well have a serious security problem.

Re: Cleaning Spark releases from mirrors, and the flakiness of HiveExternalCatalogVersionsSuite

Posted by Sean Owen <sr...@apache.org>.
Yeah, if the test code keeps around the archive and/or a digest of what it
unpacked. A release should never be modified, though, so that would be
highly rare.

If the worry is hacked mirrors, then we might have bigger problems, but
there the issue is verifying the download sigs in the first place. Those
would have to come from archive.apache.org.

If you're up for it, yes that could be a fine security precaution.
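
For what it's worth, a rough sketch of what that could look like -- Scala,
shelling out to gpg, assuming the Spark KEYS file has already been imported
into the local keyring; the helper name is made up:

    import java.io.File
    import java.net.URL
    import scala.sys.process._

    // Fetch the detached .asc from archive.apache.org (never from a mirror)
    // and let gpg check it; gpg exits 0 only for a good signature.
    def signatureOk(version: String, tarball: File): Boolean = {
      val sigUrl =
        s"https://archive.apache.org/dist/spark/spark-$version/${tarball.getName}.asc"
      val sig = new File(tarball.getParentFile, tarball.getName + ".asc")
      (new URL(sigUrl) #> sig).!
      Seq("gpg", "--verify", sig.getAbsolutePath, tarball.getAbsolutePath).! == 0
    }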

Re: Cleaning Spark releases from mirrors, and the flakiness of HiveExternalCatalogVersionsSuite

Posted by Mark Hamstra <ma...@clearstorydata.com>.
Is there or should there be some checking of digests just to make sure that
we are really testing against the same thing in /tmp/test-spark that we are
distributing from the archive?
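
Something like this, perhaps -- a Scala sketch where the expected hex would
come from the .sha512 file published on archive.apache.org; helper names
are illustrative:

    import java.io.{File, FileInputStream}
    import java.security.MessageDigest

    // Stream the cached tarball through SHA-512 and compare against the
    // published digest, ignoring case and whitespace.
    def sha512Hex(f: File): String = {
      val md = MessageDigest.getInstance("SHA-512")
      val in = new FileInputStream(f)
      try {
        val buf = new Array[Byte](8192)
        Iterator.continually(in.read(buf)).takeWhile(_ > 0)
          .foreach(n => md.update(buf, 0, n))
      } finally in.close()
      md.digest().map("%02x".format(_)).mkString
    }

    def digestMatches(tarball: File, expectedHex: String): Boolean =
      sha512Hex(tarball) == expectedHex.replaceAll("\\s", "").toLowerCase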

Re: Cleaning Spark releases from mirrors, and the flakiness of HiveExternalCatalogVersionsSuite

Posted by Sean Owen <sr...@apache.org>.
Ideally, that list is updated with each release, yes. Non-current releases
will now always download from archive.apache.org, though. But we run into
rate-limiting problems if that gets pinged too much. So yes, it's good to
keep the list limited to current branches.

It looks like the download is cached in /tmp/test-spark, for what it's
worth.
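
So a cache-aware download is just a guard around the network call -- a
sketch, with an illustrative helper name:

    import java.io.File

    // Reuse the tarball under /tmp/test-spark when present and non-empty;
    // only hit the network otherwise.
    def cachedOrDownload(version: String)(download: File => Unit): File = {
      val cacheDir = new File("/tmp/test-spark")
      cacheDir.mkdirs()
      val tarball = new File(cacheDir, s"spark-$version-bin-hadoop2.7.tgz")
      if (!tarball.exists() || tarball.length() == 0L) download(tarball)
      tarball
    }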

Re: Cleaning Spark releases from mirrors, and the flakiness of HiveExternalCatalogVersionsSuite

Posted by Felix Cheung <fe...@hotmail.com>.
+1, this has been problematic.

Also, does this list need to be updated every time we make a new release?

Plus, can we cache them on Jenkins? Maybe we can avoid downloading the same
thing from the Apache archive on every test run.


Re: Cleaning Spark releases from mirrors, and the flakiness of HiveExternalCatalogVersionsSuite

Posted by Marco Gaido <ma...@gmail.com>.
+1 too

Re: Cleaning Spark releases from mirrors, and the flakiness of HiveExternalCatalogVersionsSuite

Posted by Hyukjin Kwon <gu...@gmail.com>.
+1

Re: Cleaning Spark releases from mirrors, and the flakiness of HiveExternalCatalogVersionsSuite

Posted by Sean Owen <sr...@apache.org>.
The fix is committed to branches back through 2.2.x, where this test was added.

There is still an issue, though: I'm seeing that archive.apache.org is
rate-limiting downloads and frequently returning 503 errors.
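
(Retrying with exponential backoff would paper over the transient 503s --
a generic Scala sketch, not what's actually committed:)

    // Retry an operation a few times, doubling the delay each time; useful
    // when archive.apache.org intermittently returns 503s under load.
    def withRetries[T](attempts: Int, delayMs: Long)(op: => T): T =
      try op catch {
        case _: Exception if attempts > 1 =>
          Thread.sleep(delayMs)
          withRetries(attempts - 1, delayMs * 2)(op)
      }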

We can help, I guess, by avoiding testing against non-current releases.
Right now we should be testing against 2.3.1, 2.2.2, and 2.1.3, right?
2.0.x is now effectively EOL, right?

I can make that quick change too if everyone's amenable, in order to
prevent more failures in this test from master.

Re: Cleaning Spark releases from mirrors, and the flakiness of HiveExternalCatalogVersionsSuite

Posted by Marcelo Vanzin <va...@cloudera.com.INVALID>.
On this topic... when I worked on 2.3.1 and caused this breakage by
deleting an old release, I tried to write some code to make this more
automatic:

https://github.com/vanzin/spark/tree/SPARK-24532

I just found that the code was a little too large and hacky for what it
does (finding the latest release on each branch). But maybe it would be
worth doing?
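
The gist of it, very roughly -- a Scala sketch that scrapes the archive's
directory listing; the real code in that branch is more involved, since it
also has to work out which branches are still active:

    import scala.io.Source

    // Parse spark-X.Y.Z/ directory names out of the archive listing and
    // keep only the highest patch release on each X.Y branch.
    def latestPerBranch(): Seq[String] = {
      val listing =
        Source.fromURL("https://archive.apache.org/dist/spark/").mkString
      val ver = """spark-(\d+)\.(\d+)\.(\d+)/""".r
      ver.findAllMatchIn(listing)
        .map(m => (m.group(1).toInt, m.group(2).toInt, m.group(3).toInt))
        .toSeq.distinct
        .groupBy { case (x, y, _) => (x, y) }
        .values.map(_.max)
        .toSeq.sorted
        .map { case (x, y, z) => s"$x.$y.$z" }
    }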

In any case, I agree with Mark that checking signatures would be good, eventually.


-- 
Marcelo

Re: Cleaning Spark releases from mirrors, and the flakiness of HiveExternalCatalogVersionsSuite

Posted by Marco Gaido <ma...@gmail.com>.
+1, this was indeed a problem in the past.

Re: Cleaning Spark releases from mirrors, and the flakiness of HiveExternalCatalogVersionsSuite

Posted by Reynold Xin <rx...@databricks.com>.
Makes sense. Thanks for looking into this.
