Posted to dev@spark.apache.org by Nicholas Chammas <ni...@gmail.com> on 2015/11/01 04:17:52 UTC

Downloading Hadoop from s3://spark-related-packages/

https://s3.amazonaws.com/spark-related-packages/

spark-ec2 uses this bucket to download and install HDFS on clusters. Is it
owned by the Spark project or by the AMPLab?

Anyway, it looks like the latest Hadoop install available on there is
Hadoop 2.4.0.

Are there plans to add newer versions of Hadoop for use by spark-ec2 and
similar tools, or should we just be getting that stuff via an Apache mirror
<http://hadoop.apache.org/releases.html>? The latest version is 2.7.1, by
the way.

The problem with the Apache mirrors, if I am not mistaken, is that you
cannot use a single URL that automatically redirects you to a working
mirror to download Hadoop. You have to pick a specific mirror and pray it
doesn't disappear tomorrow.

Nick

Re: Downloading Hadoop from s3://spark-related-packages/

Posted by Nicholas Chammas <ni...@gmail.com>.
> Not that likely to get an answer as it’s really a support call, not a
> bug/task.

The first question is about proper documentation of all the stuff we’ve
been discussing in this thread, so one would think that’s a valid task. It
doesn’t seem right that closer.lua, for example, is undocumented. Either
it’s not meant for public use (and I am not an intended user), or there
should be something out there that explains how to use it.

I’m not looking for much; just some basic info that covers the various
things I’ve had to piece together from mailing lists and Google.

> There’s no mirroring, so if you install to lots of machines your download
> time will be slow. You could automate it though: do something like
> download, upload to your own bucket, then do an S3 GET.

Yeah, this is what I’m probably going to do eventually—just use my own S3
bucket.
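
Something like this, roughly (an untested sketch; the bucket name is made
up, and it assumes the AWS CLI is configured):

# one-off: pull the release through the mirror redirector
curl -fsSL -o hadoop-2.7.1.tar.gz \
  "http://www.apache.org/dyn/closer.lua?filename=hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz&action=download"
# stage it in a bucket I control
aws s3 cp hadoop-2.7.1.tar.gz s3://my-provisioning-bucket/hadoop/
# then every cluster node fetches from S3 instead of a mirror
aws s3 cp s3://my-provisioning-bucket/hadoop/hadoop-2.7.1.tar.gz .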

It’s disappointing that, at least as far as I can tell, the Apache Software
Foundation doesn’t have a fast CDN or something like that to serve its
files. So users like me are left needing to come up with their own solution
if they regularly download Apache software to many machines in an automated
fashion.

Now, perhaps Apache mirrors are not meant to be used in this way. Perhaps
they’re just meant for people to do the one-off download to their personal
machines and that’s it. That’s totally fine! But that goes back to my first
question from the ticket—there should be a simple doc that spells this out
for us if that’s the case: “Don’t use the mirror network for automated
provisioning/deployments.” That would suffice. But as things stand now, I
have to guess and wonder at this stuff.

Nick


Re: Downloading Hadoop from s3://spark-related-packages/

Posted by Steve Loughran <st...@hortonworks.com>.
On 24 Dec 2015, at 05:59, Nicholas Chammas <ni...@gmail.com> wrote:

> FYI: I opened an INFRA ticket with questions about how best to use the Apache mirror network.
>
> https://issues.apache.org/jira/browse/INFRA-10999
>
> Nick


Not that likely to get an answer, as it's really a support call, not a bug/task. You never know, though.

There's another way to get at binaries, which is to check them out directly from SVN:

https://dist.apache.org/repos/dist/release/

This is a direct view into how you release things in the ASF (you just create a new dir under your project, copy the files in, and do an svn commit; I believe the replicated servers may just do an svn update on their local cache).
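
For example (a sketch; adjust the path for whatever release you're after):

# export a single release artifact straight from the dist repo
svn export https://dist.apache.org/repos/dist/release/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz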

There's no mirroring, so if you install to lots of machines your download time will be slow. You could automate it though: do something like download, upload to your own bucket, then do an S3 GET.

Re: Downloading Hadoop from s3://spark-related-packages/

Posted by Nicholas Chammas <ni...@gmail.com>.
FYI: I opened an INFRA ticket with questions about how best to use the
Apache mirror network.

https://issues.apache.org/jira/browse/INFRA-10999

Nick


Re: Downloading Hadoop from s3://spark-related-packages/

Posted by Luciano Resende <lu...@gmail.com>.
I am getting the same results using closer.lua versus closer.cgi: both
seem to download a page where the user can choose the closest mirror. I
tried adding parameters to follow the redirect, without much success.
There already seems to be a JIRA for a similar request with Infra:
https://issues.apache.org/jira/browse/INFRA-10240.

A workaround is to use a URL pointing directly to a specific mirror:

curl -O -L http://ftp.unicamp.br/pub/apache/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz

I second the lack of documentation on what is available with these scripts;
I'll see if I can find the source and look for other options.


-- 
Luciano Resende
http://people.apache.org/~lresende
http://twitter.com/lresende1975
http://lresende.blogspot.com/

Re: Downloading Hadoop from s3://spark-related-packages/

Posted by Nicholas Chammas <ni...@gmail.com>.
Hmm, yeah, some Googling confirms this, though there isn't any clear
documentation about it.

Strangely, if I click on the link from your email the download works, but
curl and wget somehow don't get redirected correctly...
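
One guess at the cause (untested): the unquoted & in the URL. In a shell,
& sends the command to the background and silently drops the
action=download parameter, so the tools never see the redirect. Quoting
the URL may be all that's needed:

# quote the URL so the shell passes & through to wget
wget -O hadoop-2.7.1.tar.gz \
  "http://www.apache.org/dyn/closer.lua?filename=hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz&action=download"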

Nick


Re: Downloading Hadoop from s3://spark-related-packages/

Posted by Shivaram Venkataraman <sh...@eecs.berkeley.edu>.
I think the lua one at
https://svn.apache.org/repos/asf/infrastructure/site/trunk/content/dyn/closer.lua
has replaced the cgi one from before. It also looks like the lua one
supports `action=download` with a filename argument. So you could
just do something like

wget http://www.apache.org/dyn/closer.lua?filename=hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz&action=download

Thanks
Shivaram



Re: Downloading Hadoop from s3://spark-related-packages/

Posted by Nicholas Chammas <ni...@gmail.com>.
Oh, sweet! For example:

http://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz?asjson=1

Thanks for sharing that tip. Looks like you can also use as_json
<https://svn.apache.org/repos/asf/infrastructure/site/trunk/content/dyn/mirrors/mirrors.cgi>
(vs. asjson).

Nick


Re: Downloading Hadoop from s3://spark-related-packages/

Posted by Shivaram Venkataraman <sh...@eecs.berkeley.edu>.
On Sun, Nov 1, 2015 at 2:16 PM, Nicholas Chammas
<ni...@gmail.com> wrote:
> OK, I’ll focus on the Apache mirrors going forward.
>
> The problem with the Apache mirrors, if I am not mistaken, is that you
> cannot use a single URL that automatically redirects you to a working mirror
> to download Hadoop. You have to pick a specific mirror and pray it doesn’t
> disappear tomorrow.
>
> They don’t go away, especially http://mirror.ox.ac.uk , and in the US the
> apache.osuosl.org, osu being where a lot of the ASF servers are kept.
>
> So does Apache offer no way to query a URL and automatically get the closest
> working mirror? If I’m installing HDFS onto servers in various EC2 regions,
> the best mirror will vary depending on my location.
>
Not sure if this is officially documented somewhere, but if you pass
'&asjson=1' you will get back a JSON response with a 'preferred' field set
to the closest mirror.
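
For example, something like this should print the preferred mirror for a
given artifact (untested sketch; the mirror root still needs the file path
appended to it):

# ask the mirror resolver for JSON and pull out the 'preferred' field
curl -s "http://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz?asjson=1" |
  python -c "import json, sys; print(json.load(sys.stdin)['preferred'])"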

Shivaram


Re: Downloading Hadoop from s3://spark-related-packages/

Posted by Nicholas Chammas <ni...@gmail.com>.
OK, I’ll focus on the Apache mirrors going forward.

The problem with the Apache mirrors, if I am not mistaken, is that you
cannot use a single URL that automatically redirects you to a working
mirror to download Hadoop. You have to pick a specific mirror and pray it
doesn’t disappear tomorrow.

> They don’t go away, especially http://mirror.ox.ac.uk , and in the US the
> apache.osuosl.org, osu being where a lot of the ASF servers are kept.

So does Apache offer no way to query a URL and automatically get the
closest working mirror? If I’m installing HDFS onto servers in various EC2
regions, the best mirror will vary depending on my location.

Nick


Re: Downloading Hadoop from s3://spark-related-packages/

Posted by Shivaram Venkataraman <sh...@eecs.berkeley.edu>.
I think that getting them from the ASF mirrors is a better strategy in
general, as it'll remove the overhead of keeping the S3 bucket up to
date. It works in the spark-ec2 case because we only support a limited
number of Hadoop versions from the tool. FWIW I don't have write access
to the bucket, and I also haven't heard of any plans to support newer
versions in spark-ec2.

Thanks
Shivaram



Re: Downloading Hadoop from s3://spark-related-packages/

Posted by Steve Loughran <st...@hortonworks.com>.
On 1 Nov 2015, at 03:17, Nicholas Chammas <ni...@gmail.com> wrote:

> https://s3.amazonaws.com/spark-related-packages/
>
> spark-ec2 uses this bucket to download and install HDFS on clusters. Is it owned by the Spark project or by the AMPLab?
>
> Anyway, it looks like the latest Hadoop install available on there is Hadoop 2.4.0.
>
> Are there plans to add newer versions of Hadoop for use by spark-ec2 and similar tools, or should we just be getting that stuff via an Apache mirror <http://hadoop.apache.org/releases.html>? The latest version is 2.7.1, by the way.


You should be grabbing the artifacts off the ASF mirrors and then verifying their SHA1 checksums as published on the ASF HTTPS web site.
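
Something along these lines (a sketch; the exact checksum file layout
varies a bit by project):

# the tarball itself can come from any mirror
wget http://apache.osuosl.org/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz
# but take the checksum from apache.org itself, over HTTPS
wget https://www.apache.org/dist/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz.mds
# and compare before unpacking
sha1sum hadoop-2.7.1.tar.gz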


> The problem with the Apache mirrors, if I am not mistaken, is that you cannot use a single URL that automatically redirects you to a working mirror to download Hadoop. You have to pick a specific mirror and pray it doesn't disappear tomorrow.


They don't go away, especially http://mirror.ox.ac.uk , and in the US the apache.osuosl.org, osu being where a lot of the ASF servers are kept.

Full list with availability stats:

http://www.apache.org/mirrors/