Posted to dev@spark.apache.org by Nicholas Chammas <ni...@gmail.com> on 2015/03/01 23:59:56 UTC

spark-ec2 default to Hadoop 2

https://github.com/apache/spark/blob/fd8d283eeb98e310b1e85ef8c3a8af9e547ab5e0/ec2/spark_ec2.py#L162-L164
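
(For reference, the option defined at those lines looks roughly like this
-- a sketch from memory, so the exact help text may differ:)

    from optparse import OptionParser

    # spark_ec2.py builds its CLI with optparse; the linked lines add the
    # Hadoop option, whose default is the string "1":
    parser = OptionParser()
    parser.add_option(
        "--hadoop-major-version", default="1",
        help="Major version of Hadoop (default: %default)")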

Is there any reason we shouldn't update the default Hadoop major version in
spark-ec2 to 2?

Nick

Re: spark-ec2 default to Hadoop 2

Posted by Patrick Wendell <pw...@gmail.com>.
Yeah, calling it Hadoop 2 was a very bad naming choice (of mine!). This
was back when CDH4 was the only real distribution available with some
of the newer Hadoop APIs and packaging.

To avoid surprising people who are already using this, I think it's
best to keep v1 as the default. Overall, we try not to change default
values too often, to keep upgrading easy for people.
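
To make that concrete, here's a minimal optparse sketch (hypothetical
code, not the real spark_ec2.py) showing that flipping the default only
affects users who never pass the flag, while anyone can already opt in
to Hadoop 2 explicitly:

    from optparse import OptionParser

    def hadoop_major_version(argv, default):
        # Same shape as the spark_ec2.py option, parameterized on the default.
        parser = OptionParser()
        parser.add_option("--hadoop-major-version", default=default)
        opts, _ = parser.parse_args(argv)
        return opts.hadoop_major_version

    print(hadoop_major_version([], default="1"))  # "1" -- today's behavior
    print(hadoop_major_version([], default="2"))  # "2" -- if we flipped it
    # Opting in already works without any change to the default:
    print(hadoop_major_version(["--hadoop-major-version", "2"], default="1"))  # "2"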

- Patrick

On Sun, Mar 1, 2015 at 3:14 PM, Shivaram Venkataraman
<sh...@eecs.berkeley.edu> wrote:
> One reason I wouldn't change the default is that the Hadoop 2 launched by
> spark-ec2 is not a full Hadoop 2 distribution -- it's more of a hybrid
> Hadoop version built using CDH4 (it uses HDFS 2, but not YARN, AFAIK).
>
> Also, our default Hadoop version in the Spark build is still 1.0.4 [1], so
> it makes sense to stick to that in spark-ec2 as well?
>
> [1] https://github.com/apache/spark/blob/master/pom.xml#L122
>
> Thanks
> Shivaram
>
> On Sun, Mar 1, 2015 at 2:59 PM, Nicholas Chammas <nicholas.chammas@gmail.com> wrote:
>
>>
>> https://github.com/apache/spark/blob/fd8d283eeb98e310b1e85ef8c3a8af9e547ab5e0/ec2/spark_ec2.py#L162-L164
>>
>> Is there any reason we shouldn't update the default Hadoop major version in
>> spark-ec2 to 2?
>>
>> Nick
>>

Re: spark-ec2 default to Hadoop 2

Posted by Shivaram Venkataraman <sh...@eecs.berkeley.edu>.
One reason I wouldn't change the default is that the Hadoop 2 launched by
spark-ec2 is not a full Hadoop 2 distribution -- it's more of a hybrid
Hadoop version built using CDH4 (it uses HDFS 2, but not YARN, AFAIK).

Also, our default Hadoop version in the Spark build is still 1.0.4 [1], so
it makes sense to stick to that in spark-ec2 as well?

[1] https://github.com/apache/spark/blob/master/pom.xml#L122

Thanks
Shivaram

On Sun, Mar 1, 2015 at 2:59 PM, Nicholas Chammas <nicholas.chammas@gmail.com> wrote:

>
> https://github.com/apache/spark/blob/fd8d283eeb98e310b1e85ef8c3a8af9e547ab5e0/ec2/spark_ec2.py#L162-L164
>
> Is there any reason we shouldn't update the default Hadoop major version in
> spark-ec2 to 2?
>
> Nick
>

Re: spark-ec2 default to Hadoop 2

Posted by Shivaram Venkataraman <sh...@eecs.berkeley.edu>.
FWIW, there is a PR open to add support for Hadoop 2.4 to the spark-ec2
scripts at https://github.com/mesos/spark-ec2/pull/77 -- but it hasn't
received enough review or testing to be merged yet.

Thanks
Shivaram

On Sun, Mar 1, 2015 at 11:49 PM, Sean Owen <so...@cloudera.com> wrote:

> I agree with that. My anecdotal impression is that Hadoop 1.x usage
> out there is maybe a couple of percent, so we should shift toward
> 2.x, at least as the default.
>
> On Sun, Mar 1, 2015 at 10:59 PM, Nicholas Chammas
> <ni...@gmail.com> wrote:
> >
> > https://github.com/apache/spark/blob/fd8d283eeb98e310b1e85ef8c3a8af9e547ab5e0/ec2/spark_ec2.py#L162-L164
> >
> > Is there any reason we shouldn't update the default Hadoop major version
> > in spark-ec2 to 2?
> >
> > Nick
>

Re: spark-ec2 default to Hadoop 2

Posted by Sean Owen <so...@cloudera.com>.
I agree with that. My anecdotal impression is that Hadoop 1.x usage
out there is maybe a couple of percent, so we should shift toward
2.x, at least as the default.

On Sun, Mar 1, 2015 at 10:59 PM, Nicholas Chammas
<ni...@gmail.com> wrote:
> https://github.com/apache/spark/blob/fd8d283eeb98e310b1e85ef8c3a8af9e547ab5e0/ec2/spark_ec2.py#L162-L164
>
> Is there any reason we shouldn't update the default Hadoop major version in
> spark-ec2 to 2?
>
> Nick
