Posted to dev@bigtop.apache.org by Roman Shaposhnik <rv...@apache.org> on 2012/05/02 17:34:47 UTC

Guarding against the use of upstream scripts

Guys,

I've noticed lately that new users of the Bigtop distro fall prey
to thinking that they can simply utilize upstream launcher scripts
as is by running them from under /usr/lib/<component>/bin
directories. Sometimes this works. More often it doesn't.
For example, in order for the hadoop scripts to work the users
actually need to
   export  HADOOP_LIBEXEC_DIR
properly.

In the case of ZooKeeper there's more stuff to be set up in the env.
before you can actually execute upstream scripts.

Most of these issues have to do with a tendency of upstream
scripts to cater to the dev. environment where keeping logs, etc.
in /tmp is perfectly fine. For Bigtop we have to override that and
the place we do that is in /usr/bin/<component> launcher
scripts.
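
As a rough illustration of the kind of override described above, a
/usr/bin/<component> launcher could look something like the sketch below.
This is not the actual Bigtop script: the function name launch_upstream,
the default paths /usr/lib/hadoop/libexec and /var/log/hadoop, and the
stand-in upstream script are all assumptions made for the example.

```shell
#!/bin/bash
# Hypothetical sketch of a /usr/bin/<component>-style launcher
# (not the real Bigtop script). It sets up the environment that the
# upstream script expects and then hands control to it.

launch_upstream() {
  local upstream_script="$1"; shift

  # Assumed defaults; a real package would match its own layout.
  export HADOOP_LIBEXEC_DIR="${HADOOP_LIBEXEC_DIR:-/usr/lib/hadoop/libexec}"
  export HADOOP_LOG_DIR="${HADOOP_LOG_DIR:-/var/log/hadoop}"

  # Hand the remaining arguments to the upstream script.
  bash "$upstream_script" "$@"
}

# Demo with a throwaway stand-in for an upstream script.
unset HADOOP_LIBEXEC_DIR HADOOP_LOG_DIR
demo_dir="$(mktemp -d)"
cat > "$demo_dir/upstream.sh" <<'EOF'
echo "libexec=$HADOOP_LIBEXEC_DIR log=$HADOOP_LOG_DIR args=$*"
EOF
output="$(launch_upstream "$demo_dir/upstream.sh" version)"
echo "$output"
rm -rf "$demo_dir"
```

A user who runs the upstream script directly skips these exports, which
is how logs and similar files can end up in development-style defaults.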

Now the question is -- should we go out of our way to not
let the users shoot themselves in the foot by trying to
launch scripts that may fail in the Bigtop setting?

I could imagine that one approach could be removing the
executable bit from them. That way we can still leverage
the code in our /usr/bin scripts by passing the scripts
as arguments to bash/sourcing, but the users won't be
able to (easily) execute them directly. We can augment this
strategy by moving these upstream scripts from under
/usr/lib/<component>/bin -> /usr/lib/<component>/libexec
to further reinforce the point that they are not supposed
to be executed directly.
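
A minimal sketch of the effect, assuming a throwaway file named
upstream-launcher.sh as a stand-in for an upstream script: once the
executable bit is gone, running the file directly is refused, while
passing it to bash or sourcing it still works, since both only need
read permission.

```shell
#!/bin/bash
# Sketch: a script without its executable bit cannot be run directly,
# but can still be passed to bash or sourced by a wrapper.

work_dir="$(mktemp -d)"
script="$work_dir/upstream-launcher.sh"   # hypothetical stand-in
printf 'echo "upstream script ran"\n' > "$script"

# Remove the executable bit for everyone.
chmod a-x "$script"

# Direct execution is refused (permission denied).
if "$script" 2>/dev/null; then
  direct_result="ran"
else
  direct_result="refused"
fi

# Passing the file to bash only needs read permission.
via_bash="$(bash "$script")"

# Sourcing likewise only needs read permission.
via_source="$(. "$script")"

echo "direct: $direct_result"
echo "bash:   $via_bash"
echo "source: $via_source"
rm -rf "$work_dir"
```

This only discourages direct use: anyone can still type
bash /path/to/script, which is what "(easily)" above allows for.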

What do you all think?

Thanks,
Roman.

P.S. I guess there are 2 questions I'm asking really:
   * is this worth attacking
   * what would the most effective approach be

Re: Guarding against the use of upstream scripts

Posted by "David Liu (gmail dev)" <em...@gmail.com>.
Thanks Roman. Sure, I will create a ticket, and work toward a proposal for
discussion. 

David

On 5/2/12 12:08 PM, "Roman Shaposhnik" <rv...@apache.org> wrote:

>On Wed, May 2, 2012 at 10:58 AM, David Liu (gmail dev)
><em...@gmail.com> wrote:
>> Along the same lines, I wonder if the Bigtop community has considered
>> having a shell that allows a single point of entry to any of the upstream
>> systems. Something like the mvn or git command. This will introduce
>> something to learn at first, but it adds value by making start/stop/help
>> access consistent across all systems. It can also shield users from some
>> of the changes that happen in upstream systems. Instead of dealing with so
>> many <XXX>_HOME variables, we could have one BIGTOP_HOME. It's quite nice
>> to be able to say:
>> bigtop -help
>> bigtop -start <xxx|all>
>
>I like the idea very much (darn! how come I didn't think of it ;-)).
>Would you care to work on a proposal?
>
>This would be a perfect thing to discuss at our upcoming Bigtop hackathon.
>
>Thanks,
>Roman.



Re: Guarding against the use of upstream scripts

Posted by "David Liu (gmail dev)" <em...@gmail.com>.
I have created a ticket for this feature at
https://issues.apache.org/jira/browse/BIGTOP-594 with the slides presented
yesterday at the Hackathon.

David

On 5/2/12 12:08 PM, "Roman Shaposhnik" <rv...@apache.org> wrote:

>On Wed, May 2, 2012 at 10:58 AM, David Liu (gmail dev)
><em...@gmail.com> wrote:
>> Along the same lines, I wonder if the Bigtop community has considered
>> having a shell that allows a single point of entry to any of the upstream
>> systems. Something like the mvn or git command. This will introduce
>> something to learn at first, but it adds value by making start/stop/help
>> access consistent across all systems. It can also shield users from some
>> of the changes that happen in upstream systems. Instead of dealing with so
>> many <XXX>_HOME variables, we could have one BIGTOP_HOME. It's quite nice
>> to be able to say:
>> bigtop -help
>> bigtop -start <xxx|all>
>
>I like the idea very much (darn! how come I didn't think of it ;-)).
>Would you care to work on a proposal?
>
>This would be a perfect thing to discuss at our upcoming Bigtop hackathon.
>
>Thanks,
>Roman.



Re: Guarding against the use of upstream scripts

Posted by Roman Shaposhnik <rv...@apache.org>.
On Wed, May 2, 2012 at 10:58 AM, David Liu (gmail dev)
<em...@gmail.com> wrote:
> Along the same lines, I wonder if the Bigtop community has considered
> having a shell that allows a single point of entry to any of the upstream
> systems. Something like the mvn or git command. This will introduce
> something to learn at first, but it adds value by making start/stop/help
> access consistent across all systems. It can also shield users from some
> of the changes that happen in upstream systems. Instead of dealing with so
> many <XXX>_HOME variables, we could have one BIGTOP_HOME. It's quite nice
> to be able to say:
> bigtop -help
> bigtop -start <xxx|all>

I like the idea very much (darn! how come I didn't think of it ;-)).
Would you care to work on a proposal?

This would be a perfect thing to discuss at our upcoming Bigtop hackathon.

Thanks,
Roman.

Re: Guarding against the use of upstream scripts

Posted by "David Liu (gmail dev)" <em...@gmail.com>.
Along the same lines, I wonder if the Bigtop community has considered
having a shell that allows a single point of entry to any of the upstream
systems. Something like the mvn or git command. This will introduce
something to learn at first, but it adds value by making start/stop/help
access consistent across all systems. It can also shield users from some
of the changes that happen in upstream systems. Instead of dealing with so
many <XXX>_HOME variables, we could have one BIGTOP_HOME. It's quite nice
to be able to say:
bigtop -help
bigtop -start <xxx|all>
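
To make the idea a bit more concrete, here is a minimal sketch of such a
dispatcher, written as a shell function. Everything in it is
hypothetical: the component list, the usage text and the fact that
"start" only prints a message are placeholders for illustration, not a
description of any existing tool.

```shell
#!/bin/bash
# Hypothetical sketch of a single-entry-point "bigtop" command.
# The component list and the "start" action are placeholders.

BIGTOP_COMPONENTS="hadoop zookeeper hbase"   # assumed list, for illustration

bigtop() {
  case "$1" in
    -help)
      echo "usage: bigtop -help | -start <component|all>"
      ;;
    -start)
      local targets="$2" c
      if [ -z "$targets" ]; then
        echo "bigtop: -start needs a component name or 'all'" >&2
        return 2
      fi
      if [ "$targets" = "all" ]; then
        targets="$BIGTOP_COMPONENTS"
      fi
      for c in $targets; do
        # A real implementation would call the component's init script here.
        echo "starting $c"
      done
      ;;
    *)
      echo "bigtop: unknown option '$1' (try -help)" >&2
      return 2
      ;;
  esac
}

bigtop -help
bigtop -start all
```

The point of the single entry is that users learn one command and one
set of options, while the per-component details stay behind it.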

David



On 5/2/12 10:49 AM, "Peter Linnell" <pl...@apache.org> wrote:

>On 05/02/2012 10:46 AM, Bruno Mahé wrote:
>> On 05/02/2012 08:34 AM, Roman Shaposhnik wrote:
>>> Guys,
>>>
>>> I've noticed lately that new users of the Bigtop distro fall prey
>>> to thinking that they can simply utilize upstream launcher scripts
>>> as is by running them from under /usr/lib/<component>/bin
>>> directories. Sometimes this works. More often it doesn't.
>>> For example, in order for the hadoop scripts to work the users
>>> actually need to
>>>     export  HADOOP_LIBEXEC_DIR
>>> properly.
>>>
>>> In case of Zookeeper there's more stuff to be setup in the env.
>>> before you can actually execute upstream scripts.
>>>
>>> Most of these issues have to do with a tendency of upstream
>>> scripts to cater to the dev. environment where keeping logs, etc.
>>> in /tmp is perfectly fine. For Bigtop we have to override that and
>>> the place we do that is in /usr/bin/<component>  launcher
>>> scripts.
>>>
>>> Now the question is -- should we go out of our way to not
>>> let the users shoot themselves in the feet with trying to
>>> launch scripts that may fail in the Bigtop setting?
>>>
>>> I could imagine that one approach could be removing an
>>> executable bit from them. That way we can still leverage
>>> the code in our /usr/bin scripts by passing the scripts
>>> as arguments to bash/sourcing, but the users won't be
>>> able to (easily) execute them directly. We can augment this
>>> strategy with moving these upstream script from under
>>> /usr/lib/<component>/bin ->  /usr/lib/<component>/libexec
>>> to further strengthen the fact that they are not supposed
>>> to be executed directly.
>>>
>>> What do you all think?
>>>
>>> Thanks,
>>> Roman.
>>>
>>> P.S. I guess there are 2 questions I'm asking really:
>>>     * is this worth attacking
>>>     * what would the most effective approach be
>>
>>
>> Great idea!
>>
>> I would be in favor of starting with disabling the executable bit and
>> seeing how it goes.
>> This is easier and less intrusive than moving files around.
>>
>> We can always move the files at a later point if necessary.
>>
>> Thanks,
>> Bruno
>
>+1 as well. I would just add this to a README or the release notes
>with an explanation of why this is done.
>
>Thanks,
>
>Peter
>



Re: Guarding against the use of upstream scripts

Posted by Peter Linnell <pl...@apache.org>.
On 05/02/2012 10:46 AM, Bruno Mahé wrote:
> On 05/02/2012 08:34 AM, Roman Shaposhnik wrote:
>> Guys,
>>
>> I've noticed lately that new users of the Bigtop distro fall prey
>> to thinking that they can simply utilize upstream launcher scripts
>> as is by running them from under /usr/lib/<component>/bin
>> directories. Sometimes this works. More often it doesn't.
>> For example, in order for the hadoop scripts to work the users
>> actually need to
>>     export  HADOOP_LIBEXEC_DIR
>> properly.
>>
>> In case of Zookeeper there's more stuff to be setup in the env.
>> before you can actually execute upstream scripts.
>>
>> Most of these issues have to do with a tendency of upstream
>> scripts to cater to the dev. environment where keeping logs, etc.
>> in /tmp is perfectly fine. For Bigtop we have to override that and
>> the place we do that is in /usr/bin/<component>  launcher
>> scripts.
>>
>> Now the question is -- should we go out of our way to not
>> let the users shoot themselves in the feet with trying to
>> launch scripts that may fail in the Bigtop setting?
>>
>> I could imagine that one approach could be removing an
>> executable bit from them. That way we can still leverage
>> the code in our /usr/bin scripts by passing the scripts
>> as arguments to bash/sourcing, but the users won't be
>> able to (easily) execute them directly. We can augment this
>> strategy with moving these upstream script from under
>> /usr/lib/<component>/bin ->  /usr/lib/<component>/libexec
>> to further strengthen the fact that they are not supposed
>> to be executed directly.
>>
>> What do you all think?
>>
>> Thanks,
>> Roman.
>>
>> P.S. I guess there are 2 questions I'm asking really:
>>     * is this worth attacking
>>     * what would the most effective approach be
>
>
> Great idea!
>
> I would be in favor of starting with disabling the executable bit and
> seeing how it goes.
> This is easier and less intrusive than moving files around.
>
> We can always move the files at a later point if necessary.
>
> Thanks,
> Bruno

+1 as well. I would just add this to a README or the release notes
with an explanation of why this is done.

Thanks,

Peter


Re: Guarding against the use of upstream scripts

Posted by Bruno Mahé <bm...@apache.org>.
On 05/02/2012 08:34 AM, Roman Shaposhnik wrote:
> Guys,
> 
> I've noticed lately that new users of the Bigtop distro fall prey
> to thinking that they can simply utilize upstream launcher scripts
> as is by running them from under /usr/lib/<component>/bin
> directories. Sometimes this works. More often it doesn't.
> For example, in order for the hadoop scripts to work the users
> actually need to
>    export  HADOOP_LIBEXEC_DIR
> properly.
> 
> In case of Zookeeper there's more stuff to be setup in the env.
> before you can actually execute upstream scripts.
> 
> Most of these issues have to do with a tendency of upstream
> scripts to cater to the dev. environment where keeping logs, etc.
> in /tmp is perfectly fine. For Bigtop we have to override that and
> the place we do that is in /usr/bin/<component> launcher
> scripts.
> 
> Now the question is -- should we go out of our way to not
> let the users shoot themselves in the feet with trying to
> launch scripts that may fail in the Bigtop setting?
> 
> I could imagine that one approach could be removing an
> executable bit from them. That way we can still leverage
> the code in our /usr/bin scripts by passing the scripts
> as arguments to bash/sourcing, but the users won't be
> able to (easily) execute them directly. We can augment this
> strategy with moving these upstream script from under
> /usr/lib/<component>/bin -> /usr/lib/<component>/libexec
> to further strengthen the fact that they are not supposed
> to be executed directly.
> 
> What do you all think?
> 
> Thanks,
> Roman.
> 
> P.S. I guess there are 2 questions I'm asking really:
>    * is this worth attacking
>    * what would the most effective approach be


Great idea!

I would be in favor of starting with disabling the executable bit and
seeing how it goes.
This is easier and less intrusive than moving files around.

We can always move the files at a later point if necessary.

Thanks,
Bruno

Re: Guarding against the use of upstream scripts

Posted by Bruno Mahé <bm...@apache.org>.
Please see my reply inline.
But I find it odd to read such a strongly worded email that, at the same
time, asks why this is done this way. Usually strong words are used
once the situation is understood and clarified.

On 05/02/2012 11:44 AM, Matt Foley wrote:
> What is the justification for saying "what we are doing in BigTop", when it
> is diverging from what has been done for years in the Components?  It seems
> that this is carelessly discarding what is being practiced at a lot of
> sites, because you'd rather do it this way.


Apache Bigtop (incubating) has made the choice from the very beginning
to be close to what GNU/Linux distributions have been doing and what
sysadmins are used to. This can differ from the experience one may
have with a tarball-based development setup of Apache Hadoop.

Keep in mind also that each component has its own way of doing things,
each one with its own issues.
Apache Bigtop (incubating) smooths this over and provides an easy and
unified experience to users.


>  Generally speaking, we make
> the Hadoop Ecosystem easier to use by adhering to familiar usage, rather
> than diverging.  Please give reasons for the divergence.

I respectfully disagree.
In order to use pristine Apache Hadoop, one would have to be familiar
with its usage and its configuration. Apache Hadoop experience is only
familiar to people *already* familiar with Apache Hadoop.

So using upstream Apache Hadoop will involve a lot of reading through
forums and documentation, and a fair amount of frustration.
For instance, Apache Bigtop (incubating) will pre-set the ulimits for
you, will set up the logging to well-known locations, provide init
scripts and make a pseudo-configuration available to users.

In a word, Apache Bigtop (incubating) has a different use case than
upstream Apache Hadoop and therefore will be different in some areas.

Thanks,
Bruno

Re: Guarding against the use of upstream scripts

Posted by Konstantin Boudnik <co...@apache.org>.
As a sysadmin with 8 years in the field I can only agree with the statement:
if one installs software from a native OS package, it is only to be expected
that the software and the package layout are in line with the OS guidelines
and expectations.

So I like the proposed idea for exactly that reason: BigTop is trying to
provide a native OS experience for users who have chosen to use our stacks.
So let's make these stacks work nicely with their OS of choice.

I would also like to bring one simple fact to the attention of the community:
BigTop (incubating) doesn't release binary artifacts, so the structure of
BigTop (incubating) binaries shouldn't be a concern similar to the one that
has been argued back and forth on the incubator-general list just recently.

Cos

On Thu, May 03, 2012 at 05:59PM, Andrew Purtell wrote:
> Matt:
> > It seems that this is carelessly discarding what is being practiced at a lot of
> > sites, because you'd rather do it this way.  Generally speaking, we make
> > the Hadoop Ecosystem easier to use by adhering to familiar usage, rather
> > than diverging.  Please give reasons for the divergence.
> 
> Roman:
> 
> > Basically there are 2 types of packages on any operating system -- the
> > ones that try to follow the rules of the underlying OS and the ones that
> > strive to be cross-platform at the expense of playing nice with an underlying
> > system. One type typically gets installed into all the right locations from
> > the FHS standpoint and the other ones typically have self-contained
> > trees down in /opt. Both are legitimate. HADOOP-6255 is neither.
> 
> As a longtime Linux administrator and Hadoop user, I feel comfortable stating that Hadoop as distributed by Apache is not packaged to conform to OS conventions. From ad hoc directory layouts to reams of inscrutable XML configuration to start/stop scripts each with slightly different semantics (is it "hdfs namenode start" or "hdfs namenode" or "hbase master start" or "hbase master" or ...) to incompatible copies of core jars littering the classpath... frankly, more like the crazy cousin from the country who moves in with city relatives and unpacks all of their stuff in the living room.
> 
> This is obviously a nonbinding but enthusiastic +1 for the approach I've seen here so far. One day I hope I no longer have to ls / find / grep all over the filesystem to extract unsuspecting dev teams from JAR hell, no matter if Apache Hadoop or Apache Pig or Apache Oozie or Apache HBase (guilty of the "Apache way" too) want to continue packaging all of their dependencies whatever they were at the time of 'mvn package' this week or that.
> 
> Best regards,
> 
> 
>     - Andy
> 
> Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)

Re: Guarding against the use of upstream scripts

Posted by Andrew Purtell <ap...@apache.org>.
Matt:
> It seems that this is carelessly discarding what is being practiced at a lot of
> sites, because you'd rather do it this way.  Generally speaking, we make
> the Hadoop Ecosystem easier to use by adhering to familiar usage, rather
> than diverging.  Please give reasons for the divergence.

Roman:

> Basically there are 2 types of packages on any operating system -- the
> ones that try to follow the rules of the underlying OS and the ones that
> strive to be cross-platform at the expense of playing nice with an underlying
> system. One type typically gets installed into all the right locations from
> the FHS standpoint and the other ones typically have self-contained
> trees down in /opt. Both are legitimate. HADOOP-6255 is neither.

As a longtime Linux administrator and Hadoop user, I feel comfortable stating that Hadoop as distributed by Apache is not packaged to conform to OS conventions. From ad hoc directory layouts to reams of inscrutable XML configuration to start/stop scripts each with slightly different semantics (is it "hdfs namenode start" or "hdfs namenode" or "hbase master start" or "hbase master" or ...) to incompatible copies of core jars littering the classpath... frankly, more like the crazy cousin from the country who moves in with city relatives and unpacks all of their stuff in the living room.

This is obviously a nonbinding but enthusiastic +1 for the approach I've seen here so far. One day I hope I no longer have to ls / find / grep all over the filesystem to extract unsuspecting dev teams from JAR hell, no matter if Apache Hadoop or Apache Pig or Apache Oozie or Apache HBase (guilty of the "Apache way" too) want to continue packaging all of their dependencies whatever they were at the time of 'mvn package' this week or that.

Best regards,


    - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)

Re: Guarding against the use of upstream scripts

Posted by Roman Shaposhnik <rv...@apache.org>.
On Wed, May 2, 2012 at 11:44 AM, Matt Foley <mf...@hortonworks.com> wrote:
> What is the justification for saying "what we are doing in BigTop", when it
> is diverging from what has been done for years in the Components?

Matt, we both know that "for years" is an exaggeration at best. The first
time a layout like the one currently in Hadoop appeared upstream was
when https://issues.apache.org/jira/browse/HADOOP-6255 got a patch.
This was a bit more than a year ago. Components didn't follow until much
later.

> It seems that this is carelessly discarding what is being practiced at a lot of
> sites, because you'd rather do it this way.  Generally speaking, we make
> the Hadoop Ecosystem easier to use by adhering to familiar usage, rather
> than diverging.  Please give reasons for the divergence.

I believe we had this discussion already, but perhaps it would be worth
reiterating: it is my fundamental opinion that HADOOP-6255  was a mistake.
I also believe that this opinion is shared by the Bigtop community.

Basically there are 2 types of packages on any operating system -- the
ones that try to follow the rules of the underlying OS and the ones that
strive to be cross-platform at the expense of playing nice with an underlying
system. One type typically gets installed into all the right locations from
the FHS standpoint and the other ones typically have self-contained
trees down in /opt. Both are legitimate. HADOOP-6255 is neither.

Bigtop, as a community, decided to integrate well with the system. That's
why we have folks like James Page working on the project. It would be
perfectly fine for upstream packaging efforts to pursue HADOOP-6255
as an alternative to system-integration packaging and provide features
like side-by-side parcel installs to facilitate rolling upgrades, etc. Yet
that is not happening. In fact, from where I stand upstream packaging
efforts feel somewhat abandoned.

Of course, we can't really tell you guys how to package your stuff
upstream (even though personally I think it is a wasted effort).

Thanks,
Roman.

Re: Guarding against the use of upstream scripts

Posted by Jos Backus <jo...@catnook.com>.
Fwiw, as far as starting/stopping the Hadoop daemons is concerned, I'm
still (slowly) working on creating run scripts for daemontools{,-encore}. I
plan on making the existing start/stop scripts wrappers around calls to the
daemontools utilities for backward compatibility. One of the nice
things about daemontools is that it is very portable, unlike many other
solutions in that space such as Upstart, systemd and launchd.
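
For readers who have not used daemontools, a run script is just a small
executable that supervise starts and restarts. The sketch below writes
one into a temporary directory. The service name hdfs-namenode, the hdfs
user and the /usr/bin/hdfs namenode command are assumptions made for
the example, not the scripts being worked on.

```shell
#!/bin/bash
# Sketch of a daemontools service directory for one Hadoop daemon.
# Service name, user and launcher command are assumptions for illustration.

# A real service directory would be linked under /service (or /etc/service).
service_dir="$(mktemp -d)/hdfs-namenode"
mkdir -p "$service_dir"

cat > "$service_dir/run" <<'EOF'
#!/bin/sh
# supervise starts this script and restarts it whenever it exits,
# so the daemon has to stay in the foreground.
exec 2>&1
exec setuidgid hdfs /usr/bin/hdfs namenode
EOF
chmod +x "$service_dir/run"

echo "wrote $service_dir/run"
```

With a layout like this, a backward-compatible start/stop wrapper could
presumably be a thin call to svc -u or svc -d on the service directory.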

Jos
-- 
Jos Backus
jos at catnook.com

Re: Guarding against the use of upstream scripts

Posted by Matt Foley <mf...@hortonworks.com>.
What is the justification for saying "what we are doing in BigTop", when it
is diverging from what has been done for years in the Components?  It seems
that this is carelessly discarding what is being practiced at a lot of
sites, because you'd rather do it this way.  Generally speaking, we make
the Hadoop Ecosystem easier to use by adhering to familiar usage, rather
than diverging.  Please give reasons for the divergence.

Thanks,
--Matt

On Wed, May 2, 2012 at 11:31 AM, Roman Shaposhnik <rv...@apache.org> wrote:

> On Wed, May 2, 2012 at 11:01 AM, Owen O'Malley <om...@apache.org> wrote:
> > On Wed, May 2, 2012 at 8:34 AM, Roman Shaposhnik <rv...@apache.org> wrote:
> >> Guys,
> >>
> >> I've noticed lately that new users of the Bigtop distro fall prey
> >> to thinking that they can simply utilize upstream launcher scripts
> >> as is by running them from under /usr/lib/<component>/bin
> >> directories. Sometimes this works. More often it doesn't.
> >
> > This will break users in two ways:
>
> Just to be clear (and sorry for not being specific in the original email):
> this only applies to trunk e.g. Hadoop 2.X based Bigtop.
>
> > 1. Many user scripts use $HADOOP_HOME/bin/X to find script X.
>
> Not sure I understand how this applies to Hadoop 2.X codeline.
> HADOOP_HOME has been deprecated there and in fact now
> that we have split the sub-projects it no longer makes even a
> backward-compatibility sense to maintain the illusion.
>
> > 2. Until those scripts are available in other locations, there is no
> > choice. For users that want to manage the servers manually, having
> > access to the hadoop-daemon.sh is required. Of course, it should be in
> > /usr/sbin/hadoop-daemon.sh, but until it is of course the users needs
> > to reference it in $HADOOP_HOME/sbin/hadoop-daemon.sh.
>
> Right, but the Bigtop layout is completely different anyway. It has
> very little to do with the upstream Hadoop 2.X layout.
>
> Please consider this in the context of what we are doing in Bigtop,
> not what happens with upstream Hadoop.
>
> Thanks,
> Roman.
>

Re: Guarding against the use of upstream scripts

Posted by Roman Shaposhnik <rv...@apache.org>.
On Wed, May 2, 2012 at 11:01 AM, Owen O'Malley <om...@apache.org> wrote:
> On Wed, May 2, 2012 at 8:34 AM, Roman Shaposhnik <rv...@apache.org> wrote:
>> Guys,
>>
>> I've noticed lately that new users of the Bigtop distro fall prey
>> to thinking that they can simply utilize upstream launcher scripts
>> as is by running them from under /usr/lib/<component>/bin
>> directories. Sometimes this works. More often it doesn't.
>
> This will break users in two ways:

Just to be clear (and sorry for not being specific in the original email): this
only applies to trunk e.g. Hadoop 2.X based Bigtop.

> 1. Many user scripts use $HADOOP_HOME/bin/X to find script X.

Not sure I understand how this applies to the Hadoop 2.X codeline.
HADOOP_HOME has been deprecated there and in fact, now
that we have split the sub-projects, maintaining the illusion no longer
makes sense even for backward compatibility.

> 2. Until those scripts are available in other locations, there is no
> choice. For users that want to manage the servers manually, having
> access to the hadoop-daemon.sh is required. Of course, it should be in
> /usr/sbin/hadoop-daemon.sh, but until it is of course the users needs
> to reference it in $HADOOP_HOME/sbin/hadoop-daemon.sh.

Right, but the Bigtop layout is completely different anyway. It has
very little to do with the upstream Hadoop 2.X layout.

Please consider this in the context of what we are doing in Bigtop,
not what happens with upstream Hadoop.

Thanks,
Roman.

Re: Guarding against the use of upstream scripts

Posted by Owen O'Malley <om...@apache.org>.
On Wed, May 2, 2012 at 8:34 AM, Roman Shaposhnik <rv...@apache.org> wrote:
> Guys,
>
> I've noticed lately that new users of the Bigtop distro fall prey
> to thinking that they can simply utilize upstream launcher scripts
> as is by running them from under /usr/lib/<component>/bin
> directories. Sometimes this works. More often it doesn't.

This will break users in two ways:

1. Many user scripts use $HADOOP_HOME/bin/X to find script X.
2. Until those scripts are available in other locations, there is no
choice. For users that want to manage the servers manually, having
access to hadoop-daemon.sh is required. Of course, it should be in
/usr/sbin/hadoop-daemon.sh, but until it is, users of course need
to reference it as $HADOOP_HOME/sbin/hadoop-daemon.sh.
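
To illustrate the lookup pattern in point 1, here is a minimal sketch of
how a user script might find a Hadoop script without depending on a
single location. find_hadoop_script is a hypothetical helper, not part
of Hadoop or Bigtop, and the demo uses a throwaway HADOOP_HOME.

```shell
#!/bin/bash
# Sketch: locate a Hadoop script without hard-coding $HADOOP_HOME/bin.
# find_hadoop_script is a hypothetical helper for illustration only.

find_hadoop_script() {
  local name="$1" dir
  # Try the legacy locations first, then fall back to whatever is on PATH.
  for dir in "${HADOOP_HOME:+$HADOOP_HOME/bin}" "${HADOOP_HOME:+$HADOOP_HOME/sbin}"; do
    # -f rather than -x: the file may exist without its executable bit.
    if [ -n "$dir" ] && [ -f "$dir/$name" ]; then
      echo "$dir/$name"
      return 0
    fi
  done
  command -v "$name"
}

# Demo with a throwaway HADOOP_HOME.
HADOOP_HOME="$(mktemp -d)"
mkdir -p "$HADOOP_HOME/sbin"
touch "$HADOOP_HOME/sbin/hadoop-daemon.sh"
found="$(find_hadoop_script hadoop-daemon.sh)"
echo "$found"
```

A script written this way keeps working whether the file lives under
$HADOOP_HOME or has moved to a system location such as /usr/sbin.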

-- Owen