Posted to dev@slider.apache.org by Jean-Baptiste Note <jb...@gmail.com> on 2015/05/06 15:16:36 UTC

Packaging new apps

Hi folks,

Currently we're using Chef in our organization to deploy a lot of
infrastructure services around Hadoop. Of course it makes a lot of sense to
offer these as self-services on YARN using slider, but I'm looking at a
number of challenges. So please forgive the broad range of questions :)

I'm specifically interested in deploying the following applications:
* HTTPFS service (see https://github.com/jbnote/httpfs-slider) & helpers
(nginx)
* Opentsdb & helpers (varnish)
* kafka (I had a look at koya)
* druid
* storm (fine, thanks !)
* hbase (fine, thanks !)

I'm facing a lot of issues with those services which are not yet packaged
correctly:

* httpfs/opentsdb are not released as standalone tarballs, contrary to all
services currently packaged. So I've butchered a tarball from Cloudera
RPMs, which is not satisfactory. How would you go about handling this?

* KOYA has been talked about a lot; however, the source I'm looking at (
https://github.com/DataTorrent/koya) is kind of disappointing, and activity
is a bit low -- would anyone know if DataTorrent is still committed to the
project?

Last but not least, I'm wondering if there would already be a plan to
expose somehow (through an internal or an external service) the registry
through DNS (that's what we really use for service location for HTTPFS &
OpenTSDB). A bash polling script would certainly be sufficient for our
needs for now, but longer-term, we'd need to have a more robust solution.

Thanks a lot, kind regards,
JB

Re: Packaging new apps

Posted by Jean-Baptiste Note <jb...@gmail.com>.
Hi Gour,

Thanks a lot for the detailed answer, and the pointer to tomcat packaging,
which does half the work for httpfs.
I'll try to properly wrap the unpacking of the RPM & extraction of the
relevant parts for slider packaging. That was my gripe; other than that, I
can launch httpfs services and flex them: slider is just awesome.

Kind regards,
JB

Re: Packaging new apps

Posted by Gour Saha <gs...@hortonworks.com>.
I might be wrong, but I sense there is a requirement here, where Slider
needs to accept custom application-specific config files in their original
raw format (like properties, xml, json, yaml, etc.) in addition to
appConfig.json. Then it is expected to merge them with appConfig.json and
send the complete property bag down to the application containers.

If that is true, or even if I got it all wrong, it would be great if you
could file JIRAs for what you are looking for. It is good to have these
kinds of gaps and ideas captured in JIRAs, so that we can make Slider
better.
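A minimal sketch of that kind of merge: fold a raw Java .properties file into appConfig.json's global section, with appConfig values winning. The `site.server.` key prefix here is only an illustrative convention, not something Slider mandates.

```python
# Sketch: merge a raw .properties file into an appConfig-style dict so
# the complete property bag can be sent down to the containers.
# The "site.server." prefix is an illustrative convention only.

def parse_properties(text):
    """Parse a Java-style .properties file into a dict (comments skipped)."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        if "=" in line:
            key, _, value = line.partition("=")
            props[key.strip()] = value.strip()
    return props


def merge_into_appconfig(appconfig, props_text, prefix="site.server."):
    """Merge raw properties under appConfig's global section.

    Values already present in appConfig win, so operators can still
    override individual properties. The input dict is not mutated."""
    merged = dict(appconfig)
    merged["global"] = dict(appconfig.get("global", {}))
    for key, value in parse_properties(props_text).items():
        merged["global"].setdefault(prefix + key, value)
    return merged
```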


Siyuan,
The instance tag feature has been there since 0.60. Check
https://issues.apache.org/jira/browse/SLIDER-463.

-Gour

On 5/11/15, 10:41 PM, "Thomas Weise" <th...@gmail.com> wrote:

>Jean,
>
>We pulled in your changes and added modifications on top of it. It appears
>we agree that we should not force the user to redefine the default values
>that ship with server.properties. Please see whether the properties merge
>as implemented works on your environment or not. If not, what is the
>Python
>version?
>
>We can find an alternative solution to in-place edit of server properties
>if and when needed. The file is an argument to the start script, hence we
>can do a copy before merge if necessary.
>
>Thomas
>
>
>On Mon, May 11, 2015 at 3:26 PM, hsy541@gmail.com <hs...@gmail.com>
>wrote:
>
>> Hi Jean,
>>
>> Thanks for the change, using instance tag(is it a new feature in the
>>latest
>> version? I didn't see it in the older slider versions) is a really good
>> idea.  it might be good for other's to have a template but not for
>>kafka.
>> Kafka is evolving in quite fast pace. I've seen many property key/val
>> change in last several releases. Our method is keep most properties
>>default
>> and only override the one declared in appConfig.json which is actually
>> supported in current python script(maybe need some change for the latest
>> slider).
>>
>> And  Kafka broker is bundled with local disk once it's launched so in
>>the
>> real world there would be at most one instance for each NM.
>>
>> Best,
>> Siyuan
>>
>>
>>
>> On Mon, May 11, 2015 at 10:16 AM, Jean-Baptiste Note <jb...@gmail.com>
>> wrote:
>>
>> > Hi Thomas,
>> >
>> > According to kafka's documentation:
>> > http://kafka.apache.org/07/configuration.html there should be a
>>default
>> > value for any added property; I would expect the provided
>> server.properties
>> > file to actually reflect those default values.
>> > Therefore, I'd look twice before overconstraining the problem, and
>>would
>> > just generate the file for those and only those dictionary values that
>> have
>> > been set in the appConfig (which currently, my code does not, it
>> configures
>> > too many properties statically, but it can be arranged), relying on
>>the
>> > default properties for the rest.
>> >
>> > If there's really a case to have all properties at hand, I could:
>> > * parse the properties file provided in the tarball
>> > * re-generate the whole conf file with the parsed + overrides
>> >
>> > This, in order to allow for *added* properties (which the current
>> schemes,
>> > either mine or yours, does not look to allow) AND ultimately, allow
>>for
>> the
>> > whole tarball installation to be switched to read-only (which could
>>allow
>> > them to be shared among instances running on the same NM; I don't
>>know if
>> > slider currently does this kind of optimization).
>> >
>> > Maybe guidance from people more familiar with slider than us would be
>> > needed here :)
>> >
>> > Kind regards,
>> > JB
>> >
>>


Re: Packaging new apps

Posted by Thomas Weise <th...@gmail.com>.
Jean,

We pulled in your changes and added modifications on top of them. It
appears we agree that we should not force the user to redefine the default
values that ship with server.properties. Please check whether the
properties merge as implemented works in your environment; if not, what is
your Python version?

We can find an alternative solution to in-place edit of server properties
if and when needed. The file is an argument to the start script, hence we
can do a copy before merge if necessary.

Thomas


On Mon, May 11, 2015 at 3:26 PM, hsy541@gmail.com <hs...@gmail.com> wrote:

> Hi Jean,
>
> Thanks for the change, using instance tag(is it a new feature in the latest
> version? I didn't see it in the older slider versions) is a really good
> idea.  it might be good for other's to have a template but not for kafka.
> Kafka is evolving in quite fast pace. I've seen many property key/val
> change in last several releases. Our method is keep most properties default
> and only override the one declared in appConfig.json which is actually
> supported in current python script(maybe need some change for the latest
> slider).
>
> And  Kafka broker is bundled with local disk once it's launched so in the
> real world there would be at most one instance for each NM.
>
> Best,
> Siyuan
>
>
>
> On Mon, May 11, 2015 at 10:16 AM, Jean-Baptiste Note <jb...@gmail.com>
> wrote:
>
> > Hi Thomas,
> >
> > According to kafka's documentation:
> > http://kafka.apache.org/07/configuration.html there should be a default
> > value for any added property; I would expect the provided
> server.properties
> > file to actually reflect those default values.
> > Therefore, I'd look twice before overconstraining the problem, and would
> > just generate the file for those and only those dictionary values that
> have
> > been set in the appConfig (which currently, my code does not, it
> configures
> > too many properties statically, but it can be arranged), relying on the
> > default properties for the rest.
> >
> > If there's really a case to have all properties at hand, I could:
> > * parse the properties file provided in the tarball
> > * re-generate the whole conf file with the parsed + overrides
> >
> > This, in order to allow for *added* properties (which the current
> schemes,
> > either mine or yours, does not look to allow) AND ultimately, allow for
> the
> > whole tarball installation to be switched to read-only (which could allow
> > them to be shared among instances running on the same NM; I don't know if
> > slider currently does this kind of optimization).
> >
> > Maybe guidance from people more familiar with slider than us would be
> > needed here :)
> >
> > Kind regards,
> > JB
> >
>

Re: Packaging new apps

Posted by "hsy541@gmail.com" <hs...@gmail.com>.
Hi Jean,

Thanks for the change; using the instance tag (is it a new feature in the
latest version? I didn't see it in older slider versions) is a really good
idea. It might be good for others to have a template, but not for kafka.
Kafka is evolving at quite a fast pace; I've seen many property key/value
changes in the last several releases. Our method is to keep most properties
at their defaults and only override the ones declared in appConfig.json,
which is actually supported in the current python script (it may need some
changes for the latest slider).

And a Kafka broker is bound to its local disk once it's launched, so in the
real world there would be at most one instance for each NM.

Best,
Siyuan



On Mon, May 11, 2015 at 10:16 AM, Jean-Baptiste Note <jb...@gmail.com>
wrote:

> Hi Thomas,
>
> According to kafka's documentation:
> http://kafka.apache.org/07/configuration.html there should be a default
> value for any added property; I would expect the provided server.properties
> file to actually reflect those default values.
> Therefore, I'd look twice before overconstraining the problem, and would
> just generate the file for those and only those dictionary values that have
> been set in the appConfig (which currently, my code does not, it configures
> too many properties statically, but it can be arranged), relying on the
> default properties for the rest.
>
> If there's really a case to have all properties at hand, I could:
> * parse the properties file provided in the tarball
> * re-generate the whole conf file with the parsed + overrides
>
> This, in order to allow for *added* properties (which the current schemes,
> either mine or yours, does not look to allow) AND ultimately, allow for the
> whole tarball installation to be switched to read-only (which could allow
> them to be shared among instances running on the same NM; I don't know if
> slider currently does this kind of optimization).
>
> Maybe guidance from people more familiar with slider than us would be
> needed here :)
>
> Kind regards,
> JB
>

Re: Packaging new apps

Posted by Jean-Baptiste Note <jb...@gmail.com>.
Hi Thomas,

According to kafka's documentation:
http://kafka.apache.org/07/configuration.html there should be a default
value for any added property; I would expect the provided server.properties
file to actually reflect those default values.
Therefore, I'd look twice before overconstraining the problem, and would
just generate the file for those and only those dictionary values that have
been set in the appConfig (which my code currently does not do; it
configures too many properties statically, but that can be arranged),
relying on the default properties for the rest.

If there's really a case to have all properties at hand, I could:
* parse the properties file provided in the tarball
* re-generate the whole conf file with the parsed + overrides

This would allow for *added* properties (which the current schemes, either
mine or yours, do not seem to allow) AND, ultimately, would allow the whole
tarball installation to be switched to read-only (which could let it be
shared among instances running on the same NM; I don't know if slider
currently does this kind of optimization).
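The two-step plan (parse the shipped properties file, then regenerate the whole conf with parsed values plus overrides) could be sketched roughly like this; file names and the appConfig lookup are simplified:

```python
# Sketch: rewrite server.properties from the tarball's defaults plus
# appConfig overrides, so the extracted tarball itself can stay
# read-only. Keys the shipped file never mentions (the *added*
# properties case) are appended at the end.

def regenerate(defaults_text, overrides, out_path):
    """Write a complete properties file: shipped defaults + overrides."""
    seen = set()
    lines = []
    for line in defaults_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#") and "=" in stripped:
            key = stripped.partition("=")[0].strip()
            seen.add(key)
            if key in overrides:
                # override a default the tarball defines
                line = "%s=%s" % (key, overrides[key])
        lines.append(line)
    for key in sorted(overrides):
        if key not in seen:
            # an *added* property, absent from the shipped file
            lines.append("%s=%s" % (key, overrides[key]))
    with open(out_path, "w") as f:
        f.write("\n".join(lines) + "\n")
```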

Maybe guidance from people more familiar with slider than us would be
needed here :)

Kind regards,
JB

Re: Packaging new apps

Posted by Thomas Weise <th...@gmail.com>.
In order to work for different Kafka versions, it would be nice to pick
whatever server.properties the archive comes with and apply all the
properties that are defined in server.xml on top of it. Does that work for
you? We can look into making that merge work then.

Everything else looks great, thanks for the pull request!

Thomas


On Mon, May 11, 2015 at 8:21 AM, Jean-Baptiste Note <jb...@gmail.com>
wrote:

> There's a remark on the pull request about this, with more details than in
> this mail, but basically:
>
> * Other apps seem to regenerate the config files directly through a
> template rather than try to do a merge (you seem to be doing a SED on
> defined properties, however it does not work here, maybe a python version
> issue ?), so that's what I did for server.properties.
>
> Where I come from we use Chef, and redefine all configuration files
> anyways, so I was thinking of duplicating a standard configuration file in
> the appConfig-default.json (kind of duplicated from the tarball -- again
> all other packaged apps are doing it like this), and use Chef to regenerate
> all the appConfig.json in order to deploy infrastructure Kafka (and let
> users do whatever they wish based on the defaults).
>
> Kind regards,
> JB
>

Re: Packaging new apps

Posted by Jean-Baptiste Note <jb...@gmail.com>.
There's a remark on the pull request about this, with more details than in
this mail, but basically:

* Other apps seem to regenerate the config files directly through a
template rather than try to do a merge (you seem to be doing a sed on
defined properties, however it does not work here; maybe a python version
issue?), so that's what I did for server.properties.

Where I come from we use Chef and redefine all configuration files
anyway, so I was thinking of duplicating a standard configuration file in
appConfig-default.json (kind of duplicated from the tarball -- again,
all other packaged apps do it like this), and using Chef to regenerate
all the appConfig.json files in order to deploy infrastructure Kafka (and
let users do whatever they wish based on the defaults).

Kind regards,
JB

Re: Packaging new apps

Posted by Thomas Weise <th...@gmail.com>.
Excellent, will look at the pull request shortly. Any thoughts on merging
the server properties defined in the slider config into the
server.properties that came with the Kafka archive?

Thomas

On Mon, May 11, 2015 at 8:10 AM, Jean-Baptiste Note <jb...@gmail.com>
wrote:

> Hi Thomas,
>
> This is because the app_container_tag is unique under each resource.
> Given your two brokers are on separate resources BROKER0 and BROKER1, they
> get identical (1) container_tag.
>
> You should set them in the same resource (BROKER), and the numbering will
> be sequential. No idea how it behaves on container restart, however this is
> good enough to start and flex a kafka cluster here.
>
> I've sent your a pull request on github showing how I did. There's no
> pretention for actual merge, but if you want it, I can amend for inclusion
> to your leasure.
>
> Kind regards,
> JB
>

Re: Packaging new apps

Posted by Jean-Baptiste Note <jb...@gmail.com>.
Hi Thomas,

This is because the app_container_tag is unique under each resource.
Since your two brokers are declared as separate resources (BROKER0 and
BROKER1), they both get the same container_tag (1).

You should put them in the same resource (BROKER), and the numbering will
be sequential. No idea how it behaves on container restart; however, this
is good enough to start and flex a kafka cluster here.

I've sent you a pull request on github showing how I did it. There's no
pretension that it should be merged as-is, but if you want it, I can amend
it for inclusion at your leisure.

Kind regards,
JB

Re: Packaging new apps

Posted by Thomas Weise <th...@gmail.com>.
Hi Jean,

Indeed we would like to use component instances as you outline. So far, I
have not found a way to derive the Kafka server id from the Slider
configuration. I checked on my cluster and I find 2 containers using the
same app_container_tag in the logs:

u'componentName': u'BROKER1',
 u'configurations': {u'BROKER-COMMON': {u'broker.id': u'1',
                                        u'zookeeper.connect':
u'node26:2181,node27:2181,node28:2181'},
                     u'BROKER0': {u'broker.id': u'0'},
                     u'BROKER1': {u'broker.id': u'1'},
                     u'global': {u'app_container_id': u'container_1430350563654_0416_01_000003',
                                 u'app_container_tag': u'1',

--------------------

u'componentName': u'BROKER0',
 u'configurations': {u'BROKER-COMMON': {u'broker.id': u'0',
                                        u'zookeeper.connect':
u'node26:2181,node27:2181,node28:2181'},
                     u'BROKER0': {u'broker.id': u'0'},
                     u'BROKER1': {u'broker.id': u'1'},
                     u'global': {u'app_container_id': u'container_1430350563654_0416_01_000009',
                                 u'app_container_tag': u'1',

Any other ideas on how to obtain a component instance index that works
across container failures?

Thanks,
Thomas


On Mon, May 11, 2015 at 1:44 AM, Jean-Baptiste Note <jb...@gmail.com>
wrote:

> Hi Thomas,
>
> Thanks a lot for the updates you brought to the main Koya repository.
>
> I saw and can see you're still declaring a resource for each broker. This
> is painful as it means modifying your metainfo & possibly resource.json in
> case you want to grow your cluster, say beyond 10 machines :)
>
> Wouldn't it more logically fit into slider to declare one server.xml
> configuration, one resource type, and actually flex the application / play
> with the instance # to grow it ?
> I saw from Gour's comment that you were concerned about unique id
> generation. Maybe using the app_container_tag would be a good starting
> point ?
> For what it's worth, it seemed to work out properly for me.
>
> Kind regards,
> JB
>

Re: Packaging new apps

Posted by Jean-Baptiste Note <jb...@gmail.com>.
Hi Thomas,

Thanks a lot for the updates you brought to the main Koya repository.

I can see you're still declaring a resource for each broker. This is
painful, as it means modifying your metainfo & possibly resource.json
whenever you want to grow your cluster, say beyond 10 machines :)

Wouldn't it fit more logically into slider to declare one server.xml
configuration and one resource type, and actually flex the application /
play with the instance # to grow it?
I saw from Gour's comment that you were concerned about unique id
generation. Maybe using the app_container_tag would be a good starting
point?
For what it's worth, it seemed to work out properly for me.

Kind regards,
JB

Re: Packaging new apps

Posted by Thomas Weise <th...@gmail.com>.
Jean,

You will see updates in the KOYA repository soon. As part of that we will
move up to the latest release of Slider and also document the configuration
process.

Thanks,
Thomas




On Thu, May 7, 2015 at 5:52 PM, Gour Saha <gs...@hortonworks.com> wrote:

> Hi Jean,
>
> Please see answers inline.
>
> -Gour
>
> On 5/6/15, 6:16 AM, "Jean-Baptiste Note" <jbnote@gmail.com> wrote:
>
> Hi folks,
>
> Currently we're using Chef in our organization to deploy a lot of
> infrastructure services around Hadoop. Of course it makes a lot of sense to
> offer these as self-services on YARN using slider, but i'm looking at a
> number of challenges. So please forgive the broad range of questions :)
>
> I'm specifically intersted in deploying the following applications:
> * HTTPFS service (see https://github.com/jbnote/httpfs-slider) & helpers
> (nginx)
> * Opentsdb & helpers (varnish)
> * kafka (I had a look at koya)
> * druid
> * storm (fine, thanks !)
> * hbase (fine, thanks !)
>
> I'm facing a lot of issues with those services which are not yet packaged
> correctly:
>
> * httpfs/opentsdb are not released as standalone tarballs, contrary to all
> services currently packaged. So i've butchered a tarball from Cloudera
> RPMs, which is not satisfactory. How would you go about handling this ?
>
> Not sure exactly what you mean, by saying "handling this". If you are
> referring to a way to create a Slider package of an app in rpm format, then
> there are challenges, such as rpm install requires root access and YARN
> does not allow that. If you are referring to an issue you are facing with
> deploying the Slider app (now that you have created a tarball), can you
> share what issues you are facing?
>
> You might also want to take a look at this tomcat Slider package. Caution:
> It is not ready for prime-time and has few issues which needs to be
> resolved. But the scripts and metadata files might be a helpful reference.
> https://issues.apache.org/jira/browse/SLIDER-809
>
> https://github.com/apache/incubator-slider/tree/feature/SLIDER-809-tomcat-app-package/app-packages/tomcat
>
>
>
> * KOYA has been talked a lot of, however the source i'm looking at (
> https://github.com/DataTorrent/koya) is kind of disappointing, and
> activity
> is a bit low -- would anyone know if dataTorrent is still committed to the
> project ?
>
> What issues are you facing with KOYA? DataTorrent gave a presentation of
> KOYA and Slider seems to have fit their need so far. They wanted few
> features around data locality (strict placement) which will be there in
> 0.80.0 release AND unique ids which still needs some work to be done.
>
>
> Last but not least, I'm wondering if there would already be a plan to
> expose somehow (through an internal or an external service) the registry
> through DNS (that's what we really use for service location for HTTPFS &
> OpenTSDB). A bash polling script would certainly be sufficient for our
> needs for now, but longer-term, we'd need to have a more robust solution.
>
> Registry and REST APIs on registry comes directly from YARN -
> https://issues.apache.org/jira/browse/YARN-913
> https://issues.apache.org/jira/browse/YARN-2948
>
> http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/registry/yarn-registry.html
>
>
>
> Thanks a lot, kind regards,
> JB
>
>

Re: Packaging new apps

Posted by Jean-Baptiste Note <jb...@gmail.com>.
Hi Steve,

Thanks a lot for replying despite your very busy schedule.

Actually, we'll get away with a python daemon watching zookeeper and doing
dynamic DNS updates.
This seems easy enough and probably more palatable than duplicating a full
DNS server (I'm on the operations side ;)).
I'll keep you posted as we'll probably share this work.
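A sketch of what such a daemon could look like, watching registry znodes with kazoo and feeding nsupdate with dynamic-update commands. The zookeeper path, DNS zone, TTL, and the shape of the service record are all assumptions for illustration:

```python
# Sketch: watch registry znodes and push SRV records via nsupdate.
# Zone, path, TTL and the service-record shape are made-up values.
import json
import subprocess

ZONE = "slider.example.com"  # hypothetical delegated zone
REGISTRY_PATH = "/registry/users/jb/services/org-apache-slider"


def nsupdate_commands(service, host, port, ttl=60):
    """Build nsupdate input replacing the SRV record for one service."""
    name = "_%s._tcp.%s" % (service, ZONE)
    return ("update delete %s SRV\n"
            "update add %s %d SRV 0 100 %d %s\n"
            "send\n") % (name, name, ttl, port, host)


def push_update(commands):
    """Hand the commands to nsupdate (needs a key / an allowed updater)."""
    proc = subprocess.Popen(["nsupdate"], stdin=subprocess.PIPE)
    proc.communicate(commands.encode("utf-8"))


def main():
    from kazoo.client import KazooClient  # third-party dependency
    zk = KazooClient(hosts="node26:2181,node27:2181,node28:2181")
    zk.start()

    @zk.ChildrenWatch(REGISTRY_PATH)
    def on_change(children):
        for child in children:
            data, _ = zk.get("%s/%s" % (REGISTRY_PATH, child))
            record = json.loads(data.decode("utf-8"))
            # assumption: one external host:port endpoint per service;
            # the real YARN service record is more structured than this
            push_update(nsupdate_commands(child, record["host"],
                                          record["port"]))
```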

Kind regards,
JB

Re: Packaging new apps

Posted by Steve Loughran <st...@hortonworks.com>.
> On 8 May 2015, at 01:52, Gour Saha <gs...@hortonworks.com> wrote:
> 
> Last but not least, I'm wondering if there would already be a plan to
> expose somehow (through an internal or an external service) the registry
> through DNS (that's what we really use for service location for HTTPFS &
> OpenTSDB). A bash polling script would certainly be sufficient for our
> needs for now, but longer-term, we'd need to have a more robust solution.
> 
> Registry and REST APIs on registry comes directly from YARN -
> https://issues.apache.org/jira/browse/YARN-913
> https://issues.apache.org/jira/browse/YARN-2948
> http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/registry/yarn-registry.html

DNS support is something that has always been considered; it's why the paths in the registry spec are required to be valid DNS names (though the check is actually disabled, primarily because usernames aren't valid DNS names, and punycoding doesn't address things like spaces in names, just high-unicode characters).

We held back on this originally due to (a) the need to scope things for hadoop 2.6 and (b) worries about how operations teams would like more DNS servers popping up in the organisation. I think we can try to do the DNS; it just needs someone to sit down and do it. I'm afraid my todo list is already full.

I'd like to wrap up the registry stuff with an HTTP service that can be deployed at a fixed location; we have this in slider, but it's there to show it's possible more than anything else (because it moves around).




Re: Packaging new apps

Posted by Gour Saha <gs...@hortonworks.com>.
Hi Jean,

Please see answers inline.

-Gour

On 5/6/15, 6:16 AM, "Jean-Baptiste Note" <jb...@gmail.com> wrote:

Hi folks,

Currently we're using Chef in our organization to deploy a lot of
infrastructure services around Hadoop. Of course it makes a lot of sense to
offer these as self-services on YARN using slider, but I'm looking at a
number of challenges. So please forgive the broad range of questions :)

I'm specifically interested in deploying the following applications:
* HTTPFS service (see https://github.com/jbnote/httpfs-slider) & helpers
(nginx)
* Opentsdb & helpers (varnish)
* kafka (I had a look at koya)
* druid
* storm (fine, thanks !)
* hbase (fine, thanks !)

I'm facing a lot of issues with those services which are not yet packaged
correctly:

* httpfs/opentsdb are not released as standalone tarballs, contrary to all
services currently packaged. So I've butchered a tarball from Cloudera
RPMs, which is not satisfactory. How would you go about handling this?

Not sure exactly what you mean by saying "handling this". If you are referring to a way to create a Slider package from an app in rpm format, then there are challenges, such as rpm install requiring root access, which YARN does not allow. If you are referring to an issue you are facing with deploying the Slider app (now that you have created a tarball), can you share what issues you are facing?

You might also want to take a look at this tomcat Slider package. Caution: it is not ready for prime-time and has a few issues which need to be resolved. But the scripts and metadata files might be a helpful reference.
https://issues.apache.org/jira/browse/SLIDER-809
https://github.com/apache/incubator-slider/tree/feature/SLIDER-809-tomcat-app-package/app-packages/tomcat



* KOYA has been talked about a lot; however, the source I'm looking at (
https://github.com/DataTorrent/koya) is kind of disappointing, and activity
is a bit low -- would anyone know if DataTorrent is still committed to the
project?

What issues are you facing with KOYA? DataTorrent gave a presentation on KOYA, and Slider seems to have fit their needs so far. They wanted a few features around data locality (strict placement), which will be there in the 0.80.0 release, AND unique ids, which still need some work to be done.


Last but not least, I'm wondering if there would already be a plan to
expose somehow (through an internal or an external service) the registry
through DNS (that's what we really use for service location for HTTPFS &
OpenTSDB). A bash polling script would certainly be sufficient for our
needs for now, but longer-term, we'd need to have a more robust solution.

The registry and the REST APIs on the registry come directly from YARN -
https://issues.apache.org/jira/browse/YARN-913
https://issues.apache.org/jira/browse/YARN-2948
http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/registry/yarn-registry.html



Thanks a lot, kind regards,
JB