Posted to dev@bigtop.apache.org by Konstantin Boudnik <co...@apache.org> on 2018/10/11 17:25:16 UTC

Packaging things to Docker

Well, I finally got around to working on the long-awaited feature
for Bigtop, where one would be able to quickly build a container with an
arbitrary set of components in it for further orchestration.

The idea was to have the components in different layers, so they could be
combined together for the desired effect. Say there are layers with:
  1 hdfs
  2 hive
  3 spark
  4 hbase
  5 ignite
  6 yarn
and so on....

If one wants to assemble a Spark-only cluster, there would be a way to layer up
3 and 1 (ideally, 3's dependency on 1 would be calculated automatically) and
boom - there's an image ready to be put to use. The number of combinations
might be greater, of course, e.g. 3-6-1, or 4-2-1-6, and so forth.

It turned out that I can't "prebuild" those layers, as Docker won't allow you to
combine separate images into one ;( However, there's still a way to achieve a
similar effect. All I need to do is create a set of tarballs containing all the
bits of a particular component, i.e. all the bits of Spark or Hive. When an image
needs to be built, these tarballs would be used to layer the software on top
of the base image and each other. In the above example, the Dockerfile would look
something like

    FROM ubuntu:16.04
    # COPY rather than ADD here: ADD would auto-extract a local tarball into /tmp
    COPY hdfs-all.tar /tmp/
    RUN tar xf /tmp/hdfs-all.tar -C /
    COPY spark-all.tar /tmp/
    RUN tar xf /tmp/spark-all.tar -C /
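
A slightly more general sketch (my own illustration, not part of the recipe
above) could take the component list as a build argument:

    FROM ubuntu:16.04
    # Hypothetical generalization: pass the components at build time, e.g.
    #   docker build --build-arg COMPONENTS="hdfs spark" -t bigtop/hdfs-spark .
    ARG COMPONENTS="hdfs"
    COPY *-all.tar /tmp/
    RUN for c in $COMPONENTS; do tar xf /tmp/${c}-all.tar -C /; done && rm -f /tmp/*-all.tar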

Once the image is generated, the orchestration and configuration phases will
kick in, at which point a Docker-based cluster would be all ready to go.

Do you guys see any value in this approach compared to the current
package-based way of managing things?

Appreciate any thoughts!
--
  Cos

P.S. BTW, I guess I have a decent answer for all those asking for tarball
installation artifacts. It is as easy as running
    dpkg-deb -xv
on all packages and then tar'ing up the resulting set of files.
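
A minimal sketch of that flow (the deb locations and names below are
illustrative, not the exact Bigtop output layout):

    # Unpack the payload of each Spark deb (no maintainer scripts run)
    # and roll the result into a single component tarball.
    mkdir -p spark-root
    for deb in output/spark/*.deb; do
        dpkg-deb -x "$deb" spark-root/
    done
    tar cf spark-all.tar -C spark-root .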


Re: Packaging things to Docker

Posted by Olaf Flebbe <of...@oflebbe.de>.
Hi,

I came to the same conclusion, but more from a security point of view: Kerberos
has a serious impedance mismatch with containers. In order to establish trust you
need stable and predictable addressing/naming of peers. Kerberos uses DNS for
that, at least in the default config, but DNS names for Docker containers are not
FQDNs and are not predictable. Kerberos may be configured not to rely on host or
DNS names and to use other naming services, but that is such an advanced topic
that I am giving up. Without Kerberos we are running Hadoop insecure, which
is --- insecure.
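
For reference, one piece of that "advanced" configuration would be to stop the
Kerberos libraries from canonicalizing hostnames via DNS, roughly like the
following in krb5.conf (illustrative only, and it still assumes every container
gets a stable, resolvable hostname):

    [libdefaults]
        # do not canonicalize service hostnames through forward DNS lookups
        dns_canonicalize_hostname = false
        # do not use reverse DNS lookups when canonicalizing hostnames
        rdns = false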


Sent from my iPad


Re: Packaging things to Docker

Posted by Evans Ye <ev...@apache.org>.
Cos,

How about shifting the focus to making it super easy to deploy the Bigtop stack
on K8S or Swarm?
OTOH, I came across this talk recently, but haven't had time to walk through it:

Moving the Oath Grid to Docker, Eric Badger, Software Developer Engineer,
Oath [1]

Maybe we can learn something from it.
Starting mid-November I'll have time to spend on Bigtop for the rest of 2018.
Let me also try to dig into it.

Evans

[1]
https://www.youtube.com/watch?v=BLennQg7Zww&index=3&list=PLXXo-3b5ZQ1j8NkSAICiUhig6z8s998QQ&t=0s


Re: Packaging things to Docker

Posted by Konstantin Boudnik <co...@apache.org>.
Well, just to close the loop....

I've dived deeper into this whole Hadoop-on-containers story. And
unfortunately, I have come to the conclusion that it doesn't make that
much sense. Building separate images per component and orchestrating
them through k8s or Swarm doesn't solve anything, but adds a lot of
hassle. Using this approach as a sort of packaging technique also
doesn't add much for a developer or an admin.

We already have a mechanism where one can create a docker image
with an arbitrary set of components and deploy a cluster using
different images like this. It is good enough for most of the cases
where it makes sense to deploy the Hadoop stack from containers.

Hence, I decided to pull out of this project. But if someone else can
think of a better way of doing this sort of thing, I would be happy to
join hands.

--
  With regards,
Konstantin (Cos) Boudnik
2CAC 8312 4870 D885 8616  6115 220F 6980 1F27 E622

Disclaimer: Opinions expressed in this email are those of the author,
and do not necessarily represent the views of any company the author
might be affiliated with at the moment of writing.


Re: Packaging things to Docker

Posted by Konstantin Boudnik <co...@apache.org>.
Indeed, to be heard you don't need to be a committer: they aren't some sort of
privileged class here ;)

Anyway, back to this discussion and answering some of the concerns from Evans.
Tarballs aren't a key requirement for this approach: I was using tarballs
built from debs to cut some corners and avoid changing any of the Puppet
recipes until I'm certain my experiment shows something viable. My first
intention was, of course, to use our packages. But I couldn't think of any
clever way to avoid pulling in all install-time dependencies without a massive
rewrite of the packages. E.g. it won't be possible to install just Spark
without pulling in YARN or HDFS dependencies, and that would undermine the idea
of component-specific images (or layers, as I've called them in the OP).

In fact, you know my stance on the whole tarball thing: I've been pushing back
on the parcel-like approach for as long as I can remember. I still think it's a
horrible idea to produce tarballs as first-class artifacts. There are plenty of
reasons for this which are out of the scope of this conversation.

Speaking of use cases: as both Mikhail and Evans pointed out, this is intended for
something like Swarm or K8S (basically, anything that can orchestrate
containers into something meaningful at scale).

Much like Mikhail suggested, mixing base layers would achieve my idea of
piling up components on top of each other in order to create different
special-purpose or functional roles. I guess it is much like the Sandbox you've
mentioned, but without the hassle of creating a whole stack for each new
combination of components. I will look more closely at the Swarm thing in
the next day or so.

Thanks guys!
  Cos

 
Re: Packaging things to Docker

Posted by Evans Ye <ev...@apache.org>.
To Mikhail:
You never have to be a committer to join the discussion. Welcome to share any
idea you have :)

To reply to all:
This might be tangential, but I just want to bring in more information.
Currently we have the Docker Provisioner and Docker Sandbox (experimental)
features inside Bigtop:
1. Provisioner: installs RPM/DEB via Puppet on the fly when creating a cluster
2. Sandbox: pre-installs RPM/DEB via Puppet as a special-purpose stack (say
HDFS+Spark) and saves it as an image

Neither of the above goes for tarballs, because they're built around Bigtop
RPM/DEB packages, which might be the most valuable thing we produce. I
don't mean we can't ditch packages, but we have to come up with
considerations that cover the whole picture, say:

1. Where does the tarball come from? Is it taken from upstream directly or
produced by Bigtop with our own patches for compatibility fixes?
2. If we'd like to support installing from tarballs, how will the orchestration
tool (Puppet) be shaped? Is it going to support both RPM/DEB and tarballs,
or just the new one?
3. What's the purpose of producing docker images in this new way? If we can
make them run on K8S, that's a perfect use case!

Overall, I champion having this new feature in Bigtop, but just want to
bring up something more for discussion :)

Evans

Re: Packaging things to Docker

Posted by Mikhail Epikhin <mi...@epikhin.net>.
Looks very interesting!

Sorry for breaking into the discussion, I'm not a committer, just yet another user, but..

As you wrote, Docker doesn't fit this approach well.
The problem is that you tried to push all the components into one container, and you lost the immutability of the image.
I fully understand the appeal of this approach for production, for more local connectivity, but for Docker containers there isn't a big difference between running Hive, Spark and HDFS in one container or in many different ones. Either way they use the network for connectivity, and you are comparing connectivity inside one container with connectivity between many containers on one local machine.

They all run on one single machine anyway, and if you create a separate container for each component (HDFS, Hive, Spark, HBase, YARN), it fits the Docker model well.

Further, you can create an environment using docker-compose, mixing these base images [hdfs, hive, spark, hbase, ignite] as you wish.

Just create a set of base images and a templating script that generates a docker-compose.yml to connect them.

Further, if you want to simulate a multi-node cluster, you can do it just by writing a new docker-compose.yaml. You can test High Availability, HDFS decommissioning, or anything else you want simply by writing your own docker-compose.yaml.
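
For example, the shape of such a compose file could look like this (the image
names and the role-selection convention are hypothetical, just for illustration):

    # docker-compose.yml sketch; "bigtop/hdfs" and "bigtop/spark" are made-up image names
    version: "2"
    services:
      namenode:
        image: bigtop/hdfs
        hostname: namenode
      datanode:
        image: bigtop/hdfs
        command: datanode        # assumes the image picks its role from the command
        depends_on:
          - namenode
      spark-master:
        image: bigtop/spark
        hostname: spark-master
        depends_on:
          - namenode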

-- 
Mikhail Epikhin
