Posted to dev@bigtop.apache.org by Guodong Xu <gu...@linaro.org> on 2019/07/16 04:24:03 UTC

RFC: Building .tar.gz binary packaging for all components in Bigtop

Hi, all

This is a request for comments on a potential new feature. Please have a
look and let me know whether it adds value to the community. I would
appreciate your opinions on this.

As most of you already know, Bigtop so far provides two packaging formats:
deb and rpm. Users are recommended to install through apt and yum on their
respective Linux distributions.

Yet, there are still users coming to me and asking for the more traditional
.tar.gz packages. I know that shouldn't be a problem for x86 users, since
they can always download .tar.gz binary releases from each component's
official release website. But for users on other architectures, like
Arm64 and PowerPC, the story is different: .tar.gz releases are not readily
available for these architectures.

In Bigtop, we do the job of building/packaging/testing big data
components in one single place. So, it makes sense that we step in and,
during our process of building each component, keep a binary .tar.gz
copy of each component we support.

If you are supportive of the above idea, then the next step is how to
achieve it. I actually did some research. My method is to do two things:
0. This can be a subsequent task of either $target-deb or $target-rpm; it
depends on one of those two.
1. Add tar and cp steps to each component's 'do-component-build' file.
2. Add a new set of packaging tasks for each component. This means adding a
new task in genTasks() in packages.gradle.
  Something like:

    Task t = task "${target}-bin-tar" (
        description: "Building .tar.gz binary for $target artifacts",
        group: PACKAGES_GROUP) doLast { ...
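Fleshed out a bit, the new task could look like the following. This is only a
hypothetical sketch against packages.gradle: the output directory, the assumed
on-disk tarball location under build/$target, and the helper wiring are my
illustrative assumptions, not existing Bigtop code.

```groovy
// Hypothetical sketch of a per-component bin-tar task for packages.gradle.
// Assumes do-component-build has left the component's binary tarball
// somewhere under build/$target/ (assumed layout, not actual Bigtop code).
def genBinTarTask(String target) {
  task "${target}-bin-tar" (
      description: "Building .tar.gz binary for $target artifacts",
      group: PACKAGES_GROUP) doLast {
    def outDir = file("output/${target}")
    outDir.mkdirs()
    // Collect whatever .tar.gz files the component build produced.
    fileTree(dir: "build/${target}", include: '**/*.tar.gz').each { tarball ->
      copy {
        from tarball
        into outDir
      }
    }
  }
}
```

A task like this could then be generated for every component inside genTasks(),
next to the existing $target-deb and $target-rpm tasks.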



PS:
An example in 'do-component-build' for Hadoop; other components would be similar.

diff --git a/bigtop-packages/src/common/hadoop/do-component-build
b/bigtop-packages/src/common/hadoop/do-component-build
index 2a1a6345..cb8ddd9d 100644
--- a/bigtop-packages/src/common/hadoop/do-component-build
+++ b/bigtop-packages/src/common/hadoop/do-component-build
@@ -146,3 +146,6 @@ cp -r target/staging/hadoop-project build/share/doc

 # Copy fuse output to the build directory
 cp hadoop-hdfs-project/hadoop-hdfs-native-client/target/main/native/fuse-dfs/fuse_dfs build/bin
+
+# Copy binary build result
+cp hadoop-dist/target/hadoop-${HADOOP_VERSION}.tar.gz ..
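In the same spirit, the copy step that each do-component-build would perform can
be sketched as a small shell helper. The function name, directory arguments, and
the added checksum step are illustrative assumptions on my part, not part of the
actual patch above:

```shell
#!/bin/sh
# Hypothetical helper sketching the "keep a binary tarball" step that each
# component's do-component-build could share (names are illustrative).
set -e

# collect_bin_tarball <build-output-dir> <dest-dir>
# Copies every .tar.gz the component build produced into <dest-dir> and
# writes a SHA-512 checksum next to each copy.
collect_bin_tarball() {
  src_dir="$1"
  dest_dir="$2"
  mkdir -p "$dest_dir"
  for tarball in "$src_dir"/*.tar.gz; do
    [ -e "$tarball" ] || continue    # skip if the build produced no tarball
    cp "$tarball" "$dest_dir"/
    ( cd "$dest_dir" && sha512sum "$(basename "$tarball")" \
        > "$(basename "$tarball").sha512" )
  done
}
```

For Hadoop this would reduce to the one-line cp in the diff above; other
components would only differ in where their build drops the tarball.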


Best regards,
-Guodong

Re: RFC: Building .tar.gz binary packaging for all components in Bigtop

Posted by Ganesh Raju <ga...@linaro.org>.
Will do

Thanks
Ganesh

On Tue, Jul 30, 2019 at 9:18 AM Evans Ye <ev...@apache.org> wrote:

> Hey Olaf,
>
> Sorry I jumped into some conclusions so quick without actually getting my
> hands dirty, which is not professional. Probably those trigger points I
> mentioned are not strong enough in technical perspective. But my rationale
> in the back is very simple:
>
> Since all the current PMC members do not have much time in development, we
> probably do not fully discovered the potential of Apache Bigtop. If
> there're contributors who's willing to contribute, we should give them the
> chance to earn the merits through out time and maybe, take on the role to
> push Bigtop to the next level.
>
> I really value your efforts to try out Hadoop 3. And your points are sound.
> So to mitigate the gap here, how about  we have a quick con call to talk to
> each other directly and resolve all the gaps once for all?
>
>
> Hey Ganesh,
> Would you be able to setup a con call to discuss the topics that you and
> the Arm folks planning to contribute to Bigtop? What's on top of my head:
> 1. tar.gz packaging, 2. Hadoop 3, 3. Bigtop blueprints.
>
> And, to make the discussion productive, I really like to know the answer of
> why those users are stick to tar.gz for deployment. I don't think "because
> it's what there're currently using" is a convincing answer.
>
> - Evans
>
>
> Olaf Flebbe <of...@oflebbe.de> 於 2019年7月30日 週二 上午4:22寫道:
>
> > Hi,
> >
> > > 1. People are using docker for packaging and deployment instead of
> > rpm/deb.
> >
> > You can package everything as a container, but it may or may not make
> > sense.
> >
> > > 2. Hadoop 3's messy layout don't fit into rpm/deb conventions. Maybe
> it's
> > > because they think it's the right design for the container era.
> >
> > There is nothing prepared to run hadoop in a container. BTW it would be
> > messy since yarn is supporting compute loads to run in a container. That
> > would mean a nested container runtime. Feasible but unnecessary mind
> > twisting.
> >
> > Hadoop3 does fit in rpm/deb, one only would have to change the packaging
> > layout to a monolith approach. Can be done, not nice, but feasible.
> >
> > I am exhausted in explaining the points over and over again. Go and try,
> > I tried it. For a secure hadoop service you need a known IP address, and
> > a FQDN hostname resolution. You will find ton's of docker bug reports
> > like https://github.com/moby/moby/issues/14282 or
> > https://github.com/moby/moby/issues/29100 only to pick up two. I used
> > lxc to demonstrate setting up a secure hadoop environment at Apache
> > Conference in Budapest for a reason.
> >
> > I am out of this project now. Do whatever you like.
> >
> > Olaf
> >
>


-- 
IRC: ganeshraju@#linaro on irc.freenode.net

Re: RFC: Building .tar.gz binary packaging for all components in Bigtop

Posted by Evans Ye <ev...@apache.org>.
Hey Olaf,

Sorry I jumped to some conclusions so quickly without actually getting my
hands dirty, which is not professional. Probably those trigger points I
mentioned are not strong enough from a technical perspective. But my
underlying rationale is very simple:

Since all the current PMC members do not have much time for development, we
probably have not fully discovered the potential of Apache Bigtop. If there
are contributors who are willing to contribute, we should give them the
chance to earn merit over time and maybe take on the role of pushing Bigtop
to the next level.

I really value your efforts to try out Hadoop 3. And your points are sound.
So to mitigate the gap here, how about we have a quick con call to talk to
each other directly and resolve all the gaps once and for all?


Hey Ganesh,
Would you be able to set up a con call to discuss the topics that you and
the Arm folks are planning to contribute to Bigtop? What's on top of my head:
1. tar.gz packaging, 2. Hadoop 3, 3. Bigtop blueprints.

And, to make the discussion productive, I would really like to know why
those users stick to tar.gz for deployment. I don't think "because it's what
they're currently using" is a convincing answer.

- Evans


Olaf Flebbe <of...@oflebbe.de> 於 2019年7月30日 週二 上午4:22寫道:

> Hi,
>
> > 1. People are using docker for packaging and deployment instead of
> rpm/deb.
>
> You can package everything as a container, but it may or may not make
> sense.
>
> > 2. Hadoop 3's messy layout don't fit into rpm/deb conventions. Maybe it's
> > because they think it's the right design for the container era.
>
> There is nothing prepared to run hadoop in a container. BTW it would be
> messy since yarn is supporting compute loads to run in a container. That
> would mean a nested container runtime. Feasible but unnecessary mind
> twisting.
>
> Hadoop3 does fit in rpm/deb, one only would have to change the packaging
> layout to a monolith approach. Can be done, not nice, but feasible.
>
> I am exhausted in explaining the points over and over again. Go and try,
> I tried it. For a secure hadoop service you need a known IP address, and
> a FQDN hostname resolution. You will find ton's of docker bug reports
> like https://github.com/moby/moby/issues/14282 or
> https://github.com/moby/moby/issues/29100 only to pick up two. I used
> lxc to demonstrate setting up a secure hadoop environment at Apache
> Conference in Budapest for a reason.
>
> I am out of this project now. Do whatever you like.
>
> Olaf
>

Re: RFC: Building .tar.gz binary packaging for all components in Bigtop

Posted by Olaf Flebbe <of...@oflebbe.de>.
Hi,

Thanks.

Olaf



Am 05.08.19 um 04:07 schrieb Roman Shaposhnik:
> On Sun, Aug 4, 2019 at 9:42 AM Evans Ye <ev...@apache.org> wrote:
>>
>> It is because your input is valuable so we want to take that into
>> consideration seriously. I personally like constructive discussions even if
>> it doesn't look good. It is actually the beauty of the community. And I'd
>> like to emphasize that all the existing members have gone through a history
>> of contribution which made Bigtop what it is now. At Apache merit does not
>> expire. This is important. So welcome back anytime.
> 
> +1! (this is the reason I still hang around although my day job is as
> far from bigdata and Hadoop as it can possibly be)
> 
> Thanks,
> Roman.
> 

Re: RFC: Building .tar.gz binary packaging for all components in Bigtop

Posted by Roman Shaposhnik <ro...@shaposhnik.org>.
On Sun, Aug 4, 2019 at 9:42 AM Evans Ye <ev...@apache.org> wrote:
>
> It is because your input is valuable so we want to take that into
> consideration seriously. I personally like constructive discussions even if
> it doesn't look good. It is actually the beauty of the community. And I'd
> like to emphasize that all the existing members have gone through a history
> of contribution which made Bigtop what it is now. At Apache merit does not
> expire. This is important. So welcome back anytime.

+1! (this is the reason I still hang around although my day job is as
far from bigdata and Hadoop as it can possibly be)

Thanks,
Roman.

Re: RFC: Building .tar.gz binary packaging for all components in Bigtop

Posted by Evans Ye <ev...@apache.org>.
It is because your input is valuable that we want to take it into
consideration seriously. I personally like constructive discussions even if
they don't look good. That is actually the beauty of the community. And I'd
like to emphasize that all the existing members have gone through a history
of contribution which made Bigtop what it is now. At Apache, merit does not
expire. This is important. So welcome back anytime.

Evans

Olaf Flebbe <of...@oflebbe.de> 於 2019年8月1日 週四 上午5:54寫道:

> Hi,
>
> out of Bigtop. At least until further notice.
>
> My move is not to stop anyone to do smthg. I cannot and I will not stop
> it. I am missing contributions. There is no need to raise opinions over
> and over again. Just Do it! If someone would have contributed it
> instead, it would have been in for ages.
>
> Olaf
>
>
>
> Am 29.07.19 um 23:54 schrieb Konstantin Boudnik:
> > Hey Olaf.
> >
> > Sorry, not sure how to read this? Out of the Bigtop or this silly
> commotion
> > with tar.gz? Hopefully, just the latter one ;)
> >
> > Cos
> >
> > On Mon, Jul 29, 2019 at 10:21PM, Olaf Flebbe wrote:
> >>
> >> I am out of this project now. Do whatever you like.
> >>
> >> Olaf
>

Re: RFC: Building .tar.gz binary packaging for all components in Bigtop

Posted by Olaf Flebbe <of...@oflebbe.de>.
Hi,

out of Bigtop. At least until further notice.

My move is not meant to stop anyone from doing something. I cannot and I
will not stop it. What I am missing is contributions. There is no need to
raise opinions over and over again: just do it! If someone had contributed
it instead, it would have been in for ages.

Olaf



Am 29.07.19 um 23:54 schrieb Konstantin Boudnik:
> Hey Olaf.
> 
> Sorry, not sure how to read this? Out of the Bigtop or this silly commotion
> with tar.gz? Hopefully, just the latter one ;)
> 
> Cos
> 
> On Mon, Jul 29, 2019 at 10:21PM, Olaf Flebbe wrote:
>>
>> I am out of this project now. Do whatever you like.
>>
>> Olaf

Re: RFC: Building .tar.gz binary packaging for all components in Bigtop

Posted by Olaf Flebbe <of...@oflebbe.de>.
Hi,

> 1. People are using docker for packaging and deployment instead of rpm/deb.

You can package everything as a container, but it may or may not make sense.

> 2. Hadoop 3's messy layout don't fit into rpm/deb conventions. Maybe it's
> because they think it's the right design for the container era.

There is nothing prepared to run Hadoop in a container. BTW, it would be
messy, since YARN supports running compute loads in containers. That would
mean a nested container runtime: feasible, but unnecessary mind twisting.

Hadoop 3 does fit into rpm/deb; one would only have to change the packaging
layout to a monolithic approach. It can be done; not nice, but feasible.

I am exhausted from explaining these points over and over again. Go and try;
I tried it. For a secure Hadoop service you need a known IP address and
FQDN hostname resolution. You will find tons of Docker bug reports,
like https://github.com/moby/moby/issues/14282 or
https://github.com/moby/moby/issues/29100, to pick just two. I used
LXC to demonstrate setting up a secure Hadoop environment at the Apache
Conference in Budapest for a reason.

I am out of this project now. Do whatever you like.

Olaf

Re: RFC: Building .tar.gz binary packaging for all components in Bigtop

Posted by Ganesh Raju <ga...@linaro.org>.
I support Evans' thinking. Especially if Linaro can contribute towards getting
.tar.gz working, would we get a 'go ahead'?

On Sat, Jul 27, 2019 at 1:02 PM Evans Ye <ev...@apache.org> wrote:

> I have been thinking about this for a time. Maybe tar.gz is actually
> something we can do. Considering that the tech trend is container, k8s, and
> cloud, having something easy and not over manufactured for downstream to
> consume and integrate is an advantage of tar.gz. For us we keep focusing on
> integration test and doing open source distribution of hadoop. The binary
> format can be RPM/DEB, but not limited to.
>
> Some triggers pushing me to this end:
> 1. People are using docker for packaging and deployment instead of rpm/deb.
> 2. Hadoop 3's messy layout don't fit into rpm/deb conventions. Maybe it's
> because they think it's the right design for the container era.
>
> Going down to the implementation level, we can do this by branching out
> after do component builds. However all the integration tests will be broken
> at the deployment stage. That means we need an entire new deployment tool.
> Or we can leverage other existing solutions like Ambari?
>
> BTW, I'm also curious how Ambari(hortonworks) integrates Hadoop 3 and
> do.dependency, daemons...
>
>
>
> Konstantin Boudnik <co...@apache.org> 於 2019年7月20日 週六 01:17 寫道:
>
> > +1 on both Olaf's and Evans' points. Tarballs are messy stuff and are
> hard
> > to
> > control in operations: I believe the main reason for the existence of
> them
> > in
> > the official component releases is the simplicity of the media and lesser
> > pressure on the community to integrate the software into a target stack
> (to
> > Olaf's point).
> >
> > To the ARM's folks issue: aren't they able to use structured packages for
> > some
> > reason? Funny, this conversation happens every 2-3 years, like a clock ;)
> >
> > Thanks,
> >   Cos
> >
> > On Sat, Jul 20, 2019 at 01:06AM, Evans Ye wrote:
> > > From technical perspective Olaf's points are reasonable. However I'd
> like
> > > to bridge the gap here so that we can strike a balance of what arm
> folks
> > > need.
> > >
> > > Guodong let's back the story with some technical discussions. For
> > example,
> > > is it because your customer already implemented their deployment tool
> > with
> > > tar.gz? If so that's a strong reason from their perspective.
> > > To me tar.gz has no much pros for production. For Apache projects, all
> > the
> > > release are source code hence tar.gz is the most simple way.
> Furthermore,
> > > most of the projects will provide compiled binary tarball just for
> > > convenient.
> > >
> > > Olaf Flebbe <of...@oflebbe.de> 於 2019年7月19日 週五 上午3:40寫道:
> > >
> > > > Hi
> > > >
> > > > Maybe we have different target groups: For educational users, trying
> to
> > > > figure out how everything works together, the single computer cluster
> > is
> > > > great.
> > > >
> > > > This Tar.gz artefact were never ever been sufficient to run in
> > production.
> > > > They did not contain 64bit libs for instance. They do not provide
> start
> > > > scripts, error prone when you accidently using the wrong java, do not
> > place
> > > > logfiles in suitable places and so on.
> > > >
> > > > This tar artefacts and instructions are POC quality, that's it.
> > > >
> > > > Please be aware that Bigtop changes directory layout, directory
> > > > permissions, configuration and startup of the original code in order
> to
> > > > support large scale installations and automation.
> > > >
> > > > We have a much better alternative: Distribution integrated
> > repositories.
> > > > Youll only have to apt/yum install hadoop-hdfs-datanode and
> everything
> > is
> > > > already setup, including java, runscripts, directory layout and
> users.
> > Then
> > > > you can concentrate on configuring that beast. You do not even need
> no
> > > > special instruction, since it is distribution native -- well, almost.
> > If
> > > > you feel that bigtops runscripts / layout is not suitable , you are
> > very
> > > > welcome to contribute.
> > > >
> > > > If i look at the instructions [1] . they contain tons of settings you
> > > > should do. With our packages you can actually do exactly this,
> without
> > > > bothering to install and configure all the dependencies.
> > > >
> > > > Olaf
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > Von meinem iPad gesendet
> > > >
> > > > > Am 18.07.2019 um 08:57 schrieb Guodong Xu <gu...@linaro.org>:
> > > > >
> > > > > Hi, Evans
> > > > >
> > > > > Comments in below.
> > > > >
> > > > >>      On Tue, Jul 16, 2019 at 10:46 PM Evans Ye <
> evansye@apache.org>
> > > > wrote:
> > > > >>
> > > > >> I'm not objecting this, but I'd love to have more discussion to
> > figure
> > > > out
> > > > >> whether this is the right thing to do. What I get from your
> > proposal is
> > > > >> users want to do things which RPM/DEB don't do well while tar.gz
> is
> > > > good at
> > > > >> it. However is it able to do it in another way which is far more
> > > > beneficial
> > > > >> in more scenarios? In general, it's like when users are asking
> for a
> > > > faster
> > > > >> horses, can we come up with cars?
> > > > >>
> > > > >> How about we start with the first step which is to elaborate why
> > users
> > > > >> choose to go for tar.gz?
> > > > >
> > > > >
> > > > > Right, agree. Here is what I learned from users of Arm servers:
> > > > >
> > > > > One background for this is, currently, CDH and Hortonworks both
> have
> > no
> > > > > official release for Arm server yet. So, Bigtop is the only
> > available and
> > > > > verified distribution to them. As you know, with effort from the
> > > > community
> > > > > and Arm Inc., Linaro, Bigtop now has officially supported Arm64.
> > > > >
> > > > > To these users, before they start to use Bigtop on Arm, they are
> > already
> > > > > familiar with each component's individual installation and usage on
> > x86.
> > > > > Most of them are released in .tar.gz format. (i.e. Most apache big
> > data
> > > > > component doesn't release in deb/rpm. So, if we tell users that the
> > only
> > > > > available format in Bigtop is deb/rpm, this just hesitates them).
> > > > >
> > > > > Eg. For Hadoop, the official site for installation is here [1] and
> > here
> > > > > [2], their release format is .tar.gz.
> > > > >
> > > > > So,
> > > > > 1. using .tar.gz is the minimum effort route for them to start
> their
> > > > touch
> > > > > with Bigtop (if we can support .tar.gz).
> > > > > 2. Bigtop has all components tested. That provides very big
> > confidence to
> > > > > the users. So they like to use the binary built from Bigtop on
> Arm64
> > > > > servers.
> > > > >
> > > > > [1]
> > > > >
> > > >
> >
> https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html
> > > > > [2] https://www.apache.org/dyn/closer.cgi/hadoop/common/
> > > > >
> > > > > -Guodong
> > > > >
> > > > >
> > > > >>
> > > > >> Guodong Xu <gu...@linaro.org> 於 2019年7月16日 週二 下午12:24寫道:
> > > > >>
> > > > >>> Hi, all
> > > > >>>
> > > > >>> This is a request for comments for a potential new feature.
> Please
> > > > have a
> > > > >>> look and let me know whether this adds value to the community.
> > > > Appreciate
> > > > >>> your opinions on this.
> > > > >>>
> > > > >>> As most of you know already, so far, Bigtop provides two
> packaging
> > > > >> format:
> > > > >>> deb and rpm. Users are recommended to install through apt and yum
> > for
> > > > >> their
> > > > >>> respective linux distributions.
> > > > >>>
> > > > >>> Yet, there are still users coming to me and asking for the more
> > > > >> traditional
> > > > >>> .tar.gz packages. I know that shouldn't be a problem for x86
> users,
> > > > since
> > > > >>> they can always download .tar.gz bin releases from each
> component's
> > > > >>> official release website. But for users from other architectures,
> > like
> > > > >>> Arm64 and powerpc, story is different. .tar.gz are not readily
> > > > available
> > > > >> to
> > > > >>> these architectures.
> > > > >>>
> > > > >>> In Bigtop, we are doing the job of building/packaging/testing big
> > data
> > > > >>> components in one single place. So, it makes sense that we step
> > in, and
> > > > >>> during our process of building each component, we keep a binary
> > .tar.gz
> > > > >>> copy for each component we supported.
> > > > >>>
> > > > >>> If you are supportive to the above idea, then next step will be
> > how to
> > > > >>> achieve that. I did some research actually. So, my method is to
> do
> > two
> > > > >>> things:
> > > > >>> 0. this can be a subsequent task of either $target-deb or
> > $target-rpm.
> > > > It
> > > > >>> depends on one of them two.
> > > > >>> 1. add tar and cp into each component's 'do-component-build'
> file.
> > > > >>> 2. add a new set of packaging tasks for each component. This
> means
> > to
> > > > >> add a
> > > > >>> new task in genTasks() in packages.gradle.
> > > > >>>  Something like
> > > > >>> Task t = task "${target}-bin-tar" (
> > > > >>> description: "Building .tar.gz binary for $target artifacts",
> > > > >>> group: PACKAGES_GROUP) doLast { ...
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>> PS:
> > > > >>> An example in 'do-component-build' for hadoop. Other components
> > > > similar.
> > > > >>>
> > > > >>> diff --git a/bigtop-packages/src/common/hadoop/do-component-build
> > > > >>> b/bigtop-packages/src/common/hadoop/do-component-build
> > > > >>> index 2a1a6345..cb8ddd9d 100644
> > > > >>> --- a/bigtop-packages/src/common/hadoop/do-component-build
> > > > >>> +++ b/bigtop-packages/src/common/hadoop/do-component-build
> > > > >>> @@ -146,3 +146,6 @@ cp -r target/staging/hadoop-project
> > build/share/doc
> > > > >>>
> > > > >>> # Copy fuse output to the build directory
> > > > >>> cp
> > > > >>>
> > > > >>>
> > > > >>
> > > >
> >
> hadoop-hdfs-project/hadoop-hdfs-native-client/target/main/native/fuse-dfs/fuse_dfs
> > > > >>> build/bin
> > > > >>> +
> > > > >>> +# Copy binary build result
> > > > >>> +cp hadoop-dist/target/hadoop-${HADOOP_VERSION}.tar.gz ..
> > > > >>>
> > > > >>>
> > > > >>> Best regards,
> > > > >>> -Guodong
> > > > >>>
> > > > >>
> > > >
> >
>


-- 
IRC: ganeshraju@#linaro on irc.freenode.net

Re: RFC: Building .tar.gz binary packaging for all components in Bigtop

Posted by Evans Ye <ev...@apache.org>.
I have been thinking about this for a while. Maybe tar.gz is actually
something we can do. Considering that the tech trend is containers, k8s, and
cloud, having something easy and not over-engineered for downstream to
consume and integrate is an advantage of tar.gz. For us, we keep focusing on
integration testing and doing an open source distribution of Hadoop. The
binary format can be RPM/DEB, but need not be limited to them.

Some triggers pushing me to this end:
1. People are using docker for packaging and deployment instead of rpm/deb.
2. Hadoop 3's messy layout doesn't fit into rpm/deb conventions. Maybe it's
because they think it's the right design for the container era.

Going down to the implementation level, we can do this by branching out
after the component builds. However, all the integration tests will be
broken at the deployment stage. That means we need an entirely new
deployment tool. Or can we leverage other existing solutions, like Ambari?

BTW, I'm also curious how Ambari (Hortonworks) integrates Hadoop 3 and
handles dependencies, daemons...
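The deployment gap above can be made concrete: everything rpm/deb scriptlets
normally do at install time would have to be reimplemented by a tar.gz-based
deployer. A minimal hypothetical sketch (paths, function name, and directory
layout are all illustrative assumptions):

```shell
#!/bin/sh
# Hypothetical sketch of what a tar.gz deployment step would have to do for
# one component: the glue that rpm/deb packages currently provide for free.
set -e

# deploy_tarball <tarball> <install-root>
deploy_tarball() {
  tarball="$1"
  root="$2"
  mkdir -p "$root"
  tar -xzf "$tarball" -C "$root"
  # The deployer, not the package manager, must now create service users,
  # fix permissions, lay out config/log directories, and install run scripts.
  # Here we only sketch the directory layout part.
  mkdir -p "$root/etc" "$root/logs"
}
```

Whatever tool fills this role (Ambari, Puppet, or something new) is exactly
what the integration tests would need before a tar.gz path could be verified
end to end.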



Konstantin Boudnik <co...@apache.org> 於 2019年7月20日 週六 01:17 寫道:

> +1 on both Olaf's and Evans' points. Tarballs are messy stuff and are hard
> to
> control in operations: I believe the main reason for the existence of them
> in
> the official component releases is the simplicity of the media and lesser
> pressure on the community to integrate the software into a target stack (to
> Olaf's point).
>
> To the ARM's folks issue: aren't they able to use structured packages for
> some
> reason? Funny, this conversation happens every 2-3 years, like a clock ;)
>
> Thanks,
>   Cos
>
> On Sat, Jul 20, 2019 at 01:06AM, Evans Ye wrote:
> > From technical perspective Olaf's points are reasonable. However I'd like
> > to bridge the gap here so that we can strike a balance of what arm folks
> > need.
> >
> > Guodong let's back the story with some technical discussions. For
> example,
> > is it because your customer already implemented their deployment tool
> with
> > tar.gz? If so that's a strong reason from their perspective.
> > To me tar.gz has no much pros for production. For Apache projects, all
> the
> > release are source code hence tar.gz is the most simple way. Furthermore,
> > most of the projects will provide compiled binary tarball just for
> > convenient.
> >
> > Olaf Flebbe <of...@oflebbe.de> 於 2019年7月19日 週五 上午3:40寫道:
> >
> > > Hi
> > >
> > > Maybe we have different target groups: For educational users, trying to
> > > figure out how everything works together, the single computer cluster
> is
> > > great.
> > >
> > > This Tar.gz artefact were never ever been sufficient to run in
> production.
> > > They did not contain 64bit libs for instance. They do not provide start
> > > scripts, error prone when you accidently using the wrong java, do not
> place
> > > logfiles in suitable places and so on.
> > >
> > > This tar artefacts and instructions are POC quality, that's it.
> > >
> > > Please be aware that Bigtop changes directory layout, directory
> > > permissions, configuration and startup of the original code in order to
> > > support large scale installations and automation.
> > >
> > > We have a much better alternative: Distribution integrated
> repositories.
> > > Youll only have to apt/yum install hadoop-hdfs-datanode and everything
> is
> > > already setup, including java, runscripts, directory layout and users.
> Then
> > > you can concentrate on configuring that beast. You do not even need no
> > > special instruction, since it is distribution native -- well, almost.
> If
> > > you feel that bigtops runscripts / layout is not suitable , you are
> very
> > > welcome to contribute.
> > >
> > > If i look at the instructions [1] . they contain tons of settings you
> > > should do. With our packages you can actually do exactly this, without
> > > bothering to install and configure all the dependencies.
> > >
> > > Olaf
> > >
> > >
> > >
> > >
> > >
> > > Von meinem iPad gesendet
> > >
> > > > Am 18.07.2019 um 08:57 schrieb Guodong Xu <gu...@linaro.org>:
> > > >
> > > > Hi, Evans
> > > >
> > > > Comments in below.
> > > >
> > > >>      On Tue, Jul 16, 2019 at 10:46 PM Evans Ye <ev...@apache.org>
> > > wrote:
> > > >>
> > > >> I'm not objecting this, but I'd love to have more discussion to
> figure
> > > out
> > > >> whether this is the right thing to do. What I get from your
> proposal is
> > > >> users want to do things which RPM/DEB don't do well while tar.gz is
> > > good at
> > > >> it. However is it able to do it in another way which is far more
> > > beneficial
> > > >> in more scenarios? In general, it's like when users are asking for a
> > > faster
> > > >> horses, can we come up with cars?
> > > >>
> > > >> How about we start with the first step which is to elaborate why
> users
> > > >> choose to go for tar.gz?
> > > >
> > > >
> > > > Right, agree. Here is what I learned from users of Arm servers:
> > > >
> > > > One background for this is, currently, CDH and Hortonworks both have
> no
> > > > official release for Arm server yet. So, Bigtop is the only
> available and
> > > > verified distribution to them. As you know, with effort from the
> > > community
> > > > and Arm Inc., Linaro, Bigtop now has officially supported Arm64.
> > > >
> > > > To these users, before they start to use Bigtop on Arm, they are
> already
> > > > familiar with each component's individual installation and usage on
> x86.
> > > > Most of them are released in .tar.gz format. (i.e. Most apache big
> data
> > > > component doesn't release in deb/rpm. So, if we tell users that the
> only
> > > > available format in Bigtop is deb/rpm, this just hesitates them).
> > > >
> > > > Eg. For Hadoop, the official site for installation is here [1] and
> here
> > > > [2], their release format is .tar.gz.
> > > >
> > > > So,
> > > > 1. using .tar.gz is the minimum effort route for them to start their
> > > touch
> > > > with Bigtop (if we can support .tar.gz).
> > > > 2. Bigtop has all components tested. That provides very big
> confidence to
> > > > the users. So they like to use the binary built from Bigtop on Arm64
> > > > servers.
> > > >
> > > > [1]
> > > >
> > >
> https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html
> > > > [2] https://www.apache.org/dyn/closer.cgi/hadoop/common/
> > > >
> > > > -Guodong
> > > >
> > > >
> > > >>
> > > >> Guodong Xu <gu...@linaro.org> 於 2019年7月16日 週二 下午12:24寫道:
> > > >>
> > > >>> Hi, all

Re: RFC: Building .tar.gz binary packaging for all components in Bigtop

Posted by Konstantin Boudnik <co...@apache.org>.
+1 on both Olaf's and Evans' points. Tarballs are messy and hard to control in
operations: I believe the main reason they exist in the official component
releases is the simplicity of the medium and the lesser pressure on the
community to integrate the software into a target stack (to Olaf's point).

To the ARM folks' issue: are they unable to use structured packages for some
reason? Funny, this conversation happens every 2-3 years, like clockwork ;)

Thanks,
  Cos


Re: RFC: Building .tar.gz binary packaging for all components in Bigtop

Posted by Evans Ye <ev...@apache.org>.
From a technical perspective, Olaf's points are reasonable. However, I'd like
to bridge the gap here so that we can strike a balance with what the Arm folks
need.

Guodong, let's back the story up with some technical discussion. For example,
is it because your customers have already implemented their deployment tooling
around tar.gz? If so, that's a strong reason from their perspective.
To me, tar.gz has few advantages for production. For Apache projects, all
releases are source code, hence tar.gz is the simplest format. Furthermore,
most projects provide a compiled binary tarball only as a convenience.


Re: RFC: Building .tar.gz binary packaging for all components in Bigtop

Posted by Olaf Flebbe <of...@oflebbe.de>.
Hi

Maybe we have different target groups: for educational users trying to figure out how everything works together, the single-computer cluster is great.

These tar.gz artifacts have never been sufficient to run in production. They did not contain 64-bit libs, for instance. They do not provide start scripts, are error-prone when you accidentally use the wrong Java, do not place logfiles in suitable places, and so on.

These tar artifacts and instructions are POC quality, that's it.

Please be aware that Bigtop changes the directory layout, directory permissions, configuration, and startup of the original code in order to support large-scale installations and automation.

We have a much better alternative: distribution-integrated repositories. You'll only have to apt/yum install hadoop-hdfs-datanode and everything is already set up, including Java, runscripts, directory layout, and users. Then you can concentrate on configuring that beast. You do not even need special instructions, since it is distribution native -- well, almost. If you feel that Bigtop's runscripts / layout are not suitable, you are very welcome to contribute.

If I look at the instructions [1], they contain tons of settings you should do. With our packages you can actually do exactly this, without bothering to install and configure all the dependencies.

Olaf





Sent from my iPad


Re: RFC: Building .tar.gz binary packaging for all components in Bigtop

Posted by Guodong Xu <gu...@linaro.org>.
Hi, Evans

Comments below.


Right, agree. Here is what I learned from users of Arm servers:

One piece of background: currently, neither CDH nor Hortonworks has an
official release for Arm servers yet. So, Bigtop is the only available and
verified distribution for them. As you know, with effort from the community,
Arm Inc., and Linaro, Bigtop now officially supports Arm64.

Before these users start to use Bigtop on Arm, they are already familiar with
each component's individual installation and usage on x86, and most components
are released in .tar.gz format. (That is, most Apache big data components
don't release deb/rpm packages. So, if we tell users that the only format
available in Bigtop is deb/rpm, that just makes them hesitate.)

E.g., for Hadoop, the official sites for installation are here [1] and here
[2], and their release format is .tar.gz.

So,
1. using .tar.gz is the minimum-effort route for them to get started with
Bigtop (if we can support .tar.gz).
2. Bigtop has all components tested. That gives users great confidence, so
they like to use the binaries built by Bigtop on Arm64 servers.

[1]
https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html
[2] https://www.apache.org/dyn/closer.cgi/hadoop/common/

-Guodong



Re: RFC: Building .tar.gz binary packaging for all components in Bigtop

Posted by Evans Ye <ev...@apache.org>.
I'm not objecting to this, but I'd love to have more discussion to figure out
whether this is the right thing to do. What I get from your proposal is that
users want to do things which RPM/DEB don't do well while tar.gz is good at
them. However, could we do it another way that is far more beneficial in more
scenarios? In general, it's like when users ask for faster horses: can we come
up with cars?

How about we start with the first step, which is to elaborate on why users
choose to go for tar.gz?

Guodong Xu <gu...@linaro.org> wrote on Tue, Jul 16, 2019, at 12:24 PM:

> Hi, all
>
> This is a request for comments on a potential new feature. Please have a
> look and let me know whether this adds value to the community. I appreciate
> your opinions on this.
>
> As most of you know already, so far Bigtop provides two packaging formats:
> deb and rpm. Users are recommended to install through apt and yum for their
> respective Linux distributions.
>
> Yet, there are still users coming to me and asking for the more traditional
> .tar.gz packages. I know that shouldn't be a problem for x86 users, since
> they can always download .tar.gz binary releases from each component's
> official release website. But for users on other architectures, like
> Arm64 and PowerPC, the story is different: .tar.gz files are not readily
> available for these architectures.
>
> In Bigtop, we do the job of building/packaging/testing big data
> components in one single place. So, it makes sense that we step in and,
> during our process of building each component, keep a binary .tar.gz
> copy of each component we support.
>
> If you are supportive of the above idea, then the next step is how to
> achieve it. I actually did some research. My method is to do two
> things:
> 0. this can be a subsequent task of either $target-deb or $target-rpm; it
> depends on one of the two.
> 1. add tar and cp into each component's 'do-component-build' file.
> 2. add a new set of packaging tasks for each component. This means adding a
> new task in genTasks() in packages.gradle.
>   Something like
> Task t = task "${target}-bin-tar" (
> description: "Building .tar.gz binary for $target artifacts",
> group: PACKAGES_GROUP) doLast { ...
>
>
>
> PS:
> An example in 'do-component-build' for Hadoop; other components are similar.
>
> diff --git a/bigtop-packages/src/common/hadoop/do-component-build
> b/bigtop-packages/src/common/hadoop/do-component-build
> index 2a1a6345..cb8ddd9d 100644
> --- a/bigtop-packages/src/common/hadoop/do-component-build
> +++ b/bigtop-packages/src/common/hadoop/do-component-build
> @@ -146,3 +146,6 @@ cp -r target/staging/hadoop-project build/share/doc
>
>  # Copy fuse output to the build directory
>  cp
>
> hadoop-hdfs-project/hadoop-hdfs-native-client/target/main/native/fuse-dfs/fuse_dfs
> build/bin
> +
> +# Copy binary build result
> +cp hadoop-dist/target/hadoop-${HADOOP_VERSION}.tar.gz ..
>
>
> Best regards,
> -Guodong
>
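The `cp` step proposed in the quoted diff can be guarded so the build does not fail confusingly when a component's build did not produce a dist tarball. A minimal shell sketch under stated assumptions: the helper name `export_component_tarball` and the paths are illustrative, not part of Bigtop's actual scripts.

```shell
#!/bin/sh
# Sketch (not the actual Bigtop script): keep a component's binary
# .tar.gz alongside the deb/rpm artifacts, if the build produced one.
# The helper name and example paths are illustrative assumptions.

export_component_tarball() {
  tarball="$1"     # e.g. hadoop-dist/target/hadoop-${HADOOP_VERSION}.tar.gz
  output_dir="$2"  # directory where the .tar.gz artifact should be kept
  if [ -f "$tarball" ]; then
    # Copy the binary build result out of the component's build tree.
    cp "$tarball" "$output_dir"
  else
    # Warn instead of failing the whole packaging run.
    echo "warning: $tarball not found; skipping tarball export" >&2
    return 1
  fi
}
```

In a `do-component-build` file, a call like this could replace the bare `cp hadoop-dist/target/hadoop-${HADOOP_VERSION}.tar.gz ..` line from the diff above.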