Posted to user@bigtop.apache.org by jay vyas <ja...@gmail.com> on 2015/06/15 18:22:14 UTC

Rebooting the conversation on the Future of bigtop: Abstracting the backplane ? Containers?

Hi folks.   Every few months, I try to reboot the conversation about the
next generation of Bigtop.

There are three things I think we should consider: a backplane (rather
than deploying to machines), the meaning of the term "ecosystem" in a
post-Spark in-memory apocalypse, and containerization.

1) BACKPLANE: The new trend is to have a backplane that provides cluster-level
abstractions for you (Mesos, Kubernetes, YARN, and so on).   Is it time for
us to pick a resource manager?

2) ECOSYSTEM?: Nowadays folks don't necessarily need the whole Hadoop
ecosystem, and there is a huge shift to in-memory, monolithic stacks
happening (e.g. GridGain or Spark can do what 90% of the Hadoop ecosystem
already does, supporting streams, batch, and SQL all in one).

3) CONTAINERS: We are doing a great job with Docker in our build infra.  Is
it time to start experimenting with running Docker tarballs?

Combining 1+2+3, I could see a useful big data upstream distro which (1)
just installs an HCFS implementation (Gluster, HDFS, ...) alongside, say,
(2) Mesos as a backplane for the tooling for [[ HBase + Spark + Ignite ]]
--- and then (3) does the integration testing of the available Mesos-framework
plugins for Ignite and Spark underneath.  If other folks are interested,
maybe we could create a "1.x" or "in-memory" branch to start hacking on it
sometime?    Maybe even bring the Flink folks in as well, as they are
interested in Bigtop packaging.



-- 
jay vyas

Re: Rebooting the conversation on the Future of bigtop: Abstracting the backplane ? Containers?

Posted by Andrew Purtell <ap...@apache.org>.
> Is it time for us to pick a resource manager?

Not if we want to be like a Debian for big data software.  I'm not sure we
want to limit our reach by being overly opinionated. With my user's hat on,
if we don't package Hadoop and YARN, then I wouldn't have any use for
Bigtop.

> Nowadays folks don't necessarily need the whole hadoop ecosystem, and
there is a huge shift to in-memory, monolithic stacks happening

A Bigtop user would only need to install the packages they would like to
use, right? Is this an argument for exclusion? Exclusion of what?

> Is it time to start experimenting with running docker tarballs ?

This sounds fine as an additional target for builds, but not if it leads
to a proposal to do away with the OS-native packaging. That's useful too.
Containers are trendy but not useful or even appropriate for every
environment or use case.




On Mon, Jun 15, 2015 at 9:22 AM, jay vyas <ja...@gmail.com>
wrote:

> Hi folks.   Every few months, i try to reboot the conversation about the
> next generation of bigtop.
>
> There are 3 things which i think we should consider : A backplane (rather
> than deploy to machines, the meaning of the term "ecosystem" in a
> post-spark in-memory apacolypse, and containerization.
>
> 1) BACKPLANE: The new trend is to have a backplane that provides
> networking abstractions for you (mesos, kubernetes, yarn, and so on).   Is
> it time for us to pick a resource manager?
>
> 2) ECOSYSTEM?: Nowadays folks don't necessarily need the whole hadoop
> ecosystem, and there is a huge shift to in-memory, monolithic stacks
> happening (i.e. gridgain or spark can do what 90% of the hadoop ecosystem
> already does, supporting streams, batch,sql all in one).
>
> 3) CONTAINERS:  we are doing a great job w/ docker in our build infra.  Is
> it time to start experimenting with running docker tarballs ?
>
> Combining 1+2+3 - i could see a useful bigdata upstream distro which (1)
> just installed an HCFS implementation (gluster,HDFS,...) along side, say,
> (2) mesos as a backplane for the tooling for [[ hbase + spark + ignite ]]
> --- and then (3) do the integration testing of available mesos-framework
> plugins for ignite and spark underneath.  If other folks are interested,
> maybe we could create the "1x" or "in-memory" branch to start hacking on it
> sometime ?    Maybe even bring the flink guys in as well, as they are
> interested in bigtop packaging.
>
>
>
> --
> jay vyas
>



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

Re: Rebooting the conversation on the Future of bigtop: Abstracting the backplane ? Containers?

Posted by Jay Vyas <ja...@gmail.com>.
You're right, Bruno, I could, but I have no need of such a thing :) ...

And in any case --- this thread is just about sharing ideas, letting the whole community speak up about their opinions on the future of Bigtop. It's not about driving a particular project direction.

Bigtop is a unique project in that we integrate a lot of tools in a rapidly changing landscape, so it's good to have some feelers out there to see what our users are thinking.

Thanks all for the feedback, hope to get more!
 
> On Jun 16, 2015, at 2:11 AM, Bruno Mahé <bm...@apache.org> wrote:
> 
>> On 06/15/2015 09:22 AM, jay vyas wrote:
>> Hi folks.   Every few months, i try to reboot the conversation about the next generation of bigtop.
>> 
>> There are 3 things which i think we should consider : A backplane (rather than deploy to machines, the meaning of the term "ecosystem" in a post-spark in-memory apacolypse, and containerization.
>> 
>> 1) BACKPLANE: The new trend is to have a backplane that provides networking abstractions for you (mesos, kubernetes, yarn, and so on).   Is it time for us to pick a resource manager?
>> 
>> 2) ECOSYSTEM?: Nowadays folks don't necessarily need the whole hadoop ecosystem, and there is a huge shift to in-memory, monolithic stacks happening (i.e. gridgain or spark can do what 90% of the hadoop ecosystem already does, supporting streams, batch,sql all in one).
>> 
>> 3) CONTAINERS:  we are doing a great job w/ docker in our build infra.  Is it time to start experimenting with running docker tarballs ?
>> 
>> Combining 1+2+3 - i could see a useful bigdata upstream distro which (1) just installed an HCFS implementation (gluster,HDFS,...) along side, say, (2) mesos as a backplane for the tooling for [[ hbase + spark + ignite ]] --- and then (3) do the integration testing of available mesos-framework plugins for ignite and spark underneath.  If other folks are interested, maybe we could create the "1x" or "in-memory" branch to start hacking on it sometime ?    Maybe even bring the flink guys in as well, as they are interested in bigtop packaging.
>> 
>> 
>> 
>> -- 
>> jay vyas
> 
> 
> I have roughly the same position as Andrew on that matter.
> 
> What prevents you from starting something yourself to start hacking on it?
> 
> 
> Thanks,
> Bruno

Re: Rebooting the conversation on the Future of bigtop: Abstracting the backplane ? Containers?

Posted by Bruno Mahé <bm...@apache.org>.
On 06/15/2015 09:22 AM, jay vyas wrote:
> Hi folks.   Every few months, i try to reboot the conversation about 
> the next generation of bigtop.
>
> There are 3 things which i think we should consider : A backplane 
> (rather than deploy to machines, the meaning of the term "ecosystem" 
> in a post-spark in-memory apacolypse, and containerization.
>
> 1) BACKPLANE: The new trend is to have a backplane that provides 
> networking abstractions for you (mesos, kubernetes, yarn, and so 
> on).   Is it time for us to pick a resource manager?
>
> 2) ECOSYSTEM?: Nowadays folks don't necessarily need the whole hadoop 
> ecosystem, and there is a huge shift to in-memory, monolithic stacks 
> happening (i.e. gridgain or spark can do what 90% of the hadoop 
> ecosystem already does, supporting streams, batch,sql all in one).
>
> 3) CONTAINERS:  we are doing a great job w/ docker in our build 
> infra.  Is it time to start experimenting with running docker tarballs ?
>
> Combining 1+2+3 - i could see a useful bigdata upstream distro which 
> (1) just installed an HCFS implementation (gluster,HDFS,...) along 
> side, say, (2) mesos as a backplane for the tooling for [[ hbase + 
> spark + ignite ]] --- and then (3) do the integration testing of 
> available mesos-framework plugins for ignite and spark underneath.  If 
> other folks are interested, maybe we could create the "1x" or 
> "in-memory" branch to start hacking on it sometime ?    Maybe even 
> bring the flink guys in as well, as they are interested in bigtop 
> packaging.
>
>
>
> -- 
> jay vyas


I have roughly the same position as Andrew on that matter.

What prevents you from starting something yourself and hacking on it?


Thanks,
Bruno

Re: Rebooting the conversation on the Future of bigtop: Abstracting the backplane ? Containers?

Posted by Roman Shaposhnik <ro...@shaposhnik.org>.
On Mon, Jun 15, 2015 at 9:22 AM, jay vyas <ja...@gmail.com> wrote:
> Hi folks.   Every few months, i try to reboot the conversation about the
> next generation of bigtop.
>
> There are 3 things which i think we should consider : A backplane (rather
> than deploy to machines, the meaning of the term "ecosystem" in a post-spark
> in-memory apacolypse, and containerization.
>
> 1) BACKPLANE: The new trend is to have a backplane that provides networking
> abstractions for you (mesos, kubernetes, yarn, and so on).   Is it time for
> us to pick a resource manager?

Let me rephrase the above and see if we're talking about the same thing. To
me your question is really about "what does a datacenter look like to Bigtop".
Today a datacenter looks to Bigtop like a bunch of individual nodes running
some kind of a Linux distribution. What you seem to be asking is whether
it is time for us to embrace the vision of a datacenter that looks like Mesos,
etc. Correct?

Also, I don't think you're suggesting that we drop the bread-and-butter of Bigtop,
but I still need to make sure.

> 2) ECOSYSTEM?: Nowadays folks don't necessarily need the whole hadoop
> ecosystem, and there is a huge shift to in-memory, monolithic stacks
> happening (i.e. gridgain or spark can do what 90% of the hadoop ecosystem
> already does, supporting streams, batch,sql all in one).

Correct. That said, I'm not sure what it means for Bigtop.

> 3) CONTAINERS:  we are doing a great job w/ docker in our build infra.  Is
> it time to start experimenting with running docker tarballs ?

I think it is time, but

> Combining 1+2+3 - i could see a useful bigdata upstream distro which (1)
> just installed an HCFS implementation (gluster,HDFS,...) along side, say,
> (2) mesos as a backplane for the tooling for [[ hbase + spark + ignite ]]
> --- and then (3) do the integration testing of available mesos-framework
> plugins for ignite and spark underneath.  If other folks are interested,
> maybe we could create the "1x" or "in-memory" branch to start hacking on it
> sometime ?    Maybe even bring the flink guys in as well, as they are
> interested in bigtop packaging.

I'm actually very curious about the use cases that folks might have around
traditional Hadoop distributions. What you're articulating above seems
like one of those use cases, but at this point I'm sort of lost as to
what the most common use case is.

Thanks,
Roman.

Re: Rebooting the conversation on the Future of bigtop: Abstracting the backplane ? Containers?

Posted by Evans Ye <ev...@apache.org>.
I personally like the idea of including Mesos, Docker, and similar stuff as
options we provide to users. No doubt these solutions can gain more
eyeballs and attract new users/contributors to the Bigtop family.
As mentioned above, the use case still seems unclear, but there might
be a chance that people will start to adopt them because of Bigtop's
support.

Coming back to earth, to be honest I don't need things like Mesos either. So
including it would be a little bit difficult since there's no real demand
for it so far. Probably people who are interested in the technology can start
with an alpha/experimental feature and see if lots of people have interest in
it. The feature should be able to go into our code base as long as
there's a maintainer who has committed to maintaining it. I can pair program with
Jay if you'd like to work on it. :)
On June 17, 2015 at 3:02 AM, "Andrew Purtell" <ap...@apache.org> wrote:

> > thanks andy - i agree with most of your opinions around continuing to
> build
> standard packages.. but can you clarify what was offensive ?  must be a
> misinterpretation somewhere.
>
> Sure.
>
> A bit offensive.
>
> "gridgain or spark can do what 90% of the hadoop ecosystem already does,
> supporting streams, batch,sql all in one" -> This statement deprecates the
> utility of the labors of rest of the Hadoop ecosystem in favor of Gridgain
> and Spark. As a gross generalization it's unlikely to be a helpful
> statement in any case.
>
> It's fine if we all have our favorites, of course. I think we're set up
> well to empirically determine winners and losers, we don't need to make
> partisan statements. Those components that get some user interest in the
> form of contributions that keep them building and happy in Bigtop will stay
> in. Those that do not get the necessary attention will have to be culled
> out over time when and if they fail to compile or pass integration tests.
>
>
> On Mon, Jun 15, 2015 at 11:42 AM, jay vyas <ja...@gmail.com>
> wrote:
>
>> thanks andy - i agree with most of your opinions around continuing to
>> build
>> standard packages.. but can you clarify what was offensive ?  must be a
>> misinterpretation somewhere.
>>
>> 1) To be clear, i am 100% behind supporting standard hadoop build rpms
>> that
>> we have now.   Thats the core product and will be for  the forseeable
>> future, absolutely !
>>
>> 2) The idea (and its just an idea i want to throw out - to keep us on our
>> toes), is that some folks may be interested in hacking around, in a
>> separate branch - on some bleeding edge bigdata deployments - which
>> attempts to incorporate resource managers and  containers as first-class
>> citizens.
>>
>> Again this is all just ideas - not in any way meant to derail the
>> packaging
>> efforts - but rather - just to gauge folks interest level in the bleeding
>> edge, docker, mesos, simplified  processing stacks, and so on.
>>
>>
>>
>> On Mon, Jun 15, 2015 at 12:39 PM, Andrew Purtell <ap...@apache.org>
>> wrote:
>>
>> > > gridgain or spark can do what 90% of the hadoop ecosystem already
>> does,
>> > supporting streams, batch,sql all in one)
>> >
>> > If something like this becomes the official position of the Bigtop
>> > project, some day, then it will turn off people. I can see where you are
>> > coming from, I think. Correct me if I'm wrong: We have limited
>> bandwidth,
>> > we should move away from Roman et. al.'s vision of Bigtop as an
>> inclusive
>> > distribution of big data packages, and instead become highly opinionated
>> > and tightly focused. If that's accurate, I can sum up my concern as
>> > follows: To the degree we become more opinionated, the less we may have
>> to
>> > look at in terms of inclusion - both software and user communities. For
>> > example, I find the above quoted statement a bit offensive as a
>> participant
>> > on not-Spark and not-Gridgain projects. I roll my eyes sometimes at the
>> > Docker over-hype. Is there still a place for me here?
>> >
>> >
>> >
>> > On Mon, Jun 15, 2015 at 9:22 AM, jay vyas <ja...@gmail.com>
>> > wrote:
>> >
>> >> Hi folks.   Every few months, i try to reboot the conversation about
>> the
>> >> next generation of bigtop.
>> >>
>> >> There are 3 things which i think we should consider : A backplane
>> (rather
>> >> than deploy to machines, the meaning of the term "ecosystem" in a
>> >> post-spark in-memory apacolypse, and containerization.
>> >>
>> >> 1) BACKPLANE: The new trend is to have a backplane that provides
>> >> networking abstractions for you (mesos, kubernetes, yarn, and so on).
>>  Is
>> >> it time for us to pick a resource manager?
>> >>
>> >> 2) ECOSYSTEM?: Nowadays folks don't necessarily need the whole hadoop
>> >> ecosystem, and there is a huge shift to in-memory, monolithic stacks
>> >> happening (i.e. gridgain or spark can do what 90% of the hadoop
>> ecosystem
>> >> already does, supporting streams, batch,sql all in one).
>> >>
>> >> 3) CONTAINERS:  we are doing a great job w/ docker in our build infra.
>> >> Is it time to start experimenting with running docker tarballs ?
>> >>
>> >> Combining 1+2+3 - i could see a useful bigdata upstream distro which
>> (1)
>> >> just installed an HCFS implementation (gluster,HDFS,...) along side,
>> say,
>> >> (2) mesos as a backplane for the tooling for [[ hbase + spark + ignite
>> ]]
>> >> --- and then (3) do the integration testing of available
>> mesos-framework
>> >> plugins for ignite and spark underneath.  If other folks are
>> interested,
>> >> maybe we could create the "1x" or "in-memory" branch to start hacking
>> on it
>> >> sometime ?    Maybe even bring the flink guys in as well, as they are
>> >> interested in bigtop packaging.
>> >>
>> >>
>> >>
>> >> --
>> >> jay vyas
>> >>
>> >
>> >
>> >
>> > --
>> > Best regards,
>> >
>> >    - Andy
>> >
>> > Problems worthy of attack prove their worth by hitting back. - Piet Hein
>> > (via Tom White)
>> >
>>
>>
>>
>> --
>> jay vyas
>>
>
>
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>

RE: Rebooting the conversation on the Future of bigtop: Abstracting the backplane ? Containers?

Posted by Evans Ye <ev...@apache.org>.
Awesome!
Glad to know other people's thoughts.
I think one of the major things among those is documentation. We have
done so much work making Bigtop 1.0 a great piece of software. Now we just
need to show people how to use it and what it is capable of doing.

I'll be able to write down some guides on the Bigtop provisioner and CI infra
stuff. But I just want to push the 1.0 release forward now with my limited cycles.

I'd also love to see more CI jobs we can run in our public Jenkins as
showcases.

Smoke tests
Integration tests
Puppet recipes
BigpetStore
...

On the other hand, there are still lots of patch-available JIRAs out there.
We should have the 1.0 release soon and get those nice features on board. :)
Building on conversations pre/during/post ApacheCon and looking at the
post-1.0 Bigtop focus and efforts, I want to lay out a few things and get
people's comments.  There seems to be some consensus that the project can
look towards serving end application/data developers more going forward, while
continuing the tradition of the project's build/pkg/test/deploy roots.

I have spent the past couple months, and heavily the past 3 or so weeks,
talking to many different potential end users at meetups, conferences,
etc.., also having some great conversations with commercial open source
vendors that are interested in what a "future bigtop" can be and what it
could provide to users.

I believe we need to put some focused effort into few foundational things
to put the project in a position to move faster and attract a wider range
of users as well as new contributors.

-----------
CI "2.0"
-----------

The start of this is already underway, based on the work Roman started last
year and continuing with new setup and enhancements on the Bigtop AWS
infrastructure; Evans has been pushing this along into the 1.0 release.
The speed of getting new packages built and up to date needs to increase so
releases can happen at a regular clip, even looking towards user-friendly
"ad-hoc" Bigtop builds where users could quickly choose the 2, 3, 4, etc.
components they want and have a stack around that.

Related to this, I am hoping the group can come to some idea/agreement on
semver-style versioning for the project post 1.0.  I think this could set a
path forward for releases that can happen faster, while not holding up the
whole train if a single "smaller" component has a couple of issues that
can't/won't be resolved by the main stakeholders or interested parties in
said component.  An example might be a new Pig or Sqoop having issues: the
1.2 release would still go out the door, with 1.2.1 coming days/weeks later
once the new Pig or Sqoop was fixed up.

---------------------------------------------
Proper package repository hosting
---------------------------------------------

I put together a little test setup based on the 0.8 assets; we can probably
build off of that with 1.0, working towards the CI automatically posting
nightly (or just-in-time) builds off latest so people can play around.
Debs/RPMs should be the focal point of output for the project assets;
everything else is additive and builds off of that (i.e. the user who says "I
am not a Puppet shop so I don't care about the modules, but I do my own
automation, and if you point me to some sane repositories I can do the rest
myself with a couple of decent getting-started steps").
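
A hedged sketch of what "point me to some sane repositories" could look like
for an apt-based user (the repo URL below is a placeholder, and the package
names are only examples of Bigtop packages):

    # Hypothetical getting-started steps; the list-file URL is not a real
    # Bigtop repository location.
    sudo wget -O /etc/apt/sources.list.d/bigtop.list \
        http://example.org/bigtop/1.0/apt/bigtop.list
    sudo apt-get update
    sudo apt-get install hadoop-hdfs-namenode hadoop-yarn-resourcemanager spark-core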

-----------------------------------------------------------------
Greatly increasing the UX and getting started content
-----------------------------------------------------------------

This is the big one: a new website, focused docs and getting-started
examples for end users, and other specific content for contributors.  I will
be starting to put some cycles into the new website JIRA probably starting next
week, and will try to scoot through it and start posting some working examples
for feedback once something basic is in place.  For those interested in
helping out on doc work and getting-started content, let me know; I am looking
at subjects like:

   -Developer getting started
         -using the packages
         -using puppet modules and deployment options
         -deploying reference example stacks
         -setting up your own big data CI
         -etc

   -Contributing to Bigtop:
         -how to submit your first patch/pull-request
         -adding new component (step by step, canned learning component
example, etc)
         -adding tests to an existing component (steps, canned hello world
example test, etc)
         -writing your own test data generator
         -etc

Those are some thoughts and a couple of initial focal areas that are driving
my participation in Bigtop.



-----Original Message-----
From: Andrew Purtell [mailto:apurtell@apache.org]
Sent: Tuesday, June 16, 2015 12:02 PM
To: dev@bigtop.apache.org
Cc: user@bigtop.apache.org
Subject: Re: Rebooting the conversation on the Future of bigtop:
Abstracting the backplane ? Containers?

> thanks andy - i agree with most of your opinions around continuing to
build
standard packages.. but can you clarify what was offensive ?  must be a
misinterpretation somewhere.

Sure.

A bit offensive.

"gridgain or spark can do what 90% of the hadoop ecosystem already does,
supporting streams, batch,sql all in one" -> This statement deprecates the
utility of the labors of rest of the Hadoop ecosystem in favor of Gridgain
and Spark. As a gross generalization it's unlikely to be a helpful
statement in any case.

It's fine if we all have our favorites, of course. I think we're set up
well to empirically determine winners and losers, we don't need to make
partisan statements. Those components that get some user interest in the
form of contributions that keep them building and happy in Bigtop will stay
in. Those that do not get the necessary attention will have to be culled
out over time when and if they fail to compile or pass integration tests.


On Mon, Jun 15, 2015 at 11:42 AM, jay vyas <ja...@gmail.com>
wrote:

> thanks andy - i agree with most of your opinions around continuing to
> build standard packages.. but can you clarify what was offensive ?
> must be a misinterpretation somewhere.
>
> 1) To be clear, i am 100% behind supporting standard hadoop build rpms
that
> we have now.   Thats the core product and will be for  the forseeable
> future, absolutely !
>
> 2) The idea (and its just an idea i want to throw out - to keep us on
> our toes), is that some folks may be interested in hacking around, in
> a separate branch - on some bleeding edge bigdata deployments - which
> attempts to incorporate resource managers and  containers as
> first-class citizens.
>
> Again this is all just ideas - not in any way meant to derail the
> packaging efforts - but rather - just to gauge folks interest level in
> the bleeding edge, docker, mesos, simplified  processing stacks, and so
on.
>
>
>
> On Mon, Jun 15, 2015 at 12:39 PM, Andrew Purtell <ap...@apache.org>
> wrote:
>
> > > gridgain or spark can do what 90% of the hadoop ecosystem already
> > > does,
> > supporting streams, batch,sql all in one)
> >
> > If something like this becomes the official position of the Bigtop
> > project, some day, then it will turn off people. I can see where you
> > are coming from, I think. Correct me if I'm wrong: We have limited
> > bandwidth, we should move away from Roman et. al.'s vision of Bigtop
> > as an inclusive distribution of big data packages, and instead
> > become highly opinionated and tightly focused. If that's accurate, I
> > can sum up my concern as
> > follows: To the degree we become more opinionated, the less we may
> > have
> to
> > look at in terms of inclusion - both software and user communities.
> > For example, I find the above quoted statement a bit offensive as a
> participant
> > on not-Spark and not-Gridgain projects. I roll my eyes sometimes at
> > the Docker over-hype. Is there still a place for me here?
> >
> >
> >
> > On Mon, Jun 15, 2015 at 9:22 AM, jay vyas
> > <ja...@gmail.com>
> > wrote:
> >
> >> Hi folks.   Every few months, i try to reboot the conversation about
the
> >> next generation of bigtop.
> >>
> >> There are 3 things which i think we should consider : A backplane
> (rather
> >> than deploy to machines, the meaning of the term "ecosystem" in a
> >> post-spark in-memory apacolypse, and containerization.
> >>
> >> 1) BACKPLANE: The new trend is to have a backplane that provides
> >> networking abstractions for you (mesos, kubernetes, yarn, and so on).
>  Is
> >> it time for us to pick a resource manager?
> >>
> >> 2) ECOSYSTEM?: Nowadays folks don't necessarily need the whole
> >> hadoop ecosystem, and there is a huge shift to in-memory,
> >> monolithic stacks happening (i.e. gridgain or spark can do what 90%
> >> of the hadoop
> ecosystem
> >> already does, supporting streams, batch,sql all in one).
> >>
> >> 3) CONTAINERS:  we are doing a great job w/ docker in our build infra.
> >> Is it time to start experimenting with running docker tarballs ?
> >>
> >> Combining 1+2+3 - i could see a useful bigdata upstream distro
> >> which (1) just installed an HCFS implementation (gluster,HDFS,...)
> >> along side,
> say,
> >> (2) mesos as a backplane for the tooling for [[ hbase + spark +
> >> ignite
> ]]
> >> --- and then (3) do the integration testing of available
> >> mesos-framework plugins for ignite and spark underneath.  If other
> >> folks are interested, maybe we could create the "1x" or "in-memory"
> >> branch to start hacking
> on it
> >> sometime ?    Maybe even bring the flink guys in as well, as they are
> >> interested in bigtop packaging.
> >>
> >>
> >>
> >> --
> >> jay vyas
> >>
> >
> >
> >
> > --
> > Best regards,
> >
> >    - Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet
> > Hein (via Tom White)
> >
>
>
>
> --
> jay vyas
>



--
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

Re: Rebooting the conversation on the Future of bigtop: Abstracting the backplane ? Containers?

Posted by Andrew Purtell <ap...@apache.org>.
For as long as we support building components from a GitHub repository by
SHA, we must support the local install steps in the do-component-build scripts.
Otherwise the result cannot be transitively consistent.

We should not assume a Bigtop user will be building with a BOM full of
conveniently already released artifacts in public Maven repos (see above),
or even with direct access to public networks. It would be inconvenient but
I could see an extra 'getting started' step of setting up a local Nexus or
similar. However, when building from SHAs on a dev workstation the local
Maven cache seems actually the best option. Alternatively, we can declare
this use case is no longer supported. That would make me sad.
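
A hedged sketch of that dev-workstation flow, assuming the component's
dependencies were already placed into ~/.m2 by earlier Bigtop component builds
(the component chosen and the SHA are just examples):

    # Hypothetical example: build one component pinned to a Git SHA while
    # resolving dependencies only from the local Maven repository.
    git clone https://github.com/apache/hadoop.git && cd hadoop
    git checkout abc1234                 # placeholder SHA
    mvn --offline install -DskipTests    # --offline: use only what is in ~/.m2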

It would also be great if we could continue supporting Bigtop package builds on
build servers and developer workstations without requiring any specific
containerization technology (or any containerization at all). Giving people
the option to use Docker-specific techniques is fine as long as it is
totally optional.


On Fri, Jun 19, 2015 at 11:26 PM, Bruno Mahé <bm...@apache.org> wrote:

>  Echoing both Nate and Evans, I would not limit ourselves based on the
> technology used for the build.
>
> However, I am not sure to completely follow option 3. We are doing that
> already for packages. For instance if package A depends on Apache
> Zookeeper., then the package A does depend on Apache Zookeeper and includes
> symlinks to the Apache Zookeeper library provided by the Apache Zookeeper
> package.
>
>
> Thanks,
> Bruno
>
>
>
> On 06/19/2015 12:47 PM, nate@reactor8.com wrote:
>
>  Echoing Evans, think we should not be worried about stateless vs
> non-stateless containers.., seems core idea and need to is optimize the
> build process and maximize re-use whether on host or container machines or
> build environments.
>
>
>
> Added sub-task with Olaf’s idea to Evans umbrella CI task, currently
> marked it for 1.1:
>
>
>
> https://issues.apache.org/jira/browse/BIGTOP-1906
>
>
>
>
>
>
>
> From: Evans Ye [mailto:evansye@apache.org]
> Sent: Friday, June 19, 2015 7:16 AM
> To: user@bigtop.apache.org
> Subject: Re: Rebooting the conversation on the Future of bigtop:
> Abstracting the backplane ? Containers?
>
>
>
> I thnk it's not a problem that container is not stateless. No matter how
> we should have CI jobs that builds all the artifacts and store them as
> official repos.
> You point out an important thing that is the mvn install is the key
> feature to propergate self patched components around. If we disable this
> than there's no reason to build jars by ourselves. I'm +1 to option 2.
>
> On June 19, 2015 at 5:59 AM, "Olaf Flebbe" <of...@oflebbe.de> wrote:
>
>
> > On 18.06.2015 at 23:57, jay vyas <ja...@gmail.com> wrote:
> >
> > You can easily share the artifacts with a docker shared volume
> >
> > in the container "EXPORT M2_HOME=/container/m2/"
> >
> > follwed by
> >
> > "docker build -v ~/.m2/ /container/m2/ ........ "
> >
> > This will put the mvn jars into the host rather than the guest
> conatainer, so that they persist.
> >
> >
>
> Thats not the point. Containers are not stateless any more.
>
> Olaf
>
>
>

Re: Rebooting the conversation on the Future of bigtop: Abstracting the backplane ? Containers?

Posted by Bruno Mahé <bm...@apache.org>.
Echoing both Nate and Evans, I would not limit ourselves based on the 
technology used for the build.

However, I am not sure I completely follow option 3. We are doing that
already for packages. For instance, if package A depends on Apache
ZooKeeper, then package A does depend on the Apache ZooKeeper package and
includes symlinks to the Apache ZooKeeper library provided by the Apache
ZooKeeper package.
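
For illustration only (the paths below are hypothetical, not taken from an
actual Bigtop package), that kind of reuse looks roughly like this on an
installed system:

    # Package A ships a symlink to the jar provided by the ZooKeeper package
    # instead of bundling its own copy (illustrative paths):
    $ ls -l /usr/lib/package-a/lib/zookeeper.jar
    lrwxrwxrwx ... /usr/lib/package-a/lib/zookeeper.jar -> /usr/lib/zookeeper/zookeeper.jar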


Thanks,
Bruno


On 06/19/2015 12:47 PM, nate@reactor8.com wrote:
>
> Echoing Evans, think we should not be worried about stateless vs 
> non-stateless containers.., seems core idea and need to is optimize 
> the build process and maximize re-use whether on host or container 
> machines or build environments.
>
> Added sub-task with Olaf’s idea to Evans umbrella CI task, currently 
> marked it for 1.1:
>
> https://issues.apache.org/jira/browse/BIGTOP-1906
>
> From: Evans Ye [mailto:evansye@apache.org]
> Sent: Friday, June 19, 2015 7:16 AM
> To: user@bigtop.apache.org
> Subject: Re: Rebooting the conversation on the Future of bigtop:
> Abstracting the backplane ? Containers?
>
> I thnk it's not a problem that container is not stateless. No matter 
> how we should have CI jobs that builds all the artifacts and store 
> them as official repos.
> You point out an important thing that is the mvn install is the key 
> feature to propergate self patched components around. If we disable 
> this than there's no reason to build jars by ourselves. I'm +1 to 
> option 2.
>
> On June 19, 2015 at 5:59 AM, "Olaf Flebbe" <of@oflebbe.de> wrote:
>
>
>     > On 18.06.2015 at 23:57, jay vyas
>     <jayunit100.apache@gmail.com> wrote:
>     >
>     > You can easily share the artifacts with a docker shared volume
>     >
>     > in the container "EXPORT M2_HOME=/container/m2/"
>     >
>     > follwed by
>     >
>     > "docker build -v ~/.m2/ /container/m2/ ........ "
>     >
>     > This will put the mvn jars into the host rather than the guest
>     conatainer, so that they persist.
>     >
>     >
>
>     Thats not the point. Containers are not stateless any more.
>
>     Olaf
>


RE: Rebooting the conversation on the Future of bigtop: Abstracting the backplane ? Containers?

Posted by na...@reactor8.com.
Echoing Evans, I think we should not be worried about stateless vs. non-stateless containers; the core idea and need is to optimize the build process and maximize re-use, whether on host or container machines or build environments.

 

Added a sub-task with Olaf's idea to Evans's umbrella CI task, currently marked for 1.1:

 

https://issues.apache.org/jira/browse/BIGTOP-1906

 

 

 

From: Evans Ye [mailto:evansye@apache.org] 
Sent: Friday, June 19, 2015 7:16 AM
To: user@bigtop.apache.org
Subject: Re: Rebooting the conversation on the Future of bigtop: Abstracting the backplane ? Containers?

 

I thnk it's not a problem that container is not stateless. No matter how we should have CI jobs that builds all the artifacts and store them as official repos. 
You point out an important thing that is the mvn install is the key feature to propergate self patched components around. If we disable this than there's no reason to build jars by ourselves. I'm +1 to option 2.

On June 19, 2015 at 5:59 AM, "Olaf Flebbe" <of@oflebbe.de> wrote:


> On 18.06.2015 at 23:57, jay vyas <jayunit100.apache@gmail.com> wrote:
>
> You can easily share the artifacts with a docker shared volume
>
> in the container "EXPORT M2_HOME=/container/m2/"
>
> follwed by
>
> "docker build -v ~/.m2/ /container/m2/ ........ "
>
> This will put the mvn jars into the host rather than the guest conatainer, so that they persist.
>
>

Thats not the point. Containers are not stateless any more.

Olaf


Re: Rebooting the conversation on the Future of bigtop: Abstracting the backplane ? Containers?

Posted by Evans Ye <ev...@apache.org>.
I think it's not a problem that the container is not stateless. No matter what,
we should have CI jobs that build all the artifacts and store them as
official repos.
You point out an important thing, which is that the mvn install is the key
feature to propagate self-patched components around. If we disable this then
there's no reason to build the jars ourselves. I'm +1 to option 2.
On June 19, 2015 at 5:59 AM, "Olaf Flebbe" <of...@oflebbe.de> wrote:

>
> > On 18.06.2015 at 23:57, jay vyas <ja...@gmail.com> wrote:
> >
> > You can easily share the artifacts with a docker shared volume
> >
> > in the container "EXPORT M2_HOME=/container/m2/"
> >
> > follwed by
> >
> > "docker build -v ~/.m2/ /container/m2/ ........ "
> >
> > This will put the mvn jars into the host rather than the guest
> conatainer, so that they persist.
> >
> >
>
> Thats not the point. Containers are not stateless any more.
>
> Olaf
>

Re: Rebooting the conversation on the Future of bigtop: Abstracting the backplane ? Containers?

Posted by Olaf Flebbe <of...@oflebbe.de>.
> On 18.06.2015 at 23:57, jay vyas <ja...@gmail.com> wrote:
> 
> You can easily share the artifacts with a docker shared volume
> 
> in the container "EXPORT M2_HOME=/container/m2/"
> 
> follwed by
> 
> "docker build -v ~/.m2/ /container/m2/ ........ "
> 
> This will put the mvn jars into the host rather than the guest conatainer, so that they persist.
> 
> 

That's not the point. Containers are not stateless any more.

Olaf

Re: Rebooting the conversation on the Future of bigtop: Abstracting the backplane ? Containers?

Posted by jay vyas <ja...@gmail.com>.
You can easily share the artifacts with a Docker shared volume:

in the container, "export M2_HOME=/container/m2/"

followed by

"docker build -v ~/.m2/ /container/m2/ ........ "

This will put the mvn jars onto the host rather than into the guest container,
so that they persist.
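
(A hedged sketch of that approach, not the project's actual CI wiring: plain
"docker build" does not accept a volume flag, so the bind mount below is shown
on "docker run"; the image name and build command are placeholders.)

    # Bind-mount the host's Maven local repository into the build container so
    # jars that a do-component-build 'mvn install's survive after the container
    # exits and are visible to the next component build.
    docker run --rm \
        -v "$HOME/.m2:/root/.m2" \
        -v "$(pwd):/workspace" -w /workspace \
        bigtop/slaves:example \
        ./gradlew hadoop-rpm     # placeholder component build target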




On Thu, Jun 18, 2015 at 5:32 PM, Olaf Flebbe <of...@oflebbe.de> wrote:

> Thanks Nate
>
> for this focused writeup!
>
> Yeah maybe it is time to reboot our brains ...
>
> Additionaly to the points of nate I would like to attack this in bigtop
> 1.1.0:
>
> …………..
> Building from source or downloading ?
> ……………
>
> However we have a substancial problem hidden deep in th CI „2.0“ approach
> using containers
>
> You may know that we place artifacts (i.e. jars) we built with bigtop into
> the local maven cache ~/.m2. (look for mvn install in do-component-build).
> The idea is that later maven builds will pick these artifacts and use them
> rather downloading them from maven central.
>
> Placing artifacts into ~/.m2 will not have any effect if we use CI
> containers the way we do now: The maven cache ~/.m2 is lost when the
> container ends.
>
> [This triggered misfeature in JIRA BIGTOP-1893, BTW:  gradle rpm/apt
> behaved differently from a container build with artifacts from maven
> central.]
>
> Option 1)  Remove mvn install from all do-component-builds
>
> Results:
>
> + We compile projects the way the upstream-developer does.
> - local fixes and configurations will not propagated
>
> Questions:
> If we do not try to reuse our build-artifacts within compile we have to
> ask ourself "why do we compile projects at all?“.
>
> We can build a great test wether someone else has touched / manipulated
> the maven central cache if we compare artifacts, but is this the really the
> point of compiling ourselves?
>
>
> Option 2) Use mvn install and reuse artifacts even in containers.
>
> Consequences:
>
> - Containers are not stateless any more
>
> - We have to add depencies to CI jobs so they run in order
>
> - single components may break the whole compile process.
>
> - Compile does not scale any more
>
> My Opinion:
> The way we do now "mvn install“ ,  simply tainting the maven cache seems
> not a really controlled way to propagate artifacts to me.
>
> Option 3) Use 1) but reuse artifacts in packages by placing symlinks and
> dependencies between them.
>
> - Packages will break with subtile problems if we do symlink artifacts
> from different releases.
>
> ----
> Neither Option 1, Option 2 nor Option 3 seems a clever way to fix the
> problem. Would like to hear comments regarding this issue:
>
>
> In my humble opinion we should follow Option 2 with all the grave
> consequences. But maybe reworking mvn install by placing the artifacts with
> a bigtop specific name / groupid into the maven cache and upload them to
> maven central .
>
> Olaf
>
>
>
>
>
>
>
>
>
>
> > On 18.06.2015 at 08:26, nate@reactor8.com wrote:
> >
> > Building on conversations pre/during/post Apachecon and looking at the
> post 1.0 bigtop focus and efforts, want to lay out a few things, get
> peoples comments.  Seems to be some consensus that the project can look
> towards serving end application/data developers more going forward, while
> continuing the tradition of the projects build/pkg/test/deploy roots.
> >
> > I have spent the past couple months, and heavily the past 3 or so weeks,
> talking to many different potential end users at meetups, conferences,
> etc.., also having some great conversations with commercial open source
> vendors that are interested in what a "future bigtop" can be and what it
> could provide to users.
> >
> > I believe we need to put some focused effort into few foundational
> things to put the project in a position to move faster and attract a wider
> range of users as well as new contributors.
> >
> > -----------
> > CI "2.0"
> > -----------
> >
> > Start of this is already underway based on the work roman started last
> year and continuing effort with new setup and enhancement on bigtop AWS
> infrastructure, Evans has been pushing this along into the 1.0 release.
> Speed of getting new packages built and up to date needs to increase so
> releases can happen at a regular clip.., even looking towards user friendly
> "ad-hoc" bigtop builds where users could quickly choose the 2,3,4,etc
> components they want and have a stack around that.
> >
> > Related to this, hoping the group can come to some idea/agreement on
> some semver style versioning for the project post 1.0.  I think this could
> set a path forward for releases that can happen faster, while not holding
> up the whole train if a single "smaller" component has a couple issues that
> cant/wont be resolved by the main stakeholders or interested parties in
> said component.  An example might be new pig or sqoop having issues.., the
> 1.2 release would still go out the door with 1.2.1 coming days/weeks later
> once new pig or sqoop was fixed up.
> >
> > ---------------------------------------------
> > Proper package repository hosting
> > ---------------------------------------------
> >
> > I put together a little test setup based on the 0.8 assets, we can
> probably build off of that with 1.0, working towards the CI automatically
> posting nightly (or just-in-time) builds off latest so people can play
> around.  Debs/rpms seem should be the focal pt of output for the project
> assets, everything else is additive and builds off of that (ie: user who
> says "I am not a puppet shop so don’t care about the modules.., but do my
> own automation and if you point me to some sane repositories I can do the
> rest myself with couple decent getting started steps")
> >
> > -----------------------------------------------------------------
> > Greatly increasing the UX and getting started content
> > -----------------------------------------------------------------
> >
> > This is the big one.., new website, focused docs and getting started
> examples for end users, other specific content for contributors.  I will be
> starting to put some cycles into new website jira probably starting next
> week, will try to scoot through it and start posting some working examples
> for feedback once something basic is in place.  For those interested in
> helping out on doc work and getting started content let me know.., looking
> at subjects like:
> >
> >   -Developer getting started
> >         -using the packages
> >         -using puppet modules and deployment options
> >         -deploying reference example stacks
> >         -setting up your own big data CI
> >         -etc
> >
> >   -Contributing to Bigtop:
> >         -how to submit your first patch/pull-request
> >         -adding new component (step by step, canned learning component
> example, etc)
> >         -adding tests to an existing component (steps, canned hello
> world example test, etc)
> >         -writing your own test data generator
> >         -etc
> >
> > Those are some thoughts and couple initial focal areas that are driving
> me around bigtop participation
> >
> >
> >
> > -----Original Message-----
> > From: Andrew Purtell [mailto:apurtell@apache.org]
> > Sent: Tuesday, June 16, 2015 12:02 PM
> > To: dev@bigtop.apache.org
> > Cc: user@bigtop.apache.org
> > Subject: Re: Rebooting the conversation on the Future of bigtop:
> Abstracting the backplane ? Containers?
> >
> >> thanks andy - i agree with most of your opinions around continuing to
> > build
> > standard packages.. but can you clarify what was offensive ?  must be a
> misinterpretation somewhere.
> >
> > Sure.
> >
> > A bit offensive.
> >
> > "gridgain or spark can do what 90% of the hadoop ecosystem already does,
> supporting streams, batch,sql all in one" -> This statement deprecates the
> utility of the labors of rest of the Hadoop ecosystem in favor of Gridgain
> and Spark. As a gross generalization it's unlikely to be a helpful
> statement in any case.
> >
> > It's fine if we all have our favorites, of course. I think we're set up
> well to empirically determine winners and losers, we don't need to make
> partisan statements. Those components that get some user interest in the
> form of contributions that keep them building and happy in Bigtop will stay
> in. Those that do not get the necessary attention will have to be culled
> out over time when and if they fail to compile or pass integration tests.
> >
> >
> > On Mon, Jun 15, 2015 at 11:42 AM, jay vyas <ja...@gmail.com>
> > wrote:
> >
> >> thanks andy - i agree with most of your opinions around continuing to
> >> build standard packages.. but can you clarify what was offensive ?
> >> must be a misinterpretation somewhere.
> >>
> >> 1) To be clear, i am 100% behind supporting standard hadoop build rpms
> that
> >> we have now.   Thats the core product and will be for  the forseeable
> >> future, absolutely !
> >>
> >> 2) The idea (and its just an idea i want to throw out - to keep us on
> >> our toes), is that some folks may be interested in hacking around, in
> >> a separate branch - on some bleeding edge bigdata deployments - which
> >> attempts to incorporate resource managers and  containers as
> >> first-class citizens.
> >>
> >> Again this is all just ideas - not in any way meant to derail the
> >> packaging efforts - but rather - just to gauge folks interest level in
> >> the bleeding edge, docker, mesos, simplified  processing stacks, and so
> on.
> >>
> >>
> >>
> >> On Mon, Jun 15, 2015 at 12:39 PM, Andrew Purtell <ap...@apache.org>
> >> wrote:
> >>
> >>>> gridgain or spark can do what 90% of the hadoop ecosystem already
> >>>> does,
> >>> supporting streams, batch,sql all in one)
> >>>
> >>> If something like this becomes the official position of the Bigtop
> >>> project, some day, then it will turn off people. I can see where you
> >>> are coming from, I think. Correct me if I'm wrong: We have limited
> >>> bandwidth, we should move away from Roman et. al.'s vision of Bigtop
> >>> as an inclusive distribution of big data packages, and instead
> >>> become highly opinionated and tightly focused. If that's accurate, I
> >>> can sum up my concern as
> >>> follows: To the degree we become more opinionated, the less we may
> >>> have
> >> to
> >>> look at in terms of inclusion - both software and user communities.
> >>> For example, I find the above quoted statement a bit offensive as a
> >> participant
> >>> on not-Spark and not-Gridgain projects. I roll my eyes sometimes at
> >>> the Docker over-hype. Is there still a place for me here?
> >>>
> >>>
> >>>
> >>> On Mon, Jun 15, 2015 at 9:22 AM, jay vyas
> >>> <ja...@gmail.com>
> >>> wrote:
> >>>
> >>>> Hi folks.   Every few months, i try to reboot the conversation about
> the
> >>>> next generation of bigtop.
> >>>>
> >>>> There are 3 things which i think we should consider : A backplane
> >> (rather
> >>>> than deploy to machines, the meaning of the term "ecosystem" in a
> >>>> post-spark in-memory apacolypse, and containerization.
> >>>>
> >>>> 1) BACKPLANE: The new trend is to have a backplane that provides
> >>>> networking abstractions for you (mesos, kubernetes, yarn, and so on).
> >> Is
> >>>> it time for us to pick a resource manager?
> >>>>
> >>>> 2) ECOSYSTEM?: Nowadays folks don't necessarily need the whole
> >>>> hadoop ecosystem, and there is a huge shift to in-memory,
> >>>> monolithic stacks happening (i.e. gridgain or spark can do what 90%
> >>>> of the hadoop
> >> ecosystem
> >>>> already does, supporting streams, batch,sql all in one).
> >>>>
> >>>> 3) CONTAINERS:  we are doing a great job w/ docker in our build infra.
> >>>> Is it time to start experimenting with running docker tarballs ?
> >>>>
> >>>> Combining 1+2+3 - i could see a useful bigdata upstream distro
> >>>> which (1) just installed an HCFS implementation (gluster,HDFS,...)
> >>>> along side,
> >> say,
> >>>> (2) mesos as a backplane for the tooling for [[ hbase + spark +
> >>>> ignite
> >> ]]
> >>>> --- and then (3) do the integration testing of available
> >>>> mesos-framework plugins for ignite and spark underneath.  If other
> >>>> folks are interested, maybe we could create the "1x" or "in-memory"
> >>>> branch to start hacking
> >> on it
> >>>> sometime ?    Maybe even bring the flink guys in as well, as they are
> >>>> interested in bigtop packaging.
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> jay vyas
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Best regards,
> >>>
> >>>   - Andy
> >>>
> >>> Problems worthy of attack prove their worth by hitting back. - Piet
> >>> Hein (via Tom White)
> >>>
> >>
> >>
> >>
> >> --
> >> jay vyas
> >>
> >
> >
> >
> > --
> > Best regards,
> >
> >   - Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
> >
>
>


-- 
jay vyas

Re: Rebooting the conversation on the Future of bigtop: Abstracting the backplane ? Containers?

Posted by Olaf Flebbe <of...@oflebbe.de>.
Thanks Nate,

for this focused writeup!

Yeah, maybe it is time to reboot our brains ...

In addition to Nate's points, I would like to attack this in Bigtop 1.1.0:

…………..
Building from source or downloading ?
……………

However, we have a substantial problem hidden deep in the CI "2.0" approach using containers.

You may know that we place artifacts (i.e. jars) we built with Bigtop into the local Maven cache ~/.m2 (look for mvn install in do-component-build). The idea is that later Maven builds will pick these artifacts up and use them rather than downloading them from Maven Central.

Placing artifacts into ~/.m2 will not have any effect if we use CI containers the way we do now: the Maven cache ~/.m2 is lost when the container ends.

[This triggered the misfeature in JIRA BIGTOP-1893, BTW: gradle rpm/apt behaved differently from a container build with artifacts from Maven Central.]

Option 1)  Remove mvn install from all do-component-builds

Results:

+ We compile projects the way the upstream developer does.
- Local fixes and configurations will not be propagated.

Questions:
If we do not try to reuse our build artifacts within the compile, we have to ask ourselves "why do we compile projects at all?".

We could build a great test of whether someone else has touched/manipulated the Maven Central cache if we compare artifacts, but is that really the point of compiling ourselves?


Option 2) Use mvn install and reuse artifacts even in containers.

Consequences:

- Containers are not stateless any more.

- We have to add dependencies to CI jobs so they run in order.

- Single components may break the whole compile process.

- Compilation does not scale any more.

My opinion:
The way we do "mvn install" now, simply tainting the Maven cache, does not seem to me a really controlled way to propagate artifacts.

Option 3) Use 1) but reuse artifacts across packages by placing symlinks and dependencies between them.

- Packages will break with subtle problems if we symlink artifacts from different releases.

----
Neither Option 1, Option 2, nor Option 3 seems a clever way to fix the problem. I would like to hear comments regarding this issue.


In my humble opinion we should follow Option 2, with all its grave consequences, but maybe rework mvn install so that it places the artifacts under a Bigtop-specific name/groupId into the Maven cache and uploads them to Maven Central.
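
A hedged sketch of what installing under a Bigtop-specific groupId could look
like (the coordinates below are hypothetical, not an agreed convention):

    # Hypothetical example: put a locally built jar into the cache under a
    # Bigtop-specific groupId so it cannot be confused with the upstream artifact.
    mvn install:install-file \
        -Dfile=build/zookeeper/zookeeper-3.4.6.jar \
        -DgroupId=org.apache.bigtop.thirdparty \
        -DartifactId=zookeeper \
        -Dversion=3.4.6-bigtop \
        -Dpackaging=jar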

Olaf










> On 18.06.2015 at 08:26, nate@reactor8.com wrote:
> 
> Building on conversations pre/during/post Apachecon and looking at the post 1.0 bigtop focus and efforts, want to lay out a few things, get peoples comments.  Seems to be some consensus that the project can look towards serving end application/data developers more going forward, while continuing the tradition of the projects build/pkg/test/deploy roots.
> 
> I have spent the past couple months, and heavily the past 3 or so weeks, talking to many different potential end users at meetups, conferences, etc.., also having some great conversations with commercial open source vendors that are interested in what a "future bigtop" can be and what it could provide to users.
> 
> I believe we need to put some focused effort into few foundational things to put the project in a position to move faster and attract a wider range of users as well as new contributors.
> 
> -----------
> CI "2.0"
> -----------
> 
> Start of this is already underway based on the work roman started last year and continuing effort with new setup and enhancement on bigtop AWS infrastructure, Evans has been pushing this along into the 1.0 release.  Speed of getting new packages built and up to date needs to increase so releases can happen at a regular clip.., even looking towards user friendly "ad-hoc" bigtop builds where users could quickly choose the 2,3,4,etc components they want and have a stack around that.
> 
> Related to this, hoping the group can come to some idea/agreement on some semver style versioning for the project post 1.0.  I think this could set a path forward for releases that can happen faster, while not holding up the whole train if a single "smaller" component has a couple issues that cant/wont be resolved by the main stakeholders or interested parties in said component.  An example might be new pig or sqoop having issues.., the 1.2 release would still go out the door with 1.2.1 coming days/weeks later once new pig or sqoop was fixed up.
> 
> ---------------------------------------------
> Proper package repository hosting
> ---------------------------------------------
> 
> I put together a little test setup based on the 0.8 assets, we can probably build off of that with 1.0, working towards the CI automatically posting nightly (or just-in-time) builds off latest so people can play around.  Debs/rpms seem should be the focal pt of output for the project assets, everything else is additive and builds off of that (ie: user who says "I am not a puppet shop so don’t care about the modules.., but do my own automation and if you point me to some sane repositories I can do the rest myself with couple decent getting started steps")
> 
> -----------------------------------------------------------------
> Greatly increasing the UX and getting started content
> -----------------------------------------------------------------
> 
> This is the big one.., new website, focused docs and getting started examples for end users, other specific content for contributors.  I will be starting to put some cycles into new website jira probably starting next week, will try to scoot through it and start posting some working examples for feedback once something basic is in place.  For those interested in helping out on doc work and getting started content let me know.., looking at subjects like:
> 
>   -Developer getting started
>         -using the packages
>         -using puppet modules and deployment options
>         -deploying reference example stacks
>         -setting up your own big data CI
>         -etc
> 
>   -Contributing to Bigtop:
>         -how to submit your first patch/pull-request
>         -adding new component (step by step, canned learning component example, etc)
>         -adding tests to an existing component (steps, canned hello world example test, etc)
>         -writing your own test data generator
>         -etc
> 
> Those are some thoughts and couple initial focal areas that are driving me around bigtop participation
> 
> 
> 
> -----Original Message-----
> From: Andrew Purtell [mailto:apurtell@apache.org]
> Sent: Tuesday, June 16, 2015 12:02 PM
> To: dev@bigtop.apache.org
> Cc: user@bigtop.apache.org
> Subject: Re: Rebooting the conversation on the Future of bigtop: Abstracting the backplane ? Containers?
> 
>> thanks andy - i agree with most of your opinions around continuing to
> build
> standard packages.. but can you clarify what was offensive ?  must be a misinterpretation somewhere.
> 
> Sure.
> 
> A bit offensive.
> 
> "gridgain or spark can do what 90% of the hadoop ecosystem already does, supporting streams, batch,sql all in one" -> This statement deprecates the utility of the labors of rest of the Hadoop ecosystem in favor of Gridgain and Spark. As a gross generalization it's unlikely to be a helpful statement in any case.
> 
> It's fine if we all have our favorites, of course. I think we're set up well to empirically determine winners and losers, we don't need to make partisan statements. Those components that get some user interest in the form of contributions that keep them building and happy in Bigtop will stay in. Those that do not get the necessary attention will have to be culled out over time when and if they fail to compile or pass integration tests.
> 
> 
> On Mon, Jun 15, 2015 at 11:42 AM, jay vyas <ja...@gmail.com>
> wrote:
> 
>> thanks andy - i agree with most of your opinions around continuing to
>> build standard packages.. but can you clarify what was offensive ?
>> must be a misinterpretation somewhere.
>> 
>> 1) To be clear, i am 100% behind supporting standard hadoop build rpms that
>> we have now.   Thats the core product and will be for  the forseeable
>> future, absolutely !
>> 
>> 2) The idea (and its just an idea i want to throw out - to keep us on
>> our toes), is that some folks may be interested in hacking around, in
>> a separate branch - on some bleeding edge bigdata deployments - which
>> attempts to incorporate resource managers and  containers as
>> first-class citizens.
>> 
>> Again this is all just ideas - not in any way meant to derail the
>> packaging efforts - but rather - just to gauge folks interest level in
>> the bleeding edge, docker, mesos, simplified  processing stacks, and so on.
>> 
>> 
>> 
>> On Mon, Jun 15, 2015 at 12:39 PM, Andrew Purtell <ap...@apache.org>
>> wrote:
>> 
>>>> gridgain or spark can do what 90% of the hadoop ecosystem already
>>>> does,
>>> supporting streams, batch,sql all in one)
>>> 
>>> If something like this becomes the official position of the Bigtop
>>> project, some day, then it will turn off people. I can see where you
>>> are coming from, I think. Correct me if I'm wrong: We have limited
>>> bandwidth, we should move away from Roman et. al.'s vision of Bigtop
>>> as an inclusive distribution of big data packages, and instead
>>> become highly opinionated and tightly focused. If that's accurate, I
>>> can sum up my concern as
>>> follows: To the degree we become more opinionated, the less we may
>>> have
>> to
>>> look at in terms of inclusion - both software and user communities.
>>> For example, I find the above quoted statement a bit offensive as a
>> participant
>>> on not-Spark and not-Gridgain projects. I roll my eyes sometimes at
>>> the Docker over-hype. Is there still a place for me here?
>>> 
>>> 
>>> 
>>> On Mon, Jun 15, 2015 at 9:22 AM, jay vyas
>>> <ja...@gmail.com>
>>> wrote:
>>> 
>>>> Hi folks.   Every few months, i try to reboot the conversation about the
>>>> next generation of bigtop.
>>>> 
>>>> There are 3 things which i think we should consider : A backplane
>> (rather
>>>> than deploy to machines, the meaning of the term "ecosystem" in a
>>>> post-spark in-memory apacolypse, and containerization.
>>>> 
>>>> 1) BACKPLANE: The new trend is to have a backplane that provides
>>>> networking abstractions for you (mesos, kubernetes, yarn, and so on).
>> Is
>>>> it time for us to pick a resource manager?
>>>> 
>>>> 2) ECOSYSTEM?: Nowadays folks don't necessarily need the whole
>>>> hadoop ecosystem, and there is a huge shift to in-memory,
>>>> monolithic stacks happening (i.e. gridgain or spark can do what 90%
>>>> of the hadoop
>> ecosystem
>>>> already does, supporting streams, batch,sql all in one).
>>>> 
>>>> 3) CONTAINERS:  we are doing a great job w/ docker in our build infra.
>>>> Is it time to start experimenting with running docker tarballs ?
>>>> 
>>>> Combining 1+2+3 - i could see a useful bigdata upstream distro
>>>> which (1) just installed an HCFS implementation (gluster,HDFS,...)
>>>> along side,
>> say,
>>>> (2) mesos as a backplane for the tooling for [[ hbase + spark +
>>>> ignite
>> ]]
>>>> --- and then (3) do the integration testing of available
>>>> mesos-framework plugins for ignite and spark underneath.  If other
>>>> folks are interested, maybe we could create the "1x" or "in-memory"
>>>> branch to start hacking
>> on it
>>>> sometime ?    Maybe even bring the flink guys in as well, as they are
>>>> interested in bigtop packaging.
>>>> 
>>>> 
>>>> 
>>>> --
>>>> jay vyas
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Best regards,
>>> 
>>>   - Andy
>>> 
>>> Problems worthy of attack prove their worth by hitting back. - Piet
>>> Hein (via Tom White)
>>> 
>> 
>> 
>> 
>> --
>> jay vyas
>> 
> 
> 
> 
> --
> Best regards,
> 
>   - Andy
> 
> Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
> 


RE: Rebooting the conversation on the Future of bigtop: Abstracting the backplane ? Containers?

Posted by na...@reactor8.com.
Building on conversations pre/during/post ApacheCon, and looking at the post-1.0 Bigtop focus and efforts, I want to lay out a few things and get people's comments.  There seems to be some consensus that the project can look towards serving end application/data developers more going forward, while continuing the tradition of the project's build/pkg/test/deploy roots.

I have spent the past couple of months, and heavily the past three or so weeks, talking to many different potential end users at meetups, conferences, etc., and also having some great conversations with commercial open source vendors that are interested in what a "future Bigtop" can be and what it could provide to users.

I believe we need to put some focused effort into a few foundational things to put the project in a position to move faster and attract a wider range of users as well as new contributors.

-----------
CI "2.0"
-----------

The start of this is already underway, based on the work Roman started last year and the continuing effort around the new setup and enhancement of the Bigtop AWS infrastructure, which Evans has been pushing along into the 1.0 release.  The speed of getting new packages built and up to date needs to increase so releases can happen at a regular clip, even looking towards user-friendly "ad-hoc" Bigtop builds where users could quickly choose the 2, 3, 4, etc. components they want and have a stack built around that.

Related to this, I am hoping the group can come to some agreement on semver-style versioning for the project post-1.0.  I think this could set a path forward for releases that happen faster, while not holding up the whole train if a single "smaller" component has a couple of issues that can't/won't be resolved by the main stakeholders or interested parties in said component.  An example might be a new Pig or Sqoop having issues: the 1.2 release would still go out the door, with 1.2.1 coming days/weeks later once the new Pig or Sqoop was fixed up.

---------------------------------------------
Proper package repository hosting
---------------------------------------------

I put together a little test setup based on the 0.8 assets; we can probably build off of that with 1.0, working towards the CI automatically posting nightly (or just-in-time) builds off latest so people can play around.  Debs/rpms should be the focal point of output for the project assets; everything else is additive and builds off of that (i.e. the user who says "I am not a puppet shop so don't care about the modules, but I do my own automation, and if you point me to some sane repositories I can do the rest myself with a couple of decent getting-started steps").
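
As one illustration of that getting-started flow (the repo URL here is a placeholder and the package names are just examples):

  # illustrative only -- repo URL is a placeholder, package names are examples
  sudo wget -O /etc/yum.repos.d/bigtop.repo \
      http://example.org/bigtop/1.0/centos7/bigtop.repo
  sudo yum install -y hadoop-conf-pseudo spark-core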

-----------------------------------------------------------------
Greatly increasing the UX and getting started content
-----------------------------------------------------------------

This is the big one: a new website, focused docs and getting-started examples for end users, and other specific content for contributors.  I will start putting some cycles into the new-website JIRA probably next week, and will try to scoot through it and start posting some working examples for feedback once something basic is in place.  For those interested in helping out on doc work and getting-started content, let me know; I am looking at subjects like:

   -Developer getting started
         -using the packages
         -using puppet modules and deployment options
         -deploying reference example stacks
         -setting up your own big data CI
         -etc

   -Contributing to Bigtop:
         -how to submit your first patch/pull-request
         -adding new component (step by step, canned learning component example, etc)
         -adding tests to an existing component (steps, canned hello world example test, etc)
         -writing your own test data generator
         -etc

Those are some thoughts and a couple of initial focal areas that are driving my participation in Bigtop.



-----Original Message-----
From: Andrew Purtell [mailto:apurtell@apache.org] 
Sent: Tuesday, June 16, 2015 12:02 PM
To: dev@bigtop.apache.org
Cc: user@bigtop.apache.org
Subject: Re: Rebooting the conversation on the Future of bigtop: Abstracting the backplane ? Containers?

> thanks andy - i agree with most of your opinions around continuing to
build
standard packages.. but can you clarify what was offensive ?  must be a misinterpretation somewhere.

Sure.

A bit offensive.

"gridgain or spark can do what 90% of the hadoop ecosystem already does, supporting streams, batch,sql all in one" -> This statement deprecates the utility of the labors of rest of the Hadoop ecosystem in favor of Gridgain and Spark. As a gross generalization it's unlikely to be a helpful statement in any case.

It's fine if we all have our favorites, of course. I think we're set up well to empirically determine winners and losers, we don't need to make partisan statements. Those components that get some user interest in the form of contributions that keep them building and happy in Bigtop will stay in. Those that do not get the necessary attention will have to be culled out over time when and if they fail to compile or pass integration tests.


On Mon, Jun 15, 2015 at 11:42 AM, jay vyas <ja...@gmail.com>
wrote:

> thanks andy - i agree with most of your opinions around continuing to 
> build standard packages.. but can you clarify what was offensive ?  
> must be a misinterpretation somewhere.
>
> 1) To be clear, i am 100% behind supporting standard hadoop build rpms that
> we have now.   Thats the core product and will be for  the forseeable
> future, absolutely !
>
> 2) The idea (and its just an idea i want to throw out - to keep us on 
> our toes), is that some folks may be interested in hacking around, in 
> a separate branch - on some bleeding edge bigdata deployments - which 
> attempts to incorporate resource managers and  containers as 
> first-class citizens.
>
> Again this is all just ideas - not in any way meant to derail the 
> packaging efforts - but rather - just to gauge folks interest level in 
> the bleeding edge, docker, mesos, simplified  processing stacks, and so on.
>
>
>
> On Mon, Jun 15, 2015 at 12:39 PM, Andrew Purtell <ap...@apache.org>
> wrote:
>
> > > gridgain or spark can do what 90% of the hadoop ecosystem already 
> > > does,
> > supporting streams, batch,sql all in one)
> >
> > If something like this becomes the official position of the Bigtop 
> > project, some day, then it will turn off people. I can see where you 
> > are coming from, I think. Correct me if I'm wrong: We have limited 
> > bandwidth, we should move away from Roman et. al.'s vision of Bigtop 
> > as an inclusive distribution of big data packages, and instead 
> > become highly opinionated and tightly focused. If that's accurate, I 
> > can sum up my concern as
> > follows: To the degree we become more opinionated, the less we may 
> > have
> to
> > look at in terms of inclusion - both software and user communities. 
> > For example, I find the above quoted statement a bit offensive as a
> participant
> > on not-Spark and not-Gridgain projects. I roll my eyes sometimes at 
> > the Docker over-hype. Is there still a place for me here?
> >
> >
> >
> > On Mon, Jun 15, 2015 at 9:22 AM, jay vyas 
> > <ja...@gmail.com>
> > wrote:
> >
> >> Hi folks.   Every few months, i try to reboot the conversation about the
> >> next generation of bigtop.
> >>
> >> There are 3 things which i think we should consider : A backplane
> (rather
> >> than deploy to machines, the meaning of the term "ecosystem" in a 
> >> post-spark in-memory apacolypse, and containerization.
> >>
> >> 1) BACKPLANE: The new trend is to have a backplane that provides 
> >> networking abstractions for you (mesos, kubernetes, yarn, and so on).
>  Is
> >> it time for us to pick a resource manager?
> >>
> >> 2) ECOSYSTEM?: Nowadays folks don't necessarily need the whole 
> >> hadoop ecosystem, and there is a huge shift to in-memory, 
> >> monolithic stacks happening (i.e. gridgain or spark can do what 90% 
> >> of the hadoop
> ecosystem
> >> already does, supporting streams, batch,sql all in one).
> >>
> >> 3) CONTAINERS:  we are doing a great job w/ docker in our build infra.
> >> Is it time to start experimenting with running docker tarballs ?
> >>
> >> Combining 1+2+3 - i could see a useful bigdata upstream distro 
> >> which (1) just installed an HCFS implementation (gluster,HDFS,...) 
> >> along side,
> say,
> >> (2) mesos as a backplane for the tooling for [[ hbase + spark + 
> >> ignite
> ]]
> >> --- and then (3) do the integration testing of available 
> >> mesos-framework plugins for ignite and spark underneath.  If other 
> >> folks are interested, maybe we could create the "1x" or "in-memory" 
> >> branch to start hacking
> on it
> >> sometime ?    Maybe even bring the flink guys in as well, as they are
> >> interested in bigtop packaging.
> >>
> >>
> >>
> >> --
> >> jay vyas
> >>
> >
> >
> >
> > --
> > Best regards,
> >
> >    - Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet 
> > Hein (via Tom White)
> >
>
>
>
> --
> jay vyas
>



--
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)


Re: Rebooting the conversation on the Future of bigtop: Abstracting the backplane ? Containers?

Posted by Andrew Purtell <ap...@apache.org>.
> thanks andy - i agree with most of your opinions around continuing to
build
standard packages.. but can you clarify what was offensive ?  must be a
misinterpretation somewhere.

Sure.

A bit offensive.

"gridgain or spark can do what 90% of the hadoop ecosystem already does,
supporting streams, batch,sql all in one" -> This statement deprecates the
utility of the labors of the rest of the Hadoop ecosystem in favor of Gridgain
and Spark. As a gross generalization it's unlikely to be a helpful
statement in any case.

It's fine if we all have our favorites, of course. I think we're set up
well to empirically determine winners and losers, we don't need to make
partisan statements. Those components that get some user interest in the
form of contributions that keep them building and happy in Bigtop will stay
in. Those that do not get the necessary attention will have to be culled
out over time when and if they fail to compile or pass integration tests.


On Mon, Jun 15, 2015 at 11:42 AM, jay vyas <ja...@gmail.com>
wrote:

> thanks andy - i agree with most of your opinions around continuing to build
> standard packages.. but can you clarify what was offensive ?  must be a
> misinterpretation somewhere.
>
> 1) To be clear, i am 100% behind supporting standard hadoop build rpms that
> we have now.   Thats the core product and will be for  the forseeable
> future, absolutely !
>
> 2) The idea (and its just an idea i want to throw out - to keep us on our
> toes), is that some folks may be interested in hacking around, in a
> separate branch - on some bleeding edge bigdata deployments - which
> attempts to incorporate resource managers and  containers as first-class
> citizens.
>
> Again this is all just ideas - not in any way meant to derail the packaging
> efforts - but rather - just to gauge folks interest level in the bleeding
> edge, docker, mesos, simplified  processing stacks, and so on.
>
>
>
> On Mon, Jun 15, 2015 at 12:39 PM, Andrew Purtell <ap...@apache.org>
> wrote:
>
> > > gridgain or spark can do what 90% of the hadoop ecosystem already does,
> > supporting streams, batch,sql all in one)
> >
> > If something like this becomes the official position of the Bigtop
> > project, some day, then it will turn off people. I can see where you are
> > coming from, I think. Correct me if I'm wrong: We have limited bandwidth,
> > we should move away from Roman et. al.'s vision of Bigtop as an inclusive
> > distribution of big data packages, and instead become highly opinionated
> > and tightly focused. If that's accurate, I can sum up my concern as
> > follows: To the degree we become more opinionated, the less we may have
> to
> > look at in terms of inclusion - both software and user communities. For
> > example, I find the above quoted statement a bit offensive as a
> participant
> > on not-Spark and not-Gridgain projects. I roll my eyes sometimes at the
> > Docker over-hype. Is there still a place for me here?
> >
> >
> >
> > On Mon, Jun 15, 2015 at 9:22 AM, jay vyas <ja...@gmail.com>
> > wrote:
> >
> >> Hi folks.   Every few months, i try to reboot the conversation about the
> >> next generation of bigtop.
> >>
> >> There are 3 things which i think we should consider : A backplane
> (rather
> >> than deploy to machines, the meaning of the term "ecosystem" in a
> >> post-spark in-memory apacolypse, and containerization.
> >>
> >> 1) BACKPLANE: The new trend is to have a backplane that provides
> >> networking abstractions for you (mesos, kubernetes, yarn, and so on).
>  Is
> >> it time for us to pick a resource manager?
> >>
> >> 2) ECOSYSTEM?: Nowadays folks don't necessarily need the whole hadoop
> >> ecosystem, and there is a huge shift to in-memory, monolithic stacks
> >> happening (i.e. gridgain or spark can do what 90% of the hadoop
> ecosystem
> >> already does, supporting streams, batch,sql all in one).
> >>
> >> 3) CONTAINERS:  we are doing a great job w/ docker in our build infra.
> >> Is it time to start experimenting with running docker tarballs ?
> >>
> >> Combining 1+2+3 - i could see a useful bigdata upstream distro which (1)
> >> just installed an HCFS implementation (gluster,HDFS,...) along side,
> say,
> >> (2) mesos as a backplane for the tooling for [[ hbase + spark + ignite
> ]]
> >> --- and then (3) do the integration testing of available mesos-framework
> >> plugins for ignite and spark underneath.  If other folks are interested,
> >> maybe we could create the "1x" or "in-memory" branch to start hacking
> on it
> >> sometime ?    Maybe even bring the flink guys in as well, as they are
> >> interested in bigtop packaging.
> >>
> >>
> >>
> >> --
> >> jay vyas
> >>
> >
> >
> >
> > --
> > Best regards,
> >
> >    - Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > (via Tom White)
> >
>
>
>
> --
> jay vyas
>



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

Re: Rebooting the conversation on the Future of bigtop: Abstracting the backplane ? Containers?

Posted by jay vyas <ja...@gmail.com>.
Thanks Andy - I agree with most of your opinions around continuing to build
standard packages, but can you clarify what was offensive? It must be a
misinterpretation somewhere.

1) To be clear, I am 100% behind supporting the standard Hadoop build RPMs
that we have now. That's the core product and will be for the foreseeable
future, absolutely!

2) The idea (and it's just an idea I want to throw out, to keep us on our
toes) is that some folks may be interested in hacking around, in a separate
branch, on some bleeding-edge big data deployments which attempt to
incorporate resource managers and containers as first-class citizens.

Again, this is all just ideas - not in any way meant to derail the packaging
efforts, but rather to gauge folks' interest level in the bleeding edge:
Docker, Mesos, simplified processing stacks, and so on.



On Mon, Jun 15, 2015 at 12:39 PM, Andrew Purtell <ap...@apache.org>
wrote:

> > gridgain or spark can do what 90% of the hadoop ecosystem already does,
> supporting streams, batch,sql all in one)
>
> If something like this becomes the official position of the Bigtop
> project, some day, then it will turn off people. I can see where you are
> coming from, I think. Correct me if I'm wrong: We have limited bandwidth,
> we should move away from Roman et. al.'s vision of Bigtop as an inclusive
> distribution of big data packages, and instead become highly opinionated
> and tightly focused. If that's accurate, I can sum up my concern as
> follows: To the degree we become more opinionated, the less we may have to
> look at in terms of inclusion - both software and user communities. For
> example, I find the above quoted statement a bit offensive as a participant
> on not-Spark and not-Gridgain projects. I roll my eyes sometimes at the
> Docker over-hype. Is there still a place for me here?
>
>
>
> On Mon, Jun 15, 2015 at 9:22 AM, jay vyas <ja...@gmail.com>
> wrote:
>
>> Hi folks.   Every few months, i try to reboot the conversation about the
>> next generation of bigtop.
>>
>> There are 3 things which i think we should consider : A backplane (rather
>> than deploy to machines, the meaning of the term "ecosystem" in a
>> post-spark in-memory apacolypse, and containerization.
>>
>> 1) BACKPLANE: The new trend is to have a backplane that provides
>> networking abstractions for you (mesos, kubernetes, yarn, and so on).   Is
>> it time for us to pick a resource manager?
>>
>> 2) ECOSYSTEM?: Nowadays folks don't necessarily need the whole hadoop
>> ecosystem, and there is a huge shift to in-memory, monolithic stacks
>> happening (i.e. gridgain or spark can do what 90% of the hadoop ecosystem
>> already does, supporting streams, batch,sql all in one).
>>
>> 3) CONTAINERS:  we are doing a great job w/ docker in our build infra.
>> Is it time to start experimenting with running docker tarballs ?
>>
>> Combining 1+2+3 - i could see a useful bigdata upstream distro which (1)
>> just installed an HCFS implementation (gluster,HDFS,...) along side, say,
>> (2) mesos as a backplane for the tooling for [[ hbase + spark + ignite ]]
>> --- and then (3) do the integration testing of available mesos-framework
>> plugins for ignite and spark underneath.  If other folks are interested,
>> maybe we could create the "1x" or "in-memory" branch to start hacking on it
>> sometime ?    Maybe even bring the flink guys in as well, as they are
>> interested in bigtop packaging.
>>
>>
>>
>> --
>> jay vyas
>>
>
>
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>



-- 
jay vyas

Re: Rebooting the conversation on the Future of bigtop: Abstracting the backplane ? Containers?

Posted by Andrew Purtell <ap...@apache.org>.
> gridgain or spark can do what 90% of the hadoop ecosystem already does,
supporting streams, batch,sql all in one)

If something like this becomes the official position of the Bigtop project,
some day, then it will turn off people. I can see where you are coming
from, I think. Correct me if I'm wrong: We have limited bandwidth, we
should move away from Roman et. al.'s vision of Bigtop as an inclusive
distribution of big data packages, and instead become highly opinionated
and tightly focused. If that's accurate, I can sum up my concern as
follows: To the degree we become more opinionated, the less we may have to
look at in terms of inclusion - both software and user communities. For
example, I find the above quoted statement a bit offensive as a participant
on not-Spark and not-Gridgain projects. I roll my eyes sometimes at the
Docker over-hype. Is there still a place for me here?



On Mon, Jun 15, 2015 at 9:22 AM, jay vyas <ja...@gmail.com>
wrote:

> Hi folks.   Every few months, i try to reboot the conversation about the
> next generation of bigtop.
>
> There are 3 things which i think we should consider : A backplane (rather
> than deploy to machines, the meaning of the term "ecosystem" in a
> post-spark in-memory apacolypse, and containerization.
>
> 1) BACKPLANE: The new trend is to have a backplane that provides
> networking abstractions for you (mesos, kubernetes, yarn, and so on).   Is
> it time for us to pick a resource manager?
>
> 2) ECOSYSTEM?: Nowadays folks don't necessarily need the whole hadoop
> ecosystem, and there is a huge shift to in-memory, monolithic stacks
> happening (i.e. gridgain or spark can do what 90% of the hadoop ecosystem
> already does, supporting streams, batch,sql all in one).
>
> 3) CONTAINERS:  we are doing a great job w/ docker in our build infra.  Is
> it time to start experimenting with running docker tarballs ?
>
> Combining 1+2+3 - i could see a useful bigdata upstream distro which (1)
> just installed an HCFS implementation (gluster,HDFS,...) along side, say,
> (2) mesos as a backplane for the tooling for [[ hbase + spark + ignite ]]
> --- and then (3) do the integration testing of available mesos-framework
> plugins for ignite and spark underneath.  If other folks are interested,
> maybe we could create the "1x" or "in-memory" branch to start hacking on it
> sometime ?    Maybe even bring the flink guys in as well, as they are
> interested in bigtop packaging.
>
>
>
> --
> jay vyas
>



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

Re: Rebooting the conversation on the Future of bigtop: Abstracting the backplane ? Containers?

Posted by Roman Shaposhnik <ro...@shaposhnik.org>.
On Mon, Jun 15, 2015 at 9:22 AM, jay vyas <ja...@gmail.com> wrote:
> Hi folks.   Every few months, i try to reboot the conversation about the
> next generation of bigtop.
>
> There are 3 things which i think we should consider : A backplane (rather
> than deploy to machines, the meaning of the term "ecosystem" in a post-spark
> in-memory apacolypse, and containerization.
>
> 1) BACKPLANE: The new trend is to have a backplane that provides networking
> abstractions for you (mesos, kubernetes, yarn, and so on).   Is it time for
> us to pick a resource manager?

Let me rephrase the above and see if we're talking about the same thing. To
me your question is really about "what does a datacenter look like to Bigtop".
Today a datacenter looks to Bigtop like a bunch of individual nodes running
some kind of Linux distribution. What you seem to be asking is whether
it is time for us to embrace the vision of a datacenter that looks like mesos,
etc. Correct?

Also, I don't think you're suggesting that we drop the bread-n-butter of Bigtop,
but I still need to make sure.

> 2) ECOSYSTEM?: Nowadays folks don't necessarily need the whole hadoop
> ecosystem, and there is a huge shift to in-memory, monolithic stacks
> happening (i.e. gridgain or spark can do what 90% of the hadoop ecosystem
> already does, supporting streams, batch,sql all in one).

Correct. That said, I'm not sure what it means for Bigtop.

> 3) CONTAINERS:  we are doing a great job w/ docker in our build infra.  Is
> it time to start experimenting with running docker tarballs ?

I think it is time, but

> Combining 1+2+3 - i could see a useful bigdata upstream distro which (1)
> just installed an HCFS implementation (gluster,HDFS,...) along side, say,
> (2) mesos as a backplane for the tooling for [[ hbase + spark + ignite ]]
> --- and then (3) do the integration testing of available mesos-framework
> plugins for ignite and spark underneath.  If other folks are interested,
> maybe we could create the "1x" or "in-memory" branch to start hacking on it
> sometime ?    Maybe even bring the flink guys in as well, as they are
> interested in bigtop packaging.

I'm actually very curious about use cases that folks might have around
traditional Hadoop Distributions. What you're articulating above seems
like one of those use cases, but at this point I'm sort of lost as to
what's the most common use case.

Thanks,
Roman.

Re: Rebooting the conversation on the Future of bigtop: Abstracting the backplane ? Containers?

Posted by Bruno Mahé <bm...@apache.org>.
On 06/15/2015 09:22 AM, jay vyas wrote:
> Hi folks.   Every few months, i try to reboot the conversation about 
> the next generation of bigtop.
>
> There are 3 things which i think we should consider : A backplane 
> (rather than deploy to machines, the meaning of the term "ecosystem" 
> in a post-spark in-memory apacolypse, and containerization.
>
> 1) BACKPLANE: The new trend is to have a backplane that provides 
> networking abstractions for you (mesos, kubernetes, yarn, and so 
> on).   Is it time for us to pick a resource manager?
>
> 2) ECOSYSTEM?: Nowadays folks don't necessarily need the whole hadoop 
> ecosystem, and there is a huge shift to in-memory, monolithic stacks 
> happening (i.e. gridgain or spark can do what 90% of the hadoop 
> ecosystem already does, supporting streams, batch,sql all in one).
>
> 3) CONTAINERS:  we are doing a great job w/ docker in our build 
> infra.  Is it time to start experimenting with running docker tarballs ?
>
> Combining 1+2+3 - i could see a useful bigdata upstream distro which 
> (1) just installed an HCFS implementation (gluster,HDFS,...) along 
> side, say, (2) mesos as a backplane for the tooling for [[ hbase + 
> spark + ignite ]] --- and then (3) do the integration testing of 
> available mesos-framework plugins for ignite and spark underneath.  If 
> other folks are interested, maybe we could create the "1x" or 
> "in-memory" branch to start hacking on it sometime ?    Maybe even 
> bring the flink guys in as well, as they are interested in bigtop 
> packaging.
>
>
>
> -- 
> jay vyas


I have roughly the same position as Andrew on that matter.

What prevents you from starting something yourself and hacking on it?


Thanks,
Bruno