Posted to dev@bigtop.apache.org by "Youngwoo Kim (김영우)" <yw...@apache.org> on 2019/07/17 07:09:21 UTC

[DISCUSS] Roadmap for the next release.1.5? 2.0?

Hi folks,

Since the 1.4.0 release, there has been no discussion of the next release
yet, so I believe we need to share ideas and prioritize the items for
development.

Also, https://issues.apache.org/jira/browse/BIGTOP-3123 and
https://docs.google.com/document/d/1F2Gxu8GARQDZXgqHn12LKkQ5wCV_AF4b_tVmjYB6YfA/edit
are good starting points for this discussion.

My personal preferences are:
 - Distribution based on Hadoop 3
 - Up-to-date BigPetStore
 - Software stacks and frameworks for streaming data & Machine Learning
 - Containers and Cloud Native: What? How?

It would be great to hear your thoughts.

Thanks,
Youngwoo

Re: [DISCUSS] Roadmap for the next release.1.5? 2.0?

Posted by Evans Ye <ev...@apache.org>.

I'm super up for Hadoop 3.

One other direction we could take is to integrate with ML life-cycle tools
so that the bigdata -> ML experiments -> production models loop is
completed as a whole. One option is MLflow. This is similar to the KubeFlow
discussion brought up by Jay; however, we see no feasible plan to integrate
with Kubernetes and then KubeFlow. MLflow is much more lightweight and
probably easier to integrate.

I'll spend some time to further understand MLflow first.

- Evans

On Thu, Jul 18, 2019 at 4:09 AM Ganesh Raju <ga...@linaro.org> wrote:

> +1 for Distribution based hadoop 3
> I would also prefer inclusion of Apache Ambari mpack.
>
> Thanks,
> Ganesh
>
>
> On Wed, Jul 17, 2019 at 2:09 AM Youngwoo Kim (김영우) <yw...@apache.org>
> wrote:
>
> > Hi folks,
> >
> > After 1.4.0 release, there is no discussion for the next release yet. so I
> > believe we need to share the ideas and prioritize the items for
> > development.
> >
> > And also https://issues.apache.org/jira/browse/BIGTOP-3123 and
> > https://docs.google.com/document/d/1F2Gxu8GARQDZXgqHn12LKkQ5wCV_AF4b_tVmjYB6YfA/edit
> > are good starting point for this discussion.
> >
> > My personal preferences are:
> >  - Distribution based on Hadoop 3
> >  - Up-to-date BigPetStore
> >  - Software stacks and framework for streaming data & Machine Learning
> >  - Containers and Cloud Native: What? How?
> >
> > It would be great to hear your thoughts.
> >
> > Thanks,
> > Youngwoo
> >
>
>
> --
> IRC: ganeshraju@#linaro on irc.freenode.net <http://irc.freenode.net/>
>

Re: [DISCUSS] Roadmap for the next release.1.5? 2.0?

Posted by Ganesh Raju <ga...@linaro.org>.

+1 for a distribution based on Hadoop 3.
I would also prefer inclusion of an Apache Ambari mpack.

Thanks,
Ganesh


On Wed, Jul 17, 2019 at 2:09 AM Youngwoo Kim (김영우) <yw...@apache.org> wrote:

> Hi folks,
>
> After 1.4.0 release, there is no discussion for the next release yet. so I
> believe we need to share the ideas and prioritize the items for
> development.
>
> And also https://issues.apache.org/jira/browse/BIGTOP-3123 and
>
> https://docs.google.com/document/d/1F2Gxu8GARQDZXgqHn12LKkQ5wCV_AF4b_tVmjYB6YfA/edit
> are good starting point for this discussion.
>
> My personal preferences are:
>  - Distribution based on Hadoop 3
>  - Up-to-date BigPetStore
>  - Software stacks and framework for streaming data & Machine Learning
>  - Containers and Cloud Native: What? How?
>
> It would be great to hear your thoughts.
>
> Thanks,
> Youngwoo
>


-- 
IRC: ganeshraju@#linaro on irc.freenode.net <http://irc.freenode.net/>

Re: [DISCUSS] Roadmap for the next release.1.5? 2.0?

Posted by Konstantin Boudnik <co...@apache.org>.

Ugly... I wonder if it means that Hadoop development has decided to go
full-parcel (and its irks) and make Hadoop boutique software?

At any rate, I believe this will slow its adoption to the point where
it is only available through a single vendor left in the commercial market.
But that, fortunately, is not my problem to resolve ;)

Cos
 
On Thu, Jul 25, 2019 at 05:27AM, Olaf Flebbe wrote:
> Hi
> 
> hadoop changed their scripts to break when not absolutely everything is in place.
> 
> The start scripts for yarn for example expect to all jars for hdfs, yarn and
> mapreduce to be present. They renamed (to be precise deprecated the old
> name) of environment variables in a way it looks more convenient, but it
> will mix up hadoop subprojects. we cannot use independent /run and /var dirs
> as the debian build rules and some other unwritten rules of linux packaging
> demand in order to be installed independently.
> 
> So hadoop has tied everything together. without rewriting their build
> scripts it is not possible to have yarn without hdfs or hdfs without yarn to
> be installed independently. 
> 
> Olaf
> 
> ps: The way pid files are written confuses systemd (or only me). effectively
> systemd somehow doesnt properly detect the state of the daemons.
> 
> Sent from my iPad
> 
> > On 25.07.2019 at 02:11, Konstantin Boudnik <co...@apache.org> wrote:
> > 
> > Olaf, thanks for trying... this could be exhausting, for sure ;(
> > 
> > Just to clarify: when you say "monolithic chunk" doesn't it mean that it
> > relies heavily on relative paths or something of the sort and couldn't be
> > broken into pieces because... well, it is so broken?
> > 
> > Thanks,
> >  Cos
> > 
> >> On Mon, Jul 22, 2019 at 10:34PM, Olaf Flebbe wrote:
> >> Hi,
> >> 
> >> If I would have only known that google document before ... I am
> >> asking myself if we should link it from confluence.
> >> 
> >> + hadoop 3.
> >> As you may know, I worked on it (bigtop-alpha branch), got it
> >> compiled and packaged. However, while testing, I discovered that
> >> Hadoop project built on top of some assumptions which do not hold
> >> true for current Bigtop. One of them is that Hadoop 3 is to be
> >> installed as a monolithic chunk on a filesystem. This is not what I
> >> understand as integrated into a Linux distribution. Other point is
> >> that hadoop's installation proposal does not follow the LSB:
> >> platform specfic libs are in "share" directories. I am exhausted,
> >> this is so broken.
> >> 
> >> I will stop here ... for a reason.
> >> 
> >> Olaf
> >> 
> >> 
> >> 
> >> 
> >> 
> >>> On 17.07.19 at 09:09, Youngwoo Kim (김영우) wrote:
> >>> Hi folks,
> >>> 
> >>> After 1.4.0 release, there is no discussion for the next release yet. so I
> >>> believe we need to share the ideas and prioritize the items for
> >>> development.
> >>> 
> >>> And also https://issues.apache.org/jira/browse/BIGTOP-3123 and
> >>> https://docs.google.com/document/d/1F2Gxu8GARQDZXgqHn12LKkQ5wCV_AF4b_tVmjYB6YfA/edit
> >>> are good starting point for this discussion.
> >>> 
> >>> My personal preferences are:
> >>> - Distribution based on Hadoop 3
> >>> - Up-to-date BigPetStore
> >>> - Software stacks and framework for streaming data & Machine Learning
> >>> - Containers and Cloud Native: What? How?
> >>> 
> >>> It would be great to hear your thoughts.
> >>> 
> >>> Thanks,
> >>> Youngwoo
> >>>

Re: [DISCUSS] Roadmap for the next release.1.5? 2.0?

Posted by Olaf Flebbe <of...@oflebbe.de>.

Hi

Hadoop changed their scripts to break when not absolutely everything is in place.

The start scripts for YARN, for example, expect all jars for HDFS, YARN, and MapReduce to be present. They renamed environment variables (to be precise, deprecated the old names) in a way that looks more convenient, but it mixes up the Hadoop subprojects. We cannot use independent /run and /var dirs, which the Debian build rules and some other unwritten rules of Linux packaging demand for the subprojects to be installable independently.

So Hadoop has tied everything together. Without rewriting their build scripts, it is not possible to install YARN without HDFS or HDFS without YARN.

Olaf

PS: The way pid files are written confuses systemd (or only me). Effectively, systemd somehow doesn't properly detect the state of the daemons.

Sent from my iPad

> On 25.07.2019 at 02:11, Konstantin Boudnik <co...@apache.org> wrote:
> 
> Olaf, thanks for trying... this could be exhausting, for sure ;(
> 
> Just to clarify: when you say "monolithic chunk" doesn't it mean that it
> relies heavily on relative paths or something of the sort and couldn't be
> broken into pieces because... well, it is so broken?
> 
> Thanks,
>  Cos
> 
>> On Mon, Jul 22, 2019 at 10:34PM, Olaf Flebbe wrote:
>> Hi,
>> 
>> If I would have only known that google document before ... I am
>> asking myself if we should link it from confluence.
>> 
>> + hadoop 3.
>> As you may know, I worked on it (bigtop-alpha branch), got it
>> compiled and packaged. However, while testing, I discovered that
>> Hadoop project built on top of some assumptions which do not hold
>> true for current Bigtop. One of them is that Hadoop 3 is to be
>> installed as a monolithic chunk on a filesystem. This is not what I
>> understand as integrated into a Linux distribution. Other point is
>> that hadoop's installation proposal does not follow the LSB:
>> platform specfic libs are in "share" directories. I am exhausted,
>> this is so broken.
>> 
>> I will stop here ... for a reason.
>> 
>> Olaf
>> 
>> 
>> 
>> 
>> 
>>> On 17.07.19 at 09:09, Youngwoo Kim (김영우) wrote:
>>> Hi folks,
>>> 
>>> After 1.4.0 release, there is no discussion for the next release yet. so I
>>> believe we need to share the ideas and prioritize the items for
>>> development.
>>> 
>>> And also https://issues.apache.org/jira/browse/BIGTOP-3123 and
>>> https://docs.google.com/document/d/1F2Gxu8GARQDZXgqHn12LKkQ5wCV_AF4b_tVmjYB6YfA/edit
>>> are good starting point for this discussion.
>>> 
>>> My personal preferences are:
>>> - Distribution based on Hadoop 3
>>> - Up-to-date BigPetStore
>>> - Software stacks and framework for streaming data & Machine Learning
>>> - Containers and Cloud Native: What? How?
>>> 
>>> It would be great to hear your thoughts.
>>> 
>>> Thanks,
>>> Youngwoo
>>>

Re: [DISCUSS] Roadmap for the next release.1.5? 2.0?

Posted by Konstantin Boudnik <co...@apache.org>.

Olaf, thanks for trying... this could be exhausting, for sure ;(

Just to clarify: when you say "monolithic chunk", does it mean that it
relies heavily on relative paths or something of the sort and can't be
broken into pieces because... well, it is so broken?

Thanks,
  Cos

On Mon, Jul 22, 2019 at 10:34PM, Olaf Flebbe wrote:
> Hi,
> 
> If I would have only known that google document before ... I am
> asking myself if we should link it from confluence.
> 
> + hadoop 3.
> As you may know, I worked on it (bigtop-alpha branch), got it
> compiled and packaged. However, while testing, I discovered that
> Hadoop project built on top of some assumptions which do not hold
> true for current Bigtop. One of them is that Hadoop 3 is to be
> installed as a monolithic chunk on a filesystem. This is not what I
> understand as integrated into a Linux distribution. Other point is
> that hadoop's installation proposal does not follow the LSB:
> platform specfic libs are in "share" directories. I am exhausted,
> this is so broken.
> 
> I will stop here ... for a reason.
> 
> Olaf
> 
> 
> 
> 
> 
> On 17.07.19 at 09:09, Youngwoo Kim (김영우) wrote:
> >Hi folks,
> >
> >After 1.4.0 release, there is no discussion for the next release yet. so I
> >believe we need to share the ideas and prioritize the items for
> >development.
> >
> >And also https://issues.apache.org/jira/browse/BIGTOP-3123 and
> >https://docs.google.com/document/d/1F2Gxu8GARQDZXgqHn12LKkQ5wCV_AF4b_tVmjYB6YfA/edit
> >are good starting point for this discussion.
> >
> >My personal preferences are:
> >  - Distribution based on Hadoop 3
> >  - Up-to-date BigPetStore
> >  - Software stacks and framework for streaming data & Machine Learning
> >  - Containers and Cloud Native: What? How?
> >
> >It would be great to hear your thoughts.
> >
> >Thanks,
> >Youngwoo
> >

Re: [DISCUSS] Roadmap for the next release.1.5? 2.0?

Posted by Olaf Flebbe <of...@oflebbe.de>.

Hi,

If I had only known about that Google document before ... I am asking
myself if we should link it from Confluence.

+ Hadoop 3.
As you may know, I worked on it (bigtop-alpha branch) and got it compiled
and packaged. However, while testing, I discovered that the Hadoop project
is built on top of some assumptions which do not hold true for current
Bigtop. One of them is that Hadoop 3 is to be installed as a monolithic
chunk on a filesystem. This is not what I understand as integrated into
a Linux distribution. Another point is that Hadoop's installation proposal
does not follow the LSB: platform-specific libs are in "share"
directories. I am exhausted, this is so broken.

I will stop here ... for a reason.

Olaf





On 17.07.19 at 09:09, Youngwoo Kim (김영우) wrote:
> Hi folks,
> 
> After 1.4.0 release, there is no discussion for the next release yet. so I
> believe we need to share the ideas and prioritize the items for
> development.
> 
> And also https://issues.apache.org/jira/browse/BIGTOP-3123 and
> https://docs.google.com/document/d/1F2Gxu8GARQDZXgqHn12LKkQ5wCV_AF4b_tVmjYB6YfA/edit
> are good starting point for this discussion.
> 
> My personal preferences are:
>   - Distribution based on Hadoop 3
>   - Up-to-date BigPetStore
>   - Software stacks and framework for streaming data & Machine Learning
>   - Containers and Cloud Native: What? How?
> 
> It would be great to hear your thoughts.
> 
> Thanks,
> Youngwoo
>