Posted to mapreduce-dev@hadoop.apache.org by Stack <st...@duboce.net> on 2017/11/03 19:08:55 UTC

Re: [DISCUSSION] Merging HDFS-7240 Object Store (Ozone) to trunk

On Sat, Oct 28, 2017 at 2:00 PM, Konstantin Shvachko <sh...@gmail.com>
wrote:

> Hey guys,
>
> It is an interesting question whether Ozone should be a part of Hadoop.
>


I don't see a direct answer to this question. Is there one? Pardon me if
I've not seen it but I'm interested in the response.

I ask because IMO the "Hadoop" project is over-stuffed already. Just see
the length of the cc list on this email. Ozone could be standalone. It is a
coherent enough effort.

Thanks,
St.Ack





> There are two main reasons why I think it should not.
>
> 1. With close to 500 sub-tasks, 6 MB of code changes, and a sizable
> community behind it, it looks to me like a whole new project.
> It is essentially a new storage system with a different architecture than
> HDFS and separate S3-like APIs. This is really great - the world sure
> needs more distributed file systems. But it is not clear why Ozone should
> co-exist with HDFS under the same roof.
>
> 2. Ozone is probably just the first step in rebuilding HDFS under a new
> architecture. With the next steps presumably being HDFS-10419 and
> HDFS-11118.
> The design doc for the new architecture has never been published. I can
> only assume, based on some presentations and personal communications, that
> the idea is to use Ozone as a block storage layer and re-implement the
> NameNode so that it stores only a partial namespace in memory, while the
> bulk of it (cold data) is persisted to local storage.
> Such architecture makes me wonder if it solves Hadoop's main problems.
> There are two main limitations in HDFS:
>   a. The throughput of namespace operations, which is limited by the
> number of RPCs the NameNode can handle.
>   b. The number of objects (files + blocks) the system can maintain, which
> is limited by the memory size of the NameNode.
> The RPC performance (a) is more important for Hadoop scalability than the
> object count (b), with read RPCs being the main priority.
> The new architecture targets the object count problem, but at the expense
> of RPC throughput, which seems like the wrong resolution of the
> tradeoff.
> Also, based on the usage patterns on our large clusters, we read up to 90%
> of the data we write, so cold data is a small fraction and most of it must
> be cached.
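The tradeoff described above can be sketched as a toy model: a namespace cache that spills cold entries to local storage reduces the in-memory object count (problem b), but every cache miss now stalls an RPC on a disk read (problem a). This is a minimal illustrative sketch, not the actual HDFS or Ozone design; all names here are hypothetical.

```python
from collections import OrderedDict

class PartialNamespace:
    """Toy model of a NameNode that keeps only hot namespace entries
    in memory and spills cold ones to local storage. Illustrative
    only -- not the actual HDFS/Ozone architecture."""

    def __init__(self, capacity):
        self.capacity = capacity    # max in-memory entries (problem b)
        self.hot = OrderedDict()    # in-memory LRU portion of the namespace
        self.cold = {}              # stands in for the on-disk store
        self.disk_reads = 0         # extra cost paid per cache miss (problem a)

    def put(self, path, meta):
        self.hot[path] = meta
        self.hot.move_to_end(path)
        if len(self.hot) > self.capacity:
            evicted, emeta = self.hot.popitem(last=False)
            self.cold[evicted] = emeta   # spill the coldest entry to "disk"

    def get(self, path):
        if path in self.hot:
            self.hot.move_to_end(path)   # fast path: pure in-memory RPC
            return self.hot[path]
        self.disk_reads += 1             # slow path: this RPC now waits on disk
        meta = self.cold.pop(path)
        self.put(path, meta)             # promote back into memory
        return meta

ns = PartialNamespace(capacity=2)
for i in range(4):
    ns.put("/f%d" % i, {"size": i})
ns.get("/f0")                 # /f0 was evicted, so this lookup costs a disk read
assert ns.disk_reads == 1
```

If, as the usage numbers above suggest, up to 90% of written data is read back, most lookups would hit the slow path unless nearly the whole namespace stays cached, which is exactly the objection being made.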
>
> To summarize:
> - Ozone is a big enough system to deserve its own project.
> - The architecture that Ozone leads to does not seem to solve the intrinsic
> problems of current HDFS.
>
> I will post my opinion in the Ozone jira. It should be more convenient to
> discuss it there for further reference.
>
> Thanks,
> --Konstantin
>
>
>
> On Wed, Oct 18, 2017 at 6:54 PM, Yang Weiwei <ch...@hotmail.com>
> wrote:
>
> > Hello everyone,
> >
> >
> > I would like to start this thread to discuss merging Ozone (HDFS-7240) to
> > trunk. This feature implements an object store which can co-exist with
> > HDFS. Ozone is disabled by default. We have tested Ozone with cluster
> > sizes varying from 1 to 100 data nodes.
> >
> >
> >
> > The merge payload includes the following:
> >
> >   1.  All services, management scripts
> >   2.  Object store APIs, exposed via both REST and RPC
> >   3.  Master service UIs, command line interfaces
> >   4.  Pluggable pipeline integration
> >   5.  Ozone File System (Hadoop compatible file system implementation,
> > passes all FileSystem contract tests)
> >   6.  Corona - a load generator for Ozone.
> >   7.  Essential documentation added to Hadoop site.
> >   8.  Version-specific Ozone documentation, accessible via the service UI.
> >   9.  Docker support for Ozone, which enables faster development cycles.
> >
> >
> > To build Ozone and run it using Docker, please follow the instructions on
> > this wiki page: https://cwiki.apache.org/confluence/display/HADOOP/Dev+cluster+with+docker.
> >
> >
> > We have built a passionate and diverse community to drive this feature's
> > development. As a team, we have achieved significant progress in the past
> > 3 years since the first JIRA for HDFS-7240 was opened in Oct 2014. So far,
> > we have resolved almost 400 JIRAs with 20+ contributors/committers from
> > different countries and affiliations. We also want to thank the large
> > number of community members who were supportive of our efforts and
> > contributed ideas and participated in the design of ozone.
> >
> >
> > Please share your thoughts, thanks!
> >
> >
> > -- Weiwei Yang
> >
>

Re: [DISCUSSION] Merging HDFS-7240 Object Store (Ozone) to trunk

Posted by Vinod Kumar Vavilapalli <vi...@apache.org>.
> 	At a minimum, it should at least be using its own Maven module for a lot of the bits that generate their own Maven jars, so that we can split this functionality up at build/test time.


I expected this to be the case, but it looks like it isn't.

There's a lot of value in splitting the HDFS code into smaller modules - definitely for newer code like Ozone.

When we did this for YARN, there were initially concerns about module proliferation, but looking back, my observation has been that it has done us far more good than expected - starting with the fact that we had clients modularized independently of servers, and servers from other servers, with far cleaner contracts than what we had in the Hadoop 1 world.
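The module split being suggested could be sketched as a parent-POM change along these lines. This is purely illustrative: the `hadoop-hdfs-ozone` module name is hypothetical, while `hadoop-hdfs-client` and `hadoop-hdfs` are existing modules.

```xml
<!-- hadoop-hdfs-project/pom.xml (sketch): the hadoop-hdfs-ozone module
     name is hypothetical. The point is that Ozone would build its own
     jars and could be included or skipped at build/test time via
     Maven reactor options such as -pl and -am. -->
<modules>
  <module>hadoop-hdfs-client</module>
  <module>hadoop-hdfs</module>
  <module>hadoop-hdfs-ozone</module>
</modules>
```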

Thanks
+Vinod


Re: [DISCUSSION] Merging HDFS-7240 Object Store (Ozone) to trunk

Posted by Allen Wittenauer <aw...@effectivemachines.com>.
> On Nov 3, 2017, at 12:08 PM, Stack <st...@duboce.net> wrote:
> 
> On Sat, Oct 28, 2017 at 2:00 PM, Konstantin Shvachko <sh...@gmail.com>
> wrote:
> 
>> It is an interesting question whether Ozone should be a part of Hadoop.
> 
> I don't see a direct answer to this question. Is there one? Pardon me if
> I've not seen it but I'm interested in the response.

	+1

	Given:

	* a completely different set of config files (ozone-site.xml, etc)
	* package name is org.apache.hadoop.ozone, not org.apache.hadoop.hdfs.ozone

	… it doesn’t really seem to want to be part of HDFS, much less Hadoop.

Plus hadoop-hdfs-project/hadoop-hdfs is already a battle zone when it comes to unit tests, dependencies, etc [*]

	At a minimum, it should at least be using its own Maven module for a lot of the bits that generate their own Maven jars, so that we can split this functionality up at build/test time.

	At a higher level, this feels a lot like the design decisions that were made around yarn-native-services.  This feature is either part of HDFS or it’s not. Pick one.  Doing both is incredibly confusing for everyone outside of the branch.
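For context, the separate configuration surface mentioned above looks roughly like the fragment below. This is a sketch: the property names are as I recall them from the HDFS-7240 branch and may differ from what actually ships.

```xml
<!-- ozone-site.xml (illustrative fragment; property names may differ
     from the actual branch) -->
<configuration>
  <property>
    <name>ozone.enabled</name>
    <value>true</value>  <!-- Ozone is disabled by default -->
  </property>
  <property>
    <name>ozone.metadata.dirs</name>
    <value>/var/lib/hadoop/ozone</value>
  </property>
</configuration>
```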
---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-help@hadoop.apache.org

