Posted to common-dev@hadoop.apache.org by Wei-Chiu Chuang <we...@apache.org> on 2019/06/13 18:56:50 UTC

HDFS Scalability Limit?

Hi community,

I am currently drafting an HDFS scalability guideline doc, and I'd like to
gather data points regarding HDFS scalability limits. I'd like to share the
doc publicly eventually.

Through my workplace and through community chatter, I am aware that HDFS is
capable of operating at the following scales:

Number of DataNodes:
Unfederated: I can reasonably believe a single HDFS NameNode can manage up
to 4000 DataNodes. Is there anyone who would like to share an even larger
cluster?

Federated: I am aware of one federated HDFS cluster composed of 20,000
DataNodes. JD.com
<https://conferences.oreilly.com/strata/strata-eu-2018/public/schedule/detail/64692>
has a 15K DN cluster and 210PB total capacity. I suspect it's a federated
HDFS cluster.

Number of blocks & files:
500 million files and blocks seems to be the upper limit at this point. At
this scale the NameNode consumes around 200GB of heap, and in my experience
anything beyond 200GB is unstable. That said, I recall someone mentioning a
400GB NN heap at some point.
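
For the guideline doc I plan to include a back-of-envelope heap sizing sketch
along these lines (Python; the ~300 bytes per namespace object and the 1.5x
headroom factor are my own rough assumptions, not measured numbers):

# Rough NameNode heap estimate: every file and block is an object on the
# NameNode heap, so size the heap off the object count plus GC headroom.
def nn_heap_estimate_gb(num_files, num_blocks,
                        bytes_per_object=300, headroom=1.5):
    objects = num_files + num_blocks
    return objects * bytes_per_object * headroom / 1e9

# ~500M objects total (250M files + 250M blocks):
print(nn_heap_estimate_gb(250e6, 250e6))  # ~225 GB, close to the ~200GB
                                          # heap observed above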

Amount of Data:
I am aware of a few clusters more than 100PB in size (federated or not) --
Uber, Yahoo Japan, JD.com.

Number of Volumes in a DataNode:
DataNodes with 24 volumes are known to work reasonably well. If a DataNode
is used for archival use cases, it can have up to 48 volumes. This is
certainly hardware dependent, but if I know where the current limit is, I
can start optimizing the software.
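
To make this concrete in the doc, I plan to show a config sketch roughly like
the one below (mount points are hypothetical; dfs.datanode.data.dir, the
[ARCHIVE] storage-type tag, and dfs.datanode.failed.volumes.tolerated are the
knobs I have in mind):

<!-- hdfs-site.xml, dense archival DataNode (illustrative only) -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>[ARCHIVE]/data/01/dfs/dn,[ARCHIVE]/data/02/dfs/dn,[ARCHIVE]/data/03/dfs/dn</value>
  <!-- one entry per physical volume, continuing up to 24-48 volumes -->
</property>
<property>
  <name>dfs.datanode.failed.volumes.tolerated</name>
  <value>2</value>
</property>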

Total disk space:
CDH
<https://www.cloudera.com/documentation/enterprise/release-notes/topics/hardware_requirements_guide.html#concept_fzz_dq4_gbb>
recommends no more than 100TB per DataNode. Are there successful
deployments that provision more than that? Of course, you can easily exceed
this number if the node is used purely for data archival.


What other scalability limits are people interested in?

Best,
Wei-Chiu

Re: HDFS Scalability Limit?

Posted by Konstantin Shvachko <sh...@gmail.com>.
Hi Wei-Chiu,

We run Hadoop installations similar to what Kihwal describes. Thanks for
sharing, Kihwal.
Our clusters are also not federated.
So far the growth rate is the same as we reported in our talk last year
(slide #5): 2x per year:
https://www.slideshare.net/Hadoop_Summit/scaling-hadoop-at-linkedin-107176757

We track three main metrics: total space used, number of objects (files +
blocks), and number of tasks per day.
I find that the number of nodes is mostly irrelevant in measuring cluster
size, since the nodes are very diverse in configuration and are constantly
being upgraded, so you may have the same number of nodes but many more
drives, cores, and RAM on each of them - effectively a bigger cluster.
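
(For the first two, something like the sketch below pulls the counters off
the NameNode's /jmx endpoint - host and port are placeholders, and
FilesTotal, BlocksTotal and CapacityUsed are the FSNamesystem bean metrics
as I recall them:)

# Rough sketch of a metrics pull from the NameNode JMX servlet.
import json
from urllib.request import urlopen

url = ("http://namenode.example.com:50070/jmx"
       "?qry=Hadoop:service=NameNode,name=FSNamesystem")
bean = json.load(urlopen(url))["beans"][0]

objects = bean["FilesTotal"] + bean["BlocksTotal"]
used_pb = bean["CapacityUsed"] / 1024**5
print(f"objects={objects:,}  space used={used_pb:.1f} PB")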

I do not see 200 GB heap size as a limit. We ran Dynamometer experiments
with a bigger heap fitting 1 billion files and blocks. It should be doable,
but we may hit other scalability limits once we get to that many objects. See
Erik's talk discussing the experiments and solutions:
https://www.slideshare.net/xkrogen/hadoop-meetup-jan-2019-dynamometer-and-a-case-study-in-namenode-gc

Well, Hadoop scalability has always been a moving target for me. I don't
think you can set it in stone once and for all.

Thanks,
--Konstantin

On Sat, Jun 15, 2019 at 5:20 PM Wei-Chiu Chuang <we...@apache.org> wrote:

> Thank you, Kihwal for the insightful comments!
>
> As I understand it, Yahoo's ops team has a good control of application
> behavior. I tend to be conservative in terms of number of files&blocks and
> heap size. We don't have such luxury, and our customers have a wide
> spectrum of workloads and features (e.g., snapshots, data at-rest
> encryption, Impala).
>
> Yes -- decomm/recomm is a pain, and I am working with my colleague,
> @Stephen
> O'Donnell <so...@cloudera.com> , to address this problem. Have you
> tried maintenance mode? It's in Hadoop 2.9 but a number of decomm/recomm
> needs are alleviated by maintenance mode.
>
> I know Twitter is a big user of maintenance mode, and I'm wondering if
> Twitter folks can offer some experience with it at large scale. CDH
> supports maintenance mode, but our users don't seem to be quite familiar
> with it. Are there issues that were dealt with, but not reported in the
> JIRA? Basically, I'd like to know the operational complexity of this
> feature at large scale.
>
> On Thu, Jun 13, 2019 at 4:00 PM Kihwal Lee <kihwal@verizonmedia.com
> .invalid>
> wrote:
>
> > Hi Wei-Chiu,
> >
> > We have experience with 5,000 - 6,000 node clusters.  Although it
> ran/runs
> > fine, any heavy hitter activities such as decommissioning needed to be
> > carefully planned.   In terms of files and blocks, we have multiple
> > clusters running stable with over 500M files and blocks.  Some at over
> 800M
> > with the max heap at 256GB. It can probably go higher, but we haven't
> done
> > performance testing & optimizations beyond 256GB yet.  All our clusters
> are
> > un-federated. Funny how the feature was developed in Yahoo! and ended up
> > not being used here. :)  We have a cluster with about 180PB of
> provisioned
> > space. Many clusters are using over 100PB in their steady state.  We
> don't
> > run datanodes too dense, so can't tell what the per-datanode limit is.
> >
> > Thanks and 73
> > Kihwal
> >
> > On Thu, Jun 13, 2019 at 1:57 PM Wei-Chiu Chuang <we...@apache.org>
> > wrote:
> >
> >> Hi community,
> >>
> >> I am currently drafting a HDFS scalability guideline doc, and I'd like
> to
> >> understand any data points regarding HDFS scalability limit. I'd like to
> >> share it publicly eventually.
> >>
> >> As an example, through my workplace, and through community chatters, I
> am
> >> aware that HDFS is capable of operating at the following scale:
> >>
> >> Number of DataNodes:
> >> Unfederated: I can reasonably believe a single HDFS NameNode can manage
> up
> >> to 4000 DataNodes. Is there any one who would like to share an even
> larger
> >> cluster?
> >>
> >> Federated: I am aware of one federated HDFS cluster composed of 20,000
> >> DataNodes. JD.com
> >> <
> >>
> https://conferences.oreilly.com/strata/strata-eu-2018/public/schedule/detail/64692
> >> >
> >> has a 15K DN cluster and 210PB total capacity. I suspect it's a
> federated
> >> HDFS cluster.
> >>
> >> Number of blocks & files:
> >> 500 million files&blocks seems to be the upper limit at this point. At
> >> this
> >> scale NameNode consumes around 200GB heap, and my experience told me any
> >> number beyond 200GB is unstable. But at some point I recalled some one
> >> mentioned a 400GB NN heap.
> >>
> >> Amount of Data:
> >> I am aware a few clusters more than 100PB in size (federated or not) --
> >> Uber, Yahoo Japan, JD.com.
> >>
> >> Number of Volumes in a DataNode:
> >> DataNodes with 24 volumes is known to work reasonably well. If DataNode
> is
> >> used for archival use cases, a DN can have up to 48 volumes. This is
> >> certainly hardware dependent, but if I know where the current limit is,
> I
> >> can start optimizing the software.
> >>
> >> Total disk space:
> >> CDH
> >> <
> >>
> https://www.cloudera.com/documentation/enterprise/release-notes/topics/hardware_requirements_guide.html#concept_fzz_dq4_gbb
> >> >
> >> recommends no more than 100TB per DataNode. Are there successful
> >> deployments that install more than this number? Of course, you can
> easily
> >> exceed this number if it is used purely for data archival.
> >>
> >>
> >> What are other scalability limits that people are interested?
> >>
> >> Best,
> >> Wei-Chiu
> >>
> >
>

Re: HDFS Scalability Limit?

Posted by Wei-Chiu Chuang <we...@apache.org>.
Thank you, Kihwal, for the insightful comments!

As I understand it, Yahoo's ops team has good control over application
behavior. I tend to be conservative in terms of the number of files & blocks
and heap size. We don't have that luxury, and our customers have a wide
spectrum of workloads and features (e.g., snapshots, data at-rest
encryption, Impala).

Yes -- decomm/recomm is a pain, and I am working with my colleague, Stephen
O'Donnell <so...@cloudera.com>, to address this problem. Have you tried
maintenance mode? It's in Hadoop 2.9, and a number of decomm/recomm needs
are alleviated by it.
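
For reference, my mental model of how the maintenance state is driven is
roughly the following (please correct me if I misremember the details; host
names and the timestamp are made up):

<!-- hdfs-site.xml: switch the NameNode to the combined JSON host file -->
<property>
  <name>dfs.namenode.hosts.provider.classname</name>
  <value>org.apache.hadoop.hdfs.server.blockmanagement.CombinedHostFileManager</value>
</property>
<property>
  <name>dfs.hosts</name>
  <value>/etc/hadoop/conf/dfs.hosts.json</value>
</property>

# dfs.hosts.json: mark a node IN_MAINTENANCE with an expiry time (ms)
[
  { "hostName": "dn001.example.com", "adminState": "IN_MAINTENANCE",
    "maintenanceExpireTimeInMS": 1561939200000 },
  { "hostName": "dn002.example.com" }
]

# apply without restarting the NameNode
hdfs dfsadmin -refreshNodes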

I know Twitter is a big user of maintenance mode, and I'm wondering if
Twitter folks can share some experience with it at large scale. CDH
supports maintenance mode, but our users don't seem to be very familiar
with it. Are there issues that were dealt with but not reported in JIRA?
Basically, I'd like to understand the operational complexity of this
feature at large scale.

On Thu, Jun 13, 2019 at 4:00 PM Kihwal Lee <ki...@verizonmedia.com.invalid>
wrote:

> Hi Wei-Chiu,
>
> We have experience with 5,000 - 6,000 node clusters.  Although it ran/runs
> fine, any heavy hitter activities such as decommissioning needed to be
> carefully planned.   In terms of files and blocks, we have multiple
> clusters running stable with over 500M files and blocks.  Some at over 800M
> with the max heap at 256GB. It can probably go higher, but we haven't done
> performance testing & optimizations beyond 256GB yet.  All our clusters are
> un-federated. Funny how the feature was developed in Yahoo! and ended up
> not being used here. :)  We have a cluster with about 180PB of provisioned
> space. Many clusters are using over 100PB in their steady state.  We don't
> run datanodes too dense, so can't tell what the per-datanode limit is.
>
> Thanks and 73
> Kihwal
>
> On Thu, Jun 13, 2019 at 1:57 PM Wei-Chiu Chuang <we...@apache.org>
> wrote:
>
>> Hi community,
>>
>> I am currently drafting a HDFS scalability guideline doc, and I'd like to
>> understand any data points regarding HDFS scalability limit. I'd like to
>> share it publicly eventually.
>>
>> As an example, through my workplace, and through community chatters, I am
>> aware that HDFS is capable of operating at the following scale:
>>
>> Number of DataNodes:
>> Unfederated: I can reasonably believe a single HDFS NameNode can manage up
>> to 4000 DataNodes. Is there any one who would like to share an even larger
>> cluster?
>>
>> Federated: I am aware of one federated HDFS cluster composed of 20,000
>> DataNodes. JD.com
>> <
>> https://conferences.oreilly.com/strata/strata-eu-2018/public/schedule/detail/64692
>> >
>> has a 15K DN cluster and 210PB total capacity. I suspect it's a federated
>> HDFS cluster.
>>
>> Number of blocks & files:
>> 500 million files&blocks seems to be the upper limit at this point. At
>> this
>> scale NameNode consumes around 200GB heap, and my experience told me any
>> number beyond 200GB is unstable. But at some point I recalled some one
>> mentioned a 400GB NN heap.
>>
>> Amount of Data:
>> I am aware a few clusters more than 100PB in size (federated or not) --
>> Uber, Yahoo Japan, JD.com.
>>
>> Number of Volumes in a DataNode:
>> DataNodes with 24 volumes is known to work reasonably well. If DataNode is
>> used for archival use cases, a DN can have up to 48 volumes. This is
>> certainly hardware dependent, but if I know where the current limit is, I
>> can start optimizing the software.
>>
>> Total disk space:
>> CDH
>> <
>> https://www.cloudera.com/documentation/enterprise/release-notes/topics/hardware_requirements_guide.html#concept_fzz_dq4_gbb
>> >
>> recommends no more than 100TB per DataNode. Are there successful
>> deployments that install more than this number? Of course, you can easily
>> exceed this number if it is used purely for data archival.
>>
>>
>> What are other scalability limits that people are interested?
>>
>> Best,
>> Wei-Chiu
>>
>

Re: HDFS Scalability Limit?

Posted by Kihwal Lee <ki...@verizonmedia.com.INVALID>.
Hi Wei-Chiu,

We have experience with 5,000 - 6,000 node clusters. Although they ran/run
fine, any heavy-hitter activities such as decommissioning needed to be
carefully planned. In terms of files and blocks, we have multiple clusters
running stably with over 500M files and blocks, some at over 800M with the
max heap at 256GB. It can probably go higher, but we haven't done
performance testing & optimizations beyond 256GB yet. All our clusters are
un-federated. Funny how the feature was developed at Yahoo! and ended up
not being used here. :) We have a cluster with about 180PB of provisioned
space. Many clusters are using over 100PB in their steady state. We don't
run DataNodes too dense, so I can't tell what the per-DataNode limit is.

Thanks and 73
Kihwal

On Thu, Jun 13, 2019 at 1:57 PM Wei-Chiu Chuang <we...@apache.org> wrote:

> Hi community,
>
> I am currently drafting a HDFS scalability guideline doc, and I'd like to
> understand any data points regarding HDFS scalability limit. I'd like to
> share it publicly eventually.
>
> As an example, through my workplace, and through community chatters, I am
> aware that HDFS is capable of operating at the following scale:
>
> Number of DataNodes:
> Unfederated: I can reasonably believe a single HDFS NameNode can manage up
> to 4000 DataNodes. Is there any one who would like to share an even larger
> cluster?
>
> Federated: I am aware of one federated HDFS cluster composed of 20,000
> DataNodes. JD.com
> <
> https://conferences.oreilly.com/strata/strata-eu-2018/public/schedule/detail/64692
> >
> has a 15K DN cluster and 210PB total capacity. I suspect it's a federated
> HDFS cluster.
>
> Number of blocks & files:
> 500 million files&blocks seems to be the upper limit at this point. At this
> scale NameNode consumes around 200GB heap, and my experience told me any
> number beyond 200GB is unstable. But at some point I recalled some one
> mentioned a 400GB NN heap.
>
> Amount of Data:
> I am aware a few clusters more than 100PB in size (federated or not) --
> Uber, Yahoo Japan, JD.com.
>
> Number of Volumes in a DataNode:
> DataNodes with 24 volumes is known to work reasonably well. If DataNode is
> used for archival use cases, a DN can have up to 48 volumes. This is
> certainly hardware dependent, but if I know where the current limit is, I
> can start optimizing the software.
>
> Total disk space:
> CDH
> <
> https://www.cloudera.com/documentation/enterprise/release-notes/topics/hardware_requirements_guide.html#concept_fzz_dq4_gbb
> >
> recommends no more than 100TB per DataNode. Are there successful
> deployments that install more than this number? Of course, you can easily
> exceed this number if it is used purely for data archival.
>
>
> What are other scalability limits that people are interested?
>
> Best,
> Wei-Chiu
>
