
Posted to hdfs-dev@hadoop.apache.org by ZanderXu <za...@apache.org> on 2024/04/24 06:54:20 UTC

Fwd: Discussion about NameNode Fine-grained locking

Hi everyone

All subtasks of the first phase of the FGL have been completed and I plan
to merge them into the trunk and start the second phase based on the trunk.

Here is the PR used to merge the first phase into trunk:
https://github.com/apache/hadoop/pull/6762
Here is the ticket: https://issues.apache.org/jira/browse/HDFS-17384

I hope you can help review this PR when you are available and share your
ideas.


HDFS-17385 <https://issues.apache.org/jira/browse/HDFS-17385> is used for
the second phase and I have created some subtasks to describe solutions for
some problems, such as: snapshot, getListing, quota.
You are welcome to join us to complete it together.


---------- Forwarded message ---------
From: Zengqiang XU <za...@apache.org>
Date: Fri, 2 Feb 2024 at 11:07
Subject: Discussion about NameNode Fine-grained locking
To: <hd...@hadoop.apache.org>
Cc: Zengqiang XU <xu...@gmail.com>


Hi everyone

I have started a discussion about NameNode Fine-grained Locking to improve
performance of write operations in NameNode.

I started this discussion again for several main reasons:
1. We have implemented it and gained nearly 7x performance improvement in
our prod environment
2. Many other companies made similar improvements based on their internal
branches.
3. This topic has been discussed for a long time, but still without any
results.

I hope we can push this important improvement forward in the community so
that all end users can benefit from it.

I'd really appreciate it if you could join in and work with me to push this
feature forward.

Thanks very much.

Ticket: HDFS-17366 <https://issues.apache.org/jira/browse/HDFS-17366>
Design: NameNode Fine-grained locking based on directory tree
<https://docs.google.com/document/d/1X499gHxT0WSU1fj8uo4RuF3GqKxWkWXznXx4tspTBLY/edit?usp=sharing>

Re: Discussion about NameNode Fine-grained locking

Posted by Hui Fei <fe...@gmail.com>.

BTW, there is a Slack channel, hdfs-fgl, for this feature. You can join it
to discuss more details.

Is it necessary to hold a meeting to discuss this, so that we can push it
forward quickly? I agree with ZanderXu that it seems inefficient to discuss
details via the email list.


On Mon, 6 May 2024 at 23:50, Hui Fei <fe...@gmail.com> wrote:

> Thanks all
>
> It seems all the concerns are related to stage 2. We can address these and
> make things clearer before we start it.
>
> From development experience, I think it is reasonable to split the big
> feature into several stages. Stage 1 is also independent, and it can
> stand as a minor feature that uses FS and BM locks instead of the global
> lock.
>
>
> On Mon, 29 Apr 2024 at 15:17, ZanderXu <za...@apache.org> wrote:
>
>> Thanks @Ayush Saxena <ay...@gmail.com> and @Xiaoqiao He
>> <he...@apache.org> for your nice questions.
>>
>> Let me summarize your concerns and corresponding solutions:
>>
>> *1. Questions about the Snapshot feature*
>> It's difficult to apply FGL to the Snapshot feature, but we can simply
>> use the global FS write lock to make it thread safe.
>> So if we can identify whether a path involves the snapshot feature, we
>> can use the global FS write lock to protect it.
>>
>> You can refer to HDFS-17479
>> <https://issues.apache.org/jira/browse/HDFS-17479> to see how to identify
>> it.
>>
>> Regarding the performance of operations related to the snapshot feature,
>> we can discuss it in two categories:
>> Read operations involving snapshots:
>> The FGL branch uses the global write lock to protect them, while the
>> GLOBAL branch uses the global read lock. It's hard to conclude which
>> version performs better; it depends on global lock contention.
>>
>> Write operations involving snapshots:
>> Both the FGL and GLOBAL branches use the global write lock to protect
>> them. It's hard to conclude which version performs better; it also
>> depends on global lock contention.
>>
>> So I think if NameNode load is low, the GLOBAL branch will perform better
>> than FGL; if NameNode load is high, the FGL branch may perform better
>> than GLOBAL, which also depends on the ratio of read and write operations
>> on the snapshot feature.
>>
>> We can do a few things to let end users choose the branch that performs
>> better for their workload:
>> First, we need to make the lock mode selectable, so that end users can
>> choose between FGL and GLOBAL.
>> Second, use the global write lock to make snapshot-related operations
>> thread safe, as I described in HDFS-17479.
>>
>>
>> *2. Questions about the Symlinks feature*
>> If a symlink is related to a snapshot, we can apply the snapshot
>> solution; if it is not, I think it's easy to support under FGL.
>> Only createSymlink involves two paths; FGL just needs to lock both of
>> them, in order, to make this operation thread safe. For other operations,
>> a symlink is the same as any other normal iNode, right?
>>
>> If I missed any difficult points, please let me know.
>>
>>
>> *3. Questions about Memory Usage of iNode locks*
>> I think there are many ways to limit the memory usage of these iNode
>> locks, such as using a limited-capacity lock pool to cap the maximum
>> memory usage, or holding iNode locks only down to a fixed depth of
>> directories.
>>
>> We can abstract a LockManager first and then support different
>> implementations of it, so that we can limit the maximum memory usage of
>> these iNode locks.
>> FGL can acquire or lease iNode locks through the LockManager.
>>
>>
>> *4. Questions about Performance of acquiring and releasing iNode locks*
>> We can add some benchmarks for LockManager to test the performance of
>> acquiring and releasing uncontended locks.
>>
>>
>> *5. Questions about StoragePolicy, ECPolicy, ACL, Quota, etc.*
>> These policies may be set on an ancestor node and used by its descendant
>> files. The set operations for these policies will be protected by the
>> directory tree, since they are all file-related operations. Apart from
>> Quota and StoragePolicy, the use of the other policies, such as ECPolicy
>> and ACL, will also be protected by the directory tree.
>>
>> Quota is a little special since its update operations may not be
>> protected by the directory tree. We can assign a lock to each
>> QuotaFeature and use these locks to make update operations thread safe.
>> You can refer to HDFS-17473
>> <https://issues.apache.org/jira/browse/HDFS-17473> for more detailed
>> information.
>>
>> StoragePolicy is a little special since it is used not only by
>> file-related operations but also by block-related operations.
>> ProcessExtraRedundancyBlock uses the storage policy to choose redundant
>> replicas, and BlockReconstructionWork uses the storage policy to choose
>> target DNs. In order to maximize the performance improvement, BR and IBR
>> should only involve the iNodeFile to which the block being processed
>> belongs. These redundancy blocks can be processed by the Redundancy
>> monitor while holding the directory tree locks. You can refer to
>> HDFS-17505 <https://issues.apache.org/jira/browse/HDFS-17505> for more
>> detailed information.
>>
>> *6. Performance of phase 1*
>> HDFS-17506 <https://issues.apache.org/jira/browse/HDFS-17506> is used to
>> do some performance testing for phase 1, and I will complete it later.
>>
>>
>> Discussing solutions over email is not efficient; you can create
>> sub-tasks under HDFS-17366
>> <https://issues.apache.org/jira/browse/HDFS-17366> to describe your
>> concerns and I will try to give some answers.
>>
>> Thanks @Ayush Saxena <ay...@gmail.com>  and @Xiaoqiao He
>> <he...@apache.org> again.
>>
>>
>>
>> On Mon, 29 Apr 2024 at 02:00, Ayush Saxena <ay...@gmail.com> wrote:
>>
>> > Thanks everyone for chasing this. Great to see some momentum around
>> > FGL; that should be a great improvement.
>> >
>> > I have comments in two broad categories:
>> > ** About the process:*
>> > I think the mails above mention that phase one is complete in a feature
>> > branch and that we are going to merge that to trunk. If I am reading it
>> > right, you can't just hit the merge button like that. To merge a
>> > feature branch, you need to call for a vote specific to that branch,
>> > and it requires 3 binding votes to merge, unlike any other code change,
>> > which requires 1. It is there in our Bylaws.
>> >
>> > So, do follow the process.
>> >
>> > ** About the feature itself:* (A very quick look at the doc and the
>> > Jira, so please take it with a grain of salt)
>> > * The Google Drive link that you folks shared as part of the first
>> > mail: I don't have access to it. So, please open up the permissions for
>> > that doc or share a new link
>> > * Chasing the design doc present on the Jira
>> > * I think we only have Phase-1 ready, so can you share some metrics
>> > just for that? Perf improvements just from splitting the FS & BM locks
>> > * The memory implications of Phase-1? I don't think there should be any
>> > major impact on memory with just Phase-1
>> > * Regarding the snapshot stuff, you mentioned taking the lock on the
>> > root itself? Does taking the lock on just the snapshot root rather than
>> > the FS root work?
>> > * Secondly, about the usage of Snapshots or Symlinks: I don't think we
>> > should operate under the assumption that they aren't widely used; we
>> > might just not know the folks who use them widely, or they are just
>> > users, not the ones contributing. We can accept for now that in those
>> > cases it isn't optimised and we just lock the entire FS space, which it
>> > does even today, so no regressions there.
>> > * Regarding memory usage: Do you have some numbers on how much the
>> > memory footprint increases?
>> > * Under the Lock Pool: I think you are assuming there would be very few
>> > inodes where a lock would be required at any given time, so there won't
>> > be too much heap consumption? I think you are compromising on
>> > horizontal scalability here. If your assumption doesn't hold true, then
>> > under heavy read load from concurrent clients accessing different
>> > inodes, the NameNode will start having memory troubles, which would do
>> > more harm than good. Anyway, NameNode heap is a way bigger problem than
>> > anything else, so we should be very careful about increasing load there.
>> > * For the locks on the inodes: Do you plan to have locks for each
>> > inode? Can we somehow limit that by the depth of the tree? Currently we
>> > take the lock on the root; we could have a config that makes us take
>> > the lock at Level-2 or 3 (configurable). That might fetch some perf
>> > benefits and could be used to control the memory usage as well.
>> > * What is the cost of creating these inode locks? If the lock isn't
>> > already cached, it would incur some cost? Do you have some numbers
>> > around that? Say I disable caching altogether and then let a test load
>> > run, what do the perf numbers look like in that case?
>> > * I think we need to limit the size of INodeLockPool; we can't let it
>> > grow infinitely under heavy load, and we need to have some auto
>> > throttling mechanism for it
>> > * I didn't catch your Storage Policy problem. If I decode it right, the
>> > problem is that the policy could be set on an ancestor node and the
>> > children abide by it. If that is the case, then isn't that also the
>> > case with ErasureCoding policies or even ACLs? Can you elaborate a bit
>> > on that?
>> >
>> >
>> > Anyway, regarding Phase-1: if you share the perf numbers with proper
>> > details, plus the impact on memory if any, for just Phase 1, and they
>> > are good, then if you call for a branch merge vote for Phase-1 FGL, you
>> > have my vote. However, you'll need to sway the rest of the folks on
>> > your own :-)
>> >
>> > Good Luck, Nice Work Guys!!!
>> >
>> > -Ayush
>> >
>> >
>> > On Sun, 28 Apr 2024 at 18:32, Xiaoqiao He <he...@apache.org> wrote:
>> >
>> >> Thanks ZanderXu and Hui Fei for your work on this feature. It will be
>> >> a very helpful improvement for the HDFS module in the next journal.
>> >>
>> >> 1. If we need any more review bandwidth, I would like to be involved
>> >> and help review if possible.
>> >> 2. The design document is still missing detailed descriptions of some
>> >> topics, such as snapshot, symbolic link, reserved, etc., as mentioned
>> >> above. I think it will be helpful for newcomers who want to get
>> >> involved if all corner cases are considered and described.
>> >> 3. From Slack, we plan to check into trunk at this phase. I am not
>> >> sure it is the proper time; following the dev plan in the design
>> >> document, there are two steps left to finish this feature, right? If
>> >> so, I think we should postpone checking in until all plans are ready.
>> >> Considering that there have been many unfinished attempts at this
>> >> feature in the past, I think postponing the check-in will be the safe
>> >> way. On the other hand, it will involve more rebase cost if you keep a
>> >> separate dev branch; however, I think that is not a difficult thing
>> >> for you.
>> >>
>> >> Good luck and look forward to making that happen soon!
>> >>
>> >> Best Regards,
>> >> - He Xiaoqiao
>> >>
>> >> On Fri, Apr 26, 2024 at 3:50 PM Hui Fei <fe...@gmail.com> wrote:
>> >> >
>> >> > Thanks for the interest and advice on this.
>> >> >
>> >> > I would just like to share some info here.
>> >> >
>> >> > ZanderXu leads this feature and he has spent a lot of time on it. He
>> >> > is the main developer in stage 1. Yuanboliu and Kokonguyen191 also
>> >> > took some tasks. Other developers (slfan1989 haiyang1987
>> >> > huangzhaobo99 RocMarshal kokonguyen191) helped review PRs. (Forgive
>> >> > me if I missed someone)
>> >> >
>> >> > Actually haiyang1987, Yuanboliu and Kokonguyen191 are also very
>> >> > familiar with this feature. We discussed many details offline.
>> >> >
>> >> > More people interested in joining the development and review of
>> >> > stages 2 and 3 are welcome.
>> >> >
>> >> >
>> >> > On Fri, 26 Apr 2024 at 14:56, Zengqiang XU <xu...@gmail.com> wrote:
>> >> >>
>> >> >> Thanks Shilun for your response:
>> >> >>
>> >> >> 1. This is a big and very useful feature, so it really needs more
>> >> >> developers to get on board.
>> >> >> 2. This fine-grained lock has been implemented on internal branches
>> >> >> and has brought benefits to many companies, such as Meituan,
>> >> >> Kuaishou, Bytedance, etc. But it has not been contributed to the
>> >> >> community for various reasons: there is a big difference between
>> >> >> the version of the internal branch and the community trunk branch,
>> >> >> the internal branch may ignore some functions to keep FGL simple,
>> >> >> and the contribution needs a lot of work and will take a long time.
>> >> >> It means that this solution has already been practiced in their
>> >> >> prod environments. We have also practiced it in our prod
>> >> >> environment and gained benefits, and we are also willing to spend a
>> >> >> lot of time contributing to the community.
>> >> >> 3. Regarding the benchmark testing, we don't need to pay too much
>> >> >> attention to whether the performance is improved by 5 times, 10
>> >> >> times or 20 times, because there are too many factors that affect
>> >> >> it.
>> >> >> 4. As I described above, this solution is already being practiced
>> >> >> by many companies. Right now, we just need to think about how to
>> >> >> implement it with high quality and more comprehensively.
>> >> >> 5. I firmly believe that all problems can be solved as long as the
>> >> >> overall solution is right.
>> >> >> 6. I can spend a lot of time leading the promotion of this entire
>> >> >> feature and I hope more people can join us in promoting it.
>> >> >> 7. You are always welcome to raise your concerns.
>> >> >>
>> >> >>
>> >> >> Thanks Shilun again, I hope you can help review designs and PRs.
>> >> >> Thanks
>> >> >>
>> >> >> On Fri, 26 Apr 2024 at 08:00, slfan1989 <sl...@apache.org> wrote:
>> >> >>
>> >> >> > Thank you for your hard work! This is a very meaningful
>> >> >> > improvement, and from the design document, we can see a
>> >> >> > significant increase in HDFS read/write throughput.
>> >> >> >
>> >> >> > I am happy to see the progress made on HDFS-17384.
>> >> >> >
>> >> >> > However, I still have some concerns, which roughly involve the
>> >> >> > following aspects:
>> >> >> >
>> >> >> > 1. While ZanderXu and Hui Fei have deep expertise in HDFS and are
>> >> >> > familiar with related development details, we still need more
>> >> >> > community members to review the code to ensure that the relevant
>> >> >> > upgrades meet expectations.
>> >> >> >
>> >> >> > 2. We need more details on benchmarks to ensure that test results
>> >> >> > can be reproduced and to allow more community members to
>> >> >> > participate in the testing process.
>> >> >> >
>> >> >> > Looking forward to everything going smoothly in the future.
>> >> >> >
>> >> >> > Best Regards,
>> >> >> > - Shilun Fan.
>> >> >> >
>> >> >> > On Wed, Apr 24, 2024 at 3:51 PM Xiaoqiao He <hexiaoqiao@apache.org> wrote:
>> >> >> >
>> >> >> >> cc private@h.a.o.
>> >> >> >>
>> >> >> >> On Wed, Apr 24, 2024 at 3:35 PM ZanderXu <za...@apache.org> wrote:
>> >> >> >> >
>> >> >> >> > Here is a summary of the first phase:
>> >> >> >> > 1. There are no big changes in this phase
>> >> >> >> > 2. This phase just uses an FS lock and a BM lock to replace the
>> >> >> >> > original global lock
>> >> >> >> > 3. It improves performance, since some operations just need to
>> >> >> >> > hold the FS lock or the BM lock instead of the global lock
>> >> >> >> > 4. This feature is turned off by default; you can enable it by
>> >> >> >> > setting dfs.namenode.lock.model.provider.class to
>> >> >> >> > org.apache.hadoop.hdfs.server.namenode.fgl.FineGrainedFSNamesystemLock
>> >> >> >> > 5. This phase is very important for the ongoing development of
>> >> >> >> > the entire FGL
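
As a configuration fragment, the setting named in item 4 might look like
the following. The property name and class name are copied from the
message; placing it in hdfs-site.xml is an assumption based on where
other dfs.namenode.* keys normally live, and the thread itself does not
say so.

```xml
<!-- Assumed location: hdfs-site.xml on the NameNode. -->
<property>
  <name>dfs.namenode.lock.model.provider.class</name>
  <value>org.apache.hadoop.hdfs.server.namenode.fgl.FineGrainedFSNamesystemLock</value>
</property>
```

Leaving the property unset should keep the default (global lock)
behaviour, since the message says the feature is off by default.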
>> >> >> >> >
>> >> >> >> > Here I would like to express my special thanks to
>> >> >> >> > @kokonguyen191 and @yuanboliu for their contributions. You are
>> >> >> >> > also welcome to join us and complete it together.
>> >> >> >> >
>> >> >> >> >
>>
>

Re: Discussion about NameNode Fine-grained locking

Posted by Hui Fei <fe...@gmail.com>.

Thanks all

It seems all the concerns are related to stage 2. We can address these and
make things clearer before we start it.

From development experience, I think it is reasonable to split the big
feature into several stages. Stage 1 is also independent, and it can
stand as a minor feature that uses FS and BM locks instead of the global
lock.
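
To make the stage 1 idea concrete, here is a minimal sketch of one
global lock split into an FS lock and a BM lock. The class, enum, and
method names are hypothetical and are not taken from the FGL patches.
Treating GLOBAL as "FS lock, then BM lock" in a fixed order is an
assumption made here to avoid deadlock between two GLOBAL callers.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical lock modes; names are illustrative, not the Hadoop API. */
enum LockMode { FS, BM, GLOBAL }

/** Sketch: the old global lock expressed as "FS lock, then BM lock". */
final class SplitNamesystemLock {
  private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock(true);
  private final ReentrantReadWriteLock bmLock = new ReentrantReadWriteLock(true);

  void writeLock(LockMode mode) {
    // Assumed ordering: always FS before BM.
    if (mode == LockMode.FS || mode == LockMode.GLOBAL) {
      fsLock.writeLock().lock();
    }
    if (mode == LockMode.BM || mode == LockMode.GLOBAL) {
      bmLock.writeLock().lock();
    }
  }

  void writeUnlock(LockMode mode) {
    // Release in the reverse order of acquisition.
    if (mode == LockMode.BM || mode == LockMode.GLOBAL) {
      bmLock.writeLock().unlock();
    }
    if (mode == LockMode.FS || mode == LockMode.GLOBAL) {
      fsLock.writeLock().unlock();
    }
  }

  boolean isFsWriteLocked() { return fsLock.isWriteLocked(); }
  boolean isBmWriteLocked() { return bmLock.isWriteLocked(); }
}
```

With this shape, an operation that only touches the directory tree
takes LockMode.FS and does not block a block-management operation
holding LockMode.BM, which is where the stage 1 gain would come from.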


On Mon, 29 Apr 2024 at 15:17, ZanderXu <za...@apache.org> wrote:

> Thanks @Ayush Saxena <ay...@gmail.com> and @Xiaoqiao He
> <he...@apache.org> for your nice questions.
>
> Let me summarize your concerns and corresponding solutions:
>
> *1. Questions about the Snapshot feature*
> It's difficult to apply FGL to the Snapshot feature, but we can simply use
> the global FS write lock to make it thread safe.
> So if we can identify whether a path involves the snapshot feature, we can
> use the global FS write lock to protect it.
>
> You can refer to HDFS-17479
> <https://issues.apache.org/jira/browse/HDFS-17479> to see how to identify
> it.
>
> Regarding the performance of operations related to the snapshot feature,
> we can discuss it in two categories:
> Read operations involving snapshots:
> The FGL branch uses the global write lock to protect them, while the
> GLOBAL branch uses the global read lock. It's hard to conclude which
> version performs better; it depends on global lock contention.
>
> Write operations involving snapshots:
> Both the FGL and GLOBAL branches use the global write lock to protect
> them. It's hard to conclude which version performs better; it also
> depends on global lock contention.
>
> So I think if NameNode load is low, the GLOBAL branch will perform better
> than FGL; if NameNode load is high, the FGL branch may perform better than
> GLOBAL, which also depends on the ratio of read and write operations on
> the snapshot feature.
>
> We can do a few things to let end users choose the branch that performs
> better for their workload:
> First, we need to make the lock mode selectable, so that end users can
> choose between FGL and GLOBAL.
> Second, use the global write lock to make snapshot-related operations
> thread safe, as I described in HDFS-17479.
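
The snapshot fallback described above can be sketched as a small
selector. How HDFS-17479 actually detects snapshot involvement is not
stated in this thread; the prefix check against a set of snapshottable
roots below is purely an assumption, and all names are hypothetical.

```java
import java.util.Set;

/** Hypothetical lock choices; names are illustrative only. */
enum LockChoice { GLOBAL_WRITE, FINE_GRAINED }

/** Sketch: fall back to the global write lock for snapshot-related paths. */
final class SnapshotAwareLockSelector {
  private final Set<String> snapshottableRoots;

  SnapshotAwareLockSelector(Set<String> snapshottableRoots) {
    this.snapshottableRoots = snapshottableRoots;
  }

  LockChoice choose(String path) {
    for (String root : snapshottableRoots) {
      // Match the root itself or anything below it, not mere name prefixes.
      String prefix = root.endsWith("/") ? root : root + "/";
      if (path.equals(root) || path.startsWith(prefix)) {
        return LockChoice.GLOBAL_WRITE;
      }
    }
    return LockChoice.FINE_GRAINED;
  }
}
```

Under these assumptions, a path below a snapshottable root takes the
global write lock and every other path stays on the fine-grained path.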
>
>
> *2. Questions about the Symlinks feature*
> If a symlink is related to a snapshot, we can apply the snapshot
> solution; if it is not, I think it's easy to support under FGL.
> Only createSymlink involves two paths; FGL just needs to lock both of
> them, in order, to make this operation thread safe. For other operations,
> a symlink is the same as any other normal iNode, right?
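
One way to read "lock them in order" is to take both path locks in a
single consistent order, so that two concurrent callers can never each
hold one lock while waiting for the other. The sketch below uses
lexicographic order on the path strings and a per-path lock map; both
choices are assumptions for illustration, not the FGL design.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Sketch: write-lock two paths in a consistent (sorted) order. */
final class OrderedPathLocks {
  private final ConcurrentHashMap<String, ReentrantReadWriteLock> locks =
      new ConcurrentHashMap<>();

  /** Locks both paths and returns the order in which they were taken. */
  List<String> lockBoth(String link, String target) {
    List<String> order = new ArrayList<>();
    order.add(link);
    if (!link.equals(target)) {
      order.add(target);
    }
    Collections.sort(order); // same order regardless of argument order
    for (String path : order) {
      lockFor(path).writeLock().lock();
    }
    return order;
  }

  /** Unlocks in the reverse of the acquisition order. */
  void unlock(List<String> order) {
    for (int i = order.size() - 1; i >= 0; i--) {
      lockFor(order.get(i)).writeLock().unlock();
    }
  }

  private ReentrantReadWriteLock lockFor(String path) {
    return locks.computeIfAbsent(path, k -> new ReentrantReadWriteLock());
  }
}
```

The per-path map grows without bound here, which is the memory concern
raised elsewhere in this thread; a real implementation would need a
bounded pool.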
>
> If I missed any difficult points, please let me know.
>
>
> *3. Questions about Memory Usage of iNode locks*
> I think there are many ways to limit the memory usage of these iNode
> locks, such as using a limited-capacity lock pool to cap the maximum
> memory usage, or holding iNode locks only down to a fixed depth of
> directories.
>
> We can abstract a LockManager first and then support different
> implementations of it, so that we can limit the maximum memory usage of
> these iNode locks.
> FGL can acquire or lease iNode locks through the LockManager.
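
A minimal sketch of the "abstract a LockManager, then plug in a bounded
implementation" idea follows. The interface and the striped pool are
hypothetical and are not the LockManager from the FGL work; striping by
inode id is just one assumed way to cap the number of lock objects.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical abstraction; not the interface from the FGL patches. */
interface LockManager {
  ReentrantReadWriteLock lockFor(long inodeId);
  int capacity();
}

/** Fixed-capacity pool: inode ids hash onto a bounded set of shared locks. */
final class StripedLockManager implements LockManager {
  private final ReentrantReadWriteLock[] stripes;

  StripedLockManager(int capacity) {
    if (capacity <= 0) {
      throw new IllegalArgumentException("capacity must be positive");
    }
    stripes = new ReentrantReadWriteLock[capacity];
    for (int i = 0; i < capacity; i++) {
      stripes[i] = new ReentrantReadWriteLock();
    }
  }

  @Override
  public ReentrantReadWriteLock lockFor(long inodeId) {
    // floorMod keeps the index non-negative even for negative ids.
    return stripes[(int) Math.floorMod(inodeId, (long) stripes.length)];
  }

  @Override
  public int capacity() {
    return stripes.length;
  }
}
```

The trade-off in this sketch is that memory is bounded by the capacity,
but unrelated inodes that share a stripe contend with each other, so
the capacity would need tuning against real load.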
>
>
> *4. Questions about Performance of acquiring and releasing iNode locks*
> We can add some benchmarks for LockManager to test the performance of
> acquiring and releasing uncontended locks.
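
As a rough illustration of such a benchmark, the loop below times
uncontended write-lock acquire/release pairs on a plain
ReentrantReadWriteLock. It is only a sketch: the class name is
hypothetical, it does not exercise any real LockManager, and a proper
measurement would more likely use a harness such as JMH.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Sketch: time uncontended acquire/release pairs on a single thread. */
final class UncontendedLockBench {
  /** Returns the average nanoseconds per write-lock acquire/release pair. */
  static double nanosPerPair(ReentrantReadWriteLock lock, int iterations) {
    long start = System.nanoTime();
    for (int i = 0; i < iterations; i++) {
      lock.writeLock().lock();
      lock.writeLock().unlock();
    }
    return (System.nanoTime() - start) / (double) iterations;
  }

  public static void main(String[] args) {
    ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    nanosPerPair(lock, 1_000_000); // warm-up pass before measuring
    System.out.printf("%.1f ns per acquire/release%n",
        nanosPerPair(lock, 10_000_000));
  }
}
```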
>
>
> *5. Questions about StoragePolicy, ECPolicy, ACL, Quota, etc.*
> These policies may be set on an ancestor node and used by its descendant
> files. The set operations for these policies will be protected by the
> directory tree, since they are all file-related operations. Apart from
> Quota and StoragePolicy, the use of the other policies, such as ECPolicy
> and ACL, will also be protected by the directory tree.
>
> Quota is a little special since its update operations may not be protected
> by the directory tree. We can assign a lock to each QuotaFeature and use
> these locks to make update operations thread safe. You can refer to
> HDFS-17473 <https://issues.apache.org/jira/browse/HDFS-17473> for more
> detailed information.
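
The "one lock per QuotaFeature" idea can be sketched as follows. The
class is a stand-in and not HDFS's real quota feature class; it tracks
only a namespace count, and the check-then-update logic is an assumed
simplification of what HDFS-17473 describes.

```java
/** Sketch: a quota holder that guards its own counters with a private lock. */
final class QuotaFeatureSketch {
  private final Object lock = new Object();
  private final long nsQuota;
  private long nsUsed;

  QuotaFeatureSketch(long nsQuota) {
    this.nsQuota = nsQuota;
  }

  /** Atomically checks and applies a namespace delta; false if over quota. */
  boolean tryAddNamespace(long delta) {
    synchronized (lock) {
      if (nsUsed + delta > nsQuota) {
        return false;
      }
      nsUsed += delta;
      return true;
    }
  }

  long namespaceUsed() {
    synchronized (lock) {
      return nsUsed;
    }
  }
}
```

Because the check and the update happen under the same lock, two
threads holding locks on different subtrees can update the same
ancestor's quota without relying on the directory tree lock.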
>
> StoragePolicy is a little special since it is used not only by
> file-related operations but also by block-related operations.
> ProcessExtraRedundancyBlock uses the storage policy to choose redundant
> replicas, and BlockReconstructionWork uses the storage policy to choose
> target DNs. In order to maximize the performance improvement, BR and IBR
> should only involve the iNodeFile to which the block being processed
> belongs. These redundancy blocks can be processed by the Redundancy
> monitor while holding the directory tree locks. You can refer to
> HDFS-17505 <https://issues.apache.org/jira/browse/HDFS-17505> for more
> detailed information.
>
> *6. Performance of phase 1*
> HDFS-17506 <https://issues.apache.org/jira/browse/HDFS-17506> is used to
> do some performance testing for phase 1, and I will complete it later.
>
>
> Discussing solutions over email is not efficient; you can create
> sub-tasks under HDFS-17366
> <https://issues.apache.org/jira/browse/HDFS-17366> to describe your
> concerns and I will try to give some answers.
>
> Thanks @Ayush Saxena <ay...@gmail.com>  and @Xiaoqiao He
> <he...@apache.org> again.
>
>
>
> On Mon, 29 Apr 2024 at 02:00, Ayush Saxena <ay...@gmail.com> wrote:
>
> > Thanks everyone for chasing this. Great to see some momentum around FGL;
> > that should be a great improvement.
> >
> > I have comments in two broad categories:
> > ** About the process:*
> > I think the mails above mention that phase one is complete in a feature
> > branch and that we are going to merge that to trunk. If I am reading it
> > right, you can't just hit the merge button like that. To merge a feature
> > branch, you need to call for a vote specific to that branch, and it
> > requires 3 binding votes to merge, unlike any other code change, which
> > requires 1. It is there in our Bylaws.
> >
> > So, do follow the process.
> >
> > ** About the feature itself:* (A very quick look at the doc and the Jira,
> > so please take it with a grain of salt)
> > * The Google Drive link that you folks shared as part of the first mail:
> > I don't have access to it. So, please open up the permissions for that
> > doc or share a new link
> > * Chasing the design doc present on the Jira
> > * I think we only have Phase-1 ready, so can you share some metrics just
> > for that? Perf improvements just from splitting the FS & BM locks
> > * The memory implications of Phase-1? I don't think there should be any
> > major impact on memory with just Phase-1
> > * Regarding the snapshot stuff, you mentioned taking the lock on the root
> > itself? Does taking the lock on just the snapshot root rather than the
> > FS root work?
> > * Secondly, about the usage of Snapshots or Symlinks: I don't think we
> > should operate under the assumption that they aren't widely used; we
> > might just not know the folks who use them widely, or they are just
> > users, not the ones contributing. We can accept for now that in those
> > cases it isn't optimised and we just lock the entire FS space, which it
> > does even today, so no regressions there.
> > * Regarding memory usage: Do you have some numbers on how much the memory
> > footprint increases?
> > * Under the Lock Pool: I think you are assuming there would be very few
> > inodes where a lock would be required at any given time, so there won't
> > be too much heap consumption? I think you are compromising on horizontal
> > scalability here. If your assumption doesn't hold true, then under heavy
> > read load from concurrent clients accessing different inodes, the
> > NameNode will start having memory troubles, which would do more harm
> > than good. Anyway, NameNode heap is a way bigger problem than anything
> > else, so we should be very careful about increasing load there.
> > * For the locks on the inodes: Do you plan to have locks for each inode?
> > Can we somehow limit that by the depth of the tree? Currently we take
> > the lock on the root; we could have a config that makes us take the lock
> > at Level-2 or 3 (configurable). That might fetch some perf benefits and
> > could be used to control the memory usage as well.
> > * What is the cost of creating these inode locks? If the lock isn't
> > already cached, it would incur some cost? Do you have some numbers around
> > that? Say I disable caching altogether and then let a test load run,
> > what do the perf numbers look like in that case?
> > * I think we need to limit the size of INodeLockPool; we can't let it
> > grow infinitely under heavy load, and we need to have some auto
> > throttling mechanism for it
> > * I didn't catch your Storage Policy problem. If I decode it right, the
> > problem is that the policy could be set on an ancestor node and the
> > children abide by it. If that is the case, then isn't that also the case
> > with ErasureCoding policies or even ACLs? Can you elaborate a bit on
> > that?
> >
> >
> > Anyway, regarding Phase-1: if you share the perf numbers with proper
> > details, plus the impact on memory if any, for just Phase 1, and they
> > are good, then if you call for a branch merge vote for Phase-1 FGL, you
> > have my vote. However, you'll need to sway the rest of the folks on your
> > own :-)
> >
> > Good Luck, Nice Work Guys!!!
> >
> > -Ayush
> >
> >
> > On Sun, 28 Apr 2024 at 18:32, Xiaoqiao He <he...@apache.org> wrote:
> >
> >> Thanks ZanderXu and Hui Fei for your work on this feature. It will be
> >> a very helpful improvement for the HDFS module in the next journal.
> >>
> >> 1. If we need any more review bandwidth, I would like to be involved
> >> to help review if possible.
> >> 2. From the design document there are still missing some detailed
> >> descriptions such as snapshot, symbolic link and reserved etc as
> mentioned
> >> above. I think it will be helpful for newbies who want to be involved
> >> if all corner
> >> cases are considered and described.
> >> 3. From slack, we plan to check into the trunk at this phase. I am not
> >> sure
> >> If it is the proper time, following the dev plan there are two steps
> left
> >> to
> >> finish this feature from the design document, right? If that, I think we
> >> should
> >> postpone checking in when all plans are ready. Considering that there
> are
> >> many unfinished tries for this feature in history, I think postpone
> >> checking
> >> will be the safe way, another way it will involve more rebase cost if
> you
> >> keep
> >> separate dev branch, however I think It is not one difficult thing for
> >> you.
> >>
> >> Good luck and look forward to making that happen soon!
> >>
> >> Best Regards,
> >> - He Xiaoqiao
> >>
> >> On Fri, Apr 26, 2024 at 3:50 PM Hui Fei <fe...@gmail.com> wrote:
> >> >
> >> > Thanks for interest and advice on this.
> >> >
> >> > Just would like to share some info here
> >> >
> >> > ZanderXu leads this feature and he has spent a lot of time on it. He
> is
> >> the main developer in stage 1.  Yuanboliu and Kokonguyen191 also took
> some
> >> tasks. Other developers (slfan1989 haiyang1987 huangzhaobo99 RocMarshal
> >> kokonguyen191) helped review PRs. (Forgive me if I missed someone)
> >> >
> >> > Actually haiyang1987, Yuanboliu and Kokonguyen191 are also very
> >> familiar with this feature. We discussed many details offline.
> >> >
> >> > Welcome to more people interested in joining the development and
> review
> >> of the stage 2 and 3.
> >> >
> >> >
> >> > Zengqiang XU <xu...@gmail.com> 于2024年4月26日周五 14:56写道：
> >> >>
> >> >> Thanks Shilun for your response:
> >> >>
> >> >> 1. This is a big and very useful feature, so it really needs more
> >> >> developers to get on board.
> >> >> 2. This fine grained lock has been implemented based on internal
> >> branches
> >> >> and has gained benefits by many companies, such as: Meituan,
> Kuaishou,
> >> >> Bytedance, etc.  But it has not been contributed to the community due
> >> to
> >> >> various reasons, such as there is a big difference between the
> version
> >> of
> >> >> the internal branch and the community trunk branch, the internal
> >> branch may
> >> >> ignore some functions to make FGL clear, and the contribution needs a
> >> lot
> >> >> of work and will take many times. It means that this solution has
> >> already
> >> >> been practiced in their prod environment. We have also practiced it
> in
> >> our
> >> >> prod environment and gained benefits, and we are also willing to
> spend
> >> a
> >> >> lot of time contributing to the community.
> >> >> 3. Regarding the benchmark testing, we don't need to pay more
> >> attention to
> >> >> whether the performance is improved by 5 times, 10 times or 20 times,
> >> >> because there are too many factors that affect it.
> >> >> 4. As I described above, this solution is already  being practiced by
> >> many
> >> >> companies. Right now, we just need to think about how to implement it
> >> with
> >> >> high quality and more comprehensively.
> >> >> 5. I firmly believe that all problems can be solved as long as the
> >> overall
> >> >> solution is right.
> >> >> 6. I can spend a lot of time leading the promotion of this entire
> >> feature
> >> >> and I hope more people can join us in promoting it.
> >> >> 7. You are always welcome to raise your concerns.
> >> >>
> >> >>
> >> >> Thanks Shilun again, I hope you can help review designs and PRs.
> Thanks
> >> >>
> >> >> On Fri, 26 Apr 2024 at 08:00, slfan1989 <sl...@apache.org>
> wrote:
> >> >>
> >> >> > Thank you for your hard work! This is a very meaningful
> improvement,
> >> and
> >> >> > from the design document, we can see a significant increase in HDFS
> >> >> > read/write throughput.
> >> >> >
> >> >> > I am happy to see the progress made on HDFS-17384.
> >> >> >
> >> >> > However, I still have some concerns, which roughly involve the
> >> following
> >> >> > aspects:
> >> >> >
> >> >> > 1. While ZanderXu and Hui Fei have deep expertise in HDFS and are
> >> familiar
> >> >> > with related development details, we still need more community
> >> member to
> >> >> > review the code to ensure that the relevant upgrades meet
> >> expectations.
> >> >> >
> >> >> > 2. We need more details on benchmarks to ensure that test results
> >> can be
> >> >> > reproduced and to allow more community member to participate in the
> >> testing
> >> >> > process.
> >> >> >
> >> >> > Looking forward to everything going smoothly in the future.
> >> >> >
> >> >> > Best Regards,
> >> >> > - Shilun Fan.
> >> >> >
> >> >> > On Wed, Apr 24, 2024 at 3:51 PM Xiaoqiao He <hexiaoqiao@apache.org
> >
> >> wrote:
> >> >> >
> >> >> >> cc private@h.a.o.
> >> >> >>
> >> >> >> On Wed, Apr 24, 2024 at 3:35 PM ZanderXu <za...@apache.org>
> >> wrote:
> >> >> >> >
> >> >> >> > Here are some summaries about the first phase:
> >> >> >> > 1. There are no big changes in this phase
> >> >> >> > 2. This phase just uses FS lock and BM lock to replace the
> >> original
> >> >> >> global
> >> >> >> > lock
> >> >> >> > 3. It's useful to improve the performance, since some operations
> >> just
> >> >> >> need
> >> >> >> > to hold FS lock or BM lock instead of the global lock
> >> >> >> > 4. This feature is turned off by default, you can enable it by
> >> setting
> >> >> >> > dfs.namenode.lock.model.provider.class to
> >> >> >> >
> >> org.apache.hadoop.hdfs.server.namenode.fgl.FineGrainedFSNamesystemLock
> >> >> >> > 5. This phase is very import for the ongoing development of the
> >> entire
> >> >> >> FGL
> >> >> >> >
> >> >> >> > Here I would like to express my special thanks to @kokonguyen191
> >> and
> >> >> >> > @yuanboliu for their contributions.  And you are also welcome to
> >> join us
> >> >> >> > and complete it together.
> >> >> >> >
> >> >> >> >
> >> >> >> > On Wed, 24 Apr 2024 at 14:54, ZanderXu <za...@apache.org>
> >> wrote:
> >> >> >> >
> >> >> >> > > Hi everyone
> >> >> >> > >
> >> >> >> > > All subtasks of the first phase of the FGL have been completed
> >> and I
> >> >> >> plan
> >> >> >> > > to merge them into the trunk and start the second phase based
> >> on the
> >> >> >> trunk.
> >> >> >> > >
> >> >> >> > > Here is the PR that used to merge the first phases into trunk:
> >> >> >> > > https://github.com/apache/hadoop/pull/6762
> >> >> >> > > Here is the ticket:
> >> https://issues.apache.org/jira/browse/HDFS-17384
> >> >> >> > >
> >> >> >> > > I hope you can help to review this PR when you are available
> >> and give
> >> >> >> some
> >> >> >> > > ideas.
> >> >> >> > >
> >> >> >> > >
> >> >> >> > > HDFS-17385 <https://issues.apache.org/jira/browse/HDFS-17385>
> >> is
> >> >> >> used for
> >> >> >> > > the second phase and I have created some subtasks to describe
> >> >> >> solutions for
> >> >> >> > > some problems, such as: snapshot, getListing, quota.
> >> >> >> > > You are welcome to join us to complete it together.
> >> >> >> > >
> >> >> >> > >
> >> >> >> > > ---------- Forwarded message ---------
> >> >> >> > > From: Zengqiang XU <za...@apache.org>
> >> >> >> > > Date: Fri, 2 Feb 2024 at 11:07
> >> >> >> > > Subject: Discussion about NameNode Fine-grained locking
> >> >> >> > > To: <hd...@hadoop.apache.org>
> >> >> >> > > Cc: Zengqiang XU <xu...@gmail.com>
> >> >> >> > >
> >> >> >> > >
> >> >> >> > > Hi everyone
> >> >> >> > >
> >> >> >> > > I have started a discussion about NameNode Fine-grained
> Locking
> >> to
> >> >> >> improve
> >> >> >> > > performance of write operations in NameNode.
> >> >> >> > >
> >> >> >> > > I started this discussion again for serval main reasons:
> >> >> >> > > 1. We have implemented it and gained nearly 7x performance
> >> >> >> improvement in
> >> >> >> > > our prod environment
> >> >> >> > > 2. Many other companies made similar improvements based on
> their
> >> >> >> internal
> >> >> >> > > branch.
> >> >> >> > > 3. This topic has been discussed for a long time, but still
> >> without
> >> >> >> any
> >> >> >> > > results.
> >> >> >> > >
> >> >> >> > > I hope we can push this important improvement in the community
> >> so
> >> >> >> that all
> >> >> >> > > end-users can enjoy this significant improvement.
> >> >> >> > >
> >> >> >> > > I'd really appreciate you can join in and work with me to push
> >> this
> >> >> >> > > feature forward.
> >> >> >> > >
> >> >> >> > > Thanks very much.
> >> >> >> > >
> >> >> >> > > Ticket: HDFS-17366 <
> >> https://issues.apache.org/jira/browse/HDFS-17366>
> >> >> >> > > Design: NameNode Fine-grained locking based on directory tree
> >> >> >> > > <
> >> >> >>
> >>
> https://docs.google.com/document/d/1X499gHxT0WSU1fj8uo4RuF3GqKxWkWXznXx4tspTBLY/edit?usp=sharing
> >> >> >> >
> >> >> >> > >
> >> >> >>
> >> >> >>
> >> ---------------------------------------------------------------------
> >> >> >> To unsubscribe, e-mail: private-unsubscribe@hadoop.apache.org
> >> >> >> For additional commands, e-mail: private-help@hadoop.apache.org
> >> >> >>
> >> >> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: hdfs-dev-unsubscribe@hadoop.apache.org
> >> For additional commands, e-mail: hdfs-dev-help@hadoop.apache.org
> >>
> >>
>

Re: Discussion about NameNode Fine-grained locking

Posted by ZanderXu <za...@apache.org>.

Thanks @Ayush Saxena <ay...@gmail.com> and @Xiaoqiao He
<he...@apache.org> for your nice questions.

Let me summarize your concerns and corresponding solutions:

*1. Questions about the Snapshot feature*
It's difficult to apply FGL to the Snapshot feature, but we can just use
the global FS write lock to make it thread safe.
So if we can identify whether a path involves the snapshot feature, we can
just use the global FS write lock to protect it.

You can refer to HDFS-17479
<https://issues.apache.org/jira/browse/HDFS-17479> to see how to identify
it.
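
To make the idea above concrete, here is a minimal sketch of choosing the
lock scope per path. It is not the code from HDFS-17479; the names
LockScope and SnapshotAwareLockChooser are hypothetical, and "involves the
snapshot feature" is assumed to mean either that the path has a .snapshot
component or that it lies under a known snapshottable directory.

```java
import java.util.Set;

// Which kind of lock an operation should take (hypothetical names).
enum LockScope { GLOBAL_FS_WRITE, FINE_GRAINED }

final class SnapshotAwareLockChooser {
  // Assumption: the snapshottable directories are known as absolute paths.
  private final Set<String> snapshottableRoots;

  SnapshotAwareLockChooser(Set<String> snapshottableRoots) {
    this.snapshottableRoots = snapshottableRoots;
  }

  LockScope choose(String path) {
    // A ".snapshot" component means the path refers into a snapshot.
    for (String component : path.split("/")) {
      if (component.equals(".snapshot")) {
        return LockScope.GLOBAL_FS_WRITE;
      }
    }
    // Paths at or under a snapshottable directory fall back to the global lock.
    for (String root : snapshottableRoots) {
      String prefix = root.endsWith("/") ? root : root + "/";
      if (path.equals(root) || path.startsWith(prefix)) {
        return LockScope.GLOBAL_FS_WRITE;
      }
    }
    return LockScope.FINE_GRAINED;
  }
}
```

The sketch does not cover operations on an ancestor of a snapshottable
directory (for example a recursive delete), which would need the same
fallback.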

Regarding the performance of operations related to the snapshot feature,
we can discuss it in two categories:
Read operations involving snapshots:
The FGL branch uses the global write lock to protect them, while the GLOBAL
branch uses the global read lock. It's hard to say which version performs
better; it depends on contention for the global lock.

Write operations involving snapshots:
Both the FGL and GLOBAL branches use the global write lock to protect them.
Again, it's hard to say which version performs better; it depends on
contention for the global lock too.

So I think if namenode load is low, the GLOBAL branch will perform better
than FGL; if namenode load is high, the FGL branch may perform better than
GLOBAL, which also depends on the ratio of read and write operations on the
snapshot feature.

We can do a few things to let end users choose the branch that performs
better for their workload:
First, we need to make the lock mode selectable, so that end users can
choose between FGL and GLOBAL.
Second, use the global write lock to make snapshot-related operations
thread safe, as I described in HDFS-17479.


*2. Questions about the Symlinks feature*
If a Symlink is related to a snapshot, we can refer to the solution for
snapshots; if a Symlink is not related to a snapshot, I think it's easy to
support in FGL.
Only createSymlink involves two paths; FGL just needs to lock them in a
consistent order to make this operation thread safe. For other operations,
a symlink is handled the same as any other normal iNode, right?
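
As an illustration of locking two paths in a consistent order, here is a
small sketch. It is only an assumption about how this could look, not the
FGL implementation: the class OrderedPathLocks is hypothetical, and it
keys locks by path string, whereas the real FGL locks iNodes along the
directory tree.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class OrderedPathLocks {
  private final ConcurrentHashMap<String, ReentrantReadWriteLock> locks =
      new ConcurrentHashMap<>();

  private ReentrantReadWriteLock lockFor(String path) {
    return locks.computeIfAbsent(path, p -> new ReentrantReadWriteLock());
  }

  // Returns the two paths in the order in which they will be locked.
  static String[] ordered(String a, String b) {
    return a.compareTo(b) <= 0 ? new String[] {a, b} : new String[] {b, a};
  }

  // Runs the action with both paths write-locked. Every caller locks the
  // lexicographically smaller path first, so two callers passing the same
  // pair in opposite argument order cannot deadlock each other.
  void withBothWriteLocked(String a, String b, Runnable action) {
    String[] order = ordered(a, b);
    ReentrantReadWriteLock first = lockFor(order[0]);
    ReentrantReadWriteLock second = lockFor(order[1]);
    first.writeLock().lock();
    try {
      second.writeLock().lock(); // reentrant, so a == b is also fine
      try {
        action.run();
      } finally {
        second.writeLock().unlock();
      }
    } finally {
      first.writeLock().unlock();
    }
  }
}
```

A createSymlink-style caller would pass the link path and the target path;
the argument order does not matter.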

If I missed difficult points, please let me know.


*3. Questions about Memory Usage of iNode locks*
I think there are many ways to limit the memory usage of these iNode
locks, such as using a limited-capacity lock pool to cap the maximum
memory usage, or holding iNode locks only for a fixed depth of
directories, etc.

We can abstract a LockManager first and then support different
implementations of it, so that we can limit the maximum memory usage of
these iNode locks.
FGL can acquire and release iNode locks through LockManager.
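
One possible shape for such an abstraction is sketched below. The
interface and class names are hypothetical, and lock striping is only one
way to cap memory; it is my illustration, not necessarily what the design
doc proposes. A fixed array of locks is allocated up front and each iNode
id is mapped onto one of them, so the number of lock objects never grows
with the number of iNodes.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical abstraction: callers ask for the lock guarding an iNode id.
interface INodeLockManager {
  ReentrantReadWriteLock lockFor(long inodeId);
}

// Bounded implementation: at most `capacity` lock objects ever exist.
final class StripedINodeLockManager implements INodeLockManager {
  private final ReentrantReadWriteLock[] stripes;

  StripedINodeLockManager(int capacity) {
    if (capacity <= 0) {
      throw new IllegalArgumentException("capacity must be positive");
    }
    stripes = new ReentrantReadWriteLock[capacity];
    for (int i = 0; i < capacity; i++) {
      stripes[i] = new ReentrantReadWriteLock();
    }
  }

  @Override
  public ReentrantReadWriteLock lockFor(long inodeId) {
    // floorMod keeps the index non-negative even for negative ids.
    int index = (int) Math.floorMod(inodeId, (long) stripes.length);
    return stripes[index];
  }
}
```

The trade-off is that unrelated iNodes can share a stripe, which adds some
false contention, and a thread that holds several stripes at once needs a
consistent acquisition order to stay deadlock free.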


*4. Questions about Performance of acquiring and releasing iNode locks*
We can add some benchmarks for LockManager to test the performance of
acquiring and releasing uncontended locks.
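
A very rough sketch of such a measurement is shown below, using a plain
ReentrantReadWriteLock as a stand-in for whatever LockManager hands out.
The class name is hypothetical, and a hand-rolled timing loop like this is
only indicative; a harness such as JMH would give more trustworthy
numbers.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class UncontendedLockBench {
  // Average nanoseconds for one write-lock acquire/release pair on a lock
  // that no other thread is competing for.
  static double nanosPerOp(ReentrantReadWriteLock lock, int iterations) {
    // Warm-up pass so the JIT has compiled the loop before we time it.
    for (int i = 0; i < iterations; i++) {
      lock.writeLock().lock();
      lock.writeLock().unlock();
    }
    long start = System.nanoTime();
    for (int i = 0; i < iterations; i++) {
      lock.writeLock().lock();
      lock.writeLock().unlock();
    }
    long elapsed = System.nanoTime() - start;
    return (double) elapsed / iterations;
  }
}
```

Running the same loop against a freshly created lock on every iteration
would approximate the "caching disabled" case raised in the questions.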


*5. Questions about StoragePolicy, ECPolicy, ACL, Quota, etc.*
These policies may be set on an ancestor node and used by some child
files.  The set operations for these policies will be protected by the
directory tree locks, since they are all file-related operations.  Apart
from Quota and StoragePolicy, the use of the other policies, such as
ECPolicy and ACL, will also be protected by the directory tree locks.

Quota is a little special since its update operations may not be protected
by the directory tree locks. We can assign a lock to each QuotaFeature and
use these locks to make update operations thread safe. You can refer to
HDFS-17473 <https://issues.apache.org/jira/browse/HDFS-17473> to get some
detailed information.
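
The per-QuotaFeature lock can be pictured with the sketch below. It uses a
hypothetical QuotaCounter class as a stand-in and tracks only a namespace
count; it is not the actual QuotaFeature code and not the patch in
HDFS-17473.

```java
import java.util.concurrent.locks.ReentrantLock;

// Stand-in for a quota holder that owns its own lock, so usage updates
// stay thread safe even when the caller holds no directory tree lock.
final class QuotaCounter {
  private final ReentrantLock lock = new ReentrantLock();
  private final long nsQuota;
  private long nsUsed;

  QuotaCounter(long nsQuota) {
    this.nsQuota = nsQuota;
  }

  // Checks and updates the usage atomically; returns false when the
  // update would exceed the quota, leaving the usage unchanged.
  boolean tryAdd(long delta) {
    lock.lock();
    try {
      if (nsUsed + delta > nsQuota) {
        return false;
      }
      nsUsed += delta;
      return true;
    } finally {
      lock.unlock();
    }
  }

  long used() {
    lock.lock();
    try {
      return nsUsed;
    } finally {
      lock.unlock();
    }
  }
}
```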

StoragePolicy is a little special since it is used not only by file-related
operations but also by block-related operations.
ProcessExtraRedundancyBlock uses the storage policy to choose redundant
replicas, and BlockReconstructionWork uses the storage policy to choose
target DNs. In order to maximize the performance improvement, BR and IBR
should only involve the iNodeFile to which the block being processed
belongs. These redundant blocks can be processed by the Redundancy monitor
while holding the directory tree locks. You can refer to HDFS-17505
<https://issues.apache.org/jira/browse/HDFS-17505> to get more detailed
information.

*6. Performance of the phase 1*
HDFS-17506 <https://issues.apache.org/jira/browse/HDFS-17506> is used to do
some performance testing for phase 1, and I will complete it later.


Discussing solutions over email is not efficient; you can create a
sub-task under HDFS-17366
<https://issues.apache.org/jira/browse/HDFS-17366> to describe your
concerns and I will try to answer them.

Thanks @Ayush Saxena <ay...@gmail.com>  and @Xiaoqiao He
<he...@apache.org> again.



On Mon, 29 Apr 2024 at 02:00, Ayush Saxena <ay...@gmail.com> wrote:

> Thanx Everyone for chasing this, Great to see some momentum around FGL,
> that should be a great improvement.
>
> I have some two broad categories:
> ** About the process:*
> I think in the above mails, there are mentions that phase one is complete
> in a feature branch & we are gonna merge that to trunk. If I am catching it
> right, then you can't hit the merge button like that. To merge a feature
> branch. You need to call for a Vote specific to that branch & it requires 3
> binding votes to merge, unlike any other code change which requires 1. It
> is there in our Bylaws.
>
> So, do follow the process.
>
> ** About the feature itself:* (A very quick look at the doc and the Jira,
> so please take it with a grain of salt)
> * The Google Drive link that you folks shared as part of the first mail. I
> don't have access to that. So, please open up the permissions for that doc
> or share the new link
> * Chasing the design doc present on the Jira
> * I think we only have Phase-1 ready, so can you share some metrics just
> for that? Perf improvements just with splitting the FS & BM Locks
> * The memory implications of Phase-1? I don't think there should be any
> major impact on the memory in case of just phase-1
> * Regarding the snapshot stuff, you mentioned taking lock on the root
> itself? Does just taking lock on the snapshot root rather than the FS root
> works?
> * Secondly about the usage of Snapshot or Symlinks, I don't think we
> should operate under the assumptions that they aren't widely used or not,
> we might just not know folks who don't use it widely or they are just users
> not the ones contributing. We can just accept for now, that in those cases
> it isn't optimised and we just lock the entire FS space, which it does even
> today, so no regressions there.
> * Regarding memory usage: Do you have some numbers on how much the memory
> footprint increases?
> * Under the Lock Pool: I think you are assuming there would be very few
> inodes where lock would be required at any given time, so there won't be
> too much heap consumption? I think you are compromising on the Horizontal
> Scalability here. I doubt if your assumption doesn't hold true, under heavy
> read load by concurrent clients accessing different inodes, the Namenode
> will start giving memory troubles, that would do more harm than good.
> Anyway Namenode heap is way bigger problem than anything, so we should be
> very careful increasing load over there.
> * For the Locks on the inodes: Do you plan to have locs for each inode?
> Can we somehow limit that to the depth of the tree? Like currently we take
> lock on the root, have a config which makes us take lock at Level-2 or 3
> (configurable), that might fetch some perf benefits and can be used to
> control the memory usage as well?
> * What is the cost of creating these inode locks? If the lock isn't
> already cached it would incur some cost? Do you have some numbers around
> that? Say I disable caching altogether & then let a test load run, what
> does the perf numbers look like in that case
> * I think we need to limit the size of INodeLockPool, we can't let it grow
> infinitely in case of heavy loads and we need to have some auto
> throttling mechanism for it
> * I didn't catch your Storage Policy problem. If I decode it right, the
> problem is like the policy could be set on an ancestor node & the children
> abide by that & this is the problem, if that is the case then isn't that
> the case with ErasureCoding policies or even ACLs or so? Can you elaborate
> a bit on that.
>
>
> Anyway, regarding the Phase-1. If you share (the perf numbers with proper
> details + Impact on memory if any) for just phase 1 & if they are good,
> then if you call for a branch merge vote for Phase-1 FGL, you have my vote,
> however you'll need to sway the rest of the folks on your own :-)
>
> Good Luck, Nice Work Guys!!!
>
> -Ayush
>
>
> On Sun, 28 Apr 2024 at 18:32, Xiaoqiao He <he...@apache.org> wrote:
>
>> Thanks ZanderXu and Hui Fei for your work on this feature. It will be
>> a very helpful improvement for the HDFS module in the next journal.
>>
>> 1. If we need any more review bandwidth, I would like to be involved
>> to help review if possible.
>> 2. From the design document there are still missing some detailed
>> descriptions such as snapshot, symbolic link and reserved etc as mentioned
>> above. I think it will be helpful for newbies who want to be involved
>> if all corner
>> cases are considered and described.
>> 3. From slack, we plan to check into the trunk at this phase. I am not
>> sure
>> If it is the proper time, following the dev plan there are two steps left
>> to
>> finish this feature from the design document, right? If that, I think we
>> should
>> postpone checking in when all plans are ready. Considering that there are
>> many unfinished tries for this feature in history, I think postpone
>> checking
>> will be the safe way, another way it will involve more rebase cost if you
>> keep
>> separate dev branch, however I think It is not one difficult thing for
>> you.
>>
>> Good luck and look forward to making that happen soon!
>>
>> Best Regards,
>> - He Xiaoqiao
>>
>> On Fri, Apr 26, 2024 at 3:50 PM Hui Fei <fe...@gmail.com> wrote:
>> >
>> > Thanks for interest and advice on this.
>> >
>> > Just would like to share some info here
>> >
>> > ZanderXu leads this feature and he has spent a lot of time on it. He is
>> the main developer in stage 1.  Yuanboliu and Kokonguyen191 also took some
>> tasks. Other developers (slfan1989 haiyang1987 huangzhaobo99 RocMarshal
>> kokonguyen191) helped review PRs. (Forgive me if I missed someone)
>> >
>> > Actually haiyang1987, Yuanboliu and Kokonguyen191 are also very
>> familiar with this feature. We discussed many details offline.
>> >
>> > Welcome to more people interested in joining the development and review
>> of the stage 2 and 3.
>> >
>> >
>> > Zengqiang XU <xu...@gmail.com> 于2024年4月26日周五 14:56写道：
>> >>
>> >> Thanks Shilun for your response:
>> >>
>> >> 1. This is a big and very useful feature, so it really needs more
>> >> developers to get on board.
>> >> 2. This fine grained lock has been implemented based on internal
>> branches
>> >> and has gained benefits by many companies, such as: Meituan, Kuaishou,
>> >> Bytedance, etc.  But it has not been contributed to the community due
>> to
>> >> various reasons, such as there is a big difference between the version
>> of
>> >> the internal branch and the community trunk branch, the internal
>> branch may
>> >> ignore some functions to make FGL clear, and the contribution needs a
>> lot
>> >> of work and will take many times. It means that this solution has
>> already
>> >> been practiced in their prod environment. We have also practiced it in
>> our
>> >> prod environment and gained benefits, and we are also willing to spend
>> a
>> >> lot of time contributing to the community.
>> >> 3. Regarding the benchmark testing, we don't need to pay more
>> attention to
>> >> whether the performance is improved by 5 times, 10 times or 20 times,
>> >> because there are too many factors that affect it.
>> >> 4. As I described above, this solution is already  being practiced by
>> many
>> >> companies. Right now, we just need to think about how to implement it
>> with
>> >> high quality and more comprehensively.
>> >> 5. I firmly believe that all problems can be solved as long as the
>> overall
>> >> solution is right.
>> >> 6. I can spend a lot of time leading the promotion of this entire
>> feature
>> >> and I hope more people can join us in promoting it.
>> >> 7. You are always welcome to raise your concerns.
>> >>
>> >>
>> >> Thanks Shilun again, I hope you can help review designs and PRs. Thanks
>> >>
>> >> On Fri, 26 Apr 2024 at 08:00, slfan1989 <sl...@apache.org> wrote:
>> >>
>> >> > Thank you for your hard work! This is a very meaningful improvement,
>> and
>> >> > from the design document, we can see a significant increase in HDFS
>> >> > read/write throughput.
>> >> >
>> >> > I am happy to see the progress made on HDFS-17384.
>> >> >
>> >> > However, I still have some concerns, which roughly involve the
>> following
>> >> > aspects:
>> >> >
>> >> > 1. While ZanderXu and Hui Fei have deep expertise in HDFS and are
>> familiar
>> >> > with related development details, we still need more community
>> member to
>> >> > review the code to ensure that the relevant upgrades meet
>> expectations.
>> >> >
>> >> > 2. We need more details on benchmarks to ensure that test results
>> can be
>> >> > reproduced and to allow more community member to participate in the
>> testing
>> >> > process.
>> >> >
>> >> > Looking forward to everything going smoothly in the future.
>> >> >
>> >> > Best Regards,
>> >> > - Shilun Fan.
>> >> >
>> >> > On Wed, Apr 24, 2024 at 3:51 PM Xiaoqiao He <he...@apache.org>
>> wrote:
>> >> >
>> >> >> cc private@h.a.o.
>> >> >>
>> >> >> On Wed, Apr 24, 2024 at 3:35 PM ZanderXu <za...@apache.org>
>> wrote:
>> >> >> >
>> >> >> > Here are some summaries about the first phase:
>> >> >> > 1. There are no big changes in this phase
>> >> >> > 2. This phase just uses FS lock and BM lock to replace the
>> original
>> >> >> global
>> >> >> > lock
>> >> >> > 3. It's useful to improve the performance, since some operations
>> just
>> >> >> need
>> >> >> > to hold FS lock or BM lock instead of the global lock
>> >> >> > 4. This feature is turned off by default, you can enable it by
>> setting
>> >> >> > dfs.namenode.lock.model.provider.class to
>> >> >> >
>> org.apache.hadoop.hdfs.server.namenode.fgl.FineGrainedFSNamesystemLock
>> >> >> > 5. This phase is very import for the ongoing development of the
>> entire
>> >> >> FGL
>> >> >> >
>> >> >> > Here I would like to express my special thanks to @kokonguyen191
>> and
>> >> >> > @yuanboliu for their contributions.  And you are also welcome to
>> join us
>> >> >> > and complete it together.
>> >> >> >
>> >> >> >
>> >> >> > On Wed, 24 Apr 2024 at 14:54, ZanderXu <za...@apache.org>
>> wrote:
>> >> >> >
>> >> >> > > Hi everyone
>> >> >> > >
>> >> >> > > All subtasks of the first phase of the FGL have been completed
>> and I
>> >> >> plan
>> >> >> > > to merge them into the trunk and start the second phase based
>> on the
>> >> >> trunk.
>> >> >> > >
>> >> >> > > Here is the PR that used to merge the first phases into trunk:
>> >> >> > > https://github.com/apache/hadoop/pull/6762
>> >> >> > > Here is the ticket:
>> https://issues.apache.org/jira/browse/HDFS-17384
>> >> >> > >
>> >> >> > > I hope you can help to review this PR when you are available
>> and give
>> >> >> some
>> >> >> > > ideas.
>> >> >> > >
>> >> >> > >
>> >> >> > > HDFS-17385 <https://issues.apache.org/jira/browse/HDFS-17385>
>> is
>> >> >> used for
>> >> >> > > the second phase and I have created some subtasks to describe
>> >> >> solutions for
>> >> >> > > some problems, such as: snapshot, getListing, quota.
>> >> >> > > You are welcome to join us to complete it together.
>> >> >> > >
>> >> >> > >
>> >> >> > > ---------- Forwarded message ---------
>> >> >> > > From: Zengqiang XU <za...@apache.org>
>> >> >> > > Date: Fri, 2 Feb 2024 at 11:07
>> >> >> > > Subject: Discussion about NameNode Fine-grained locking
>> >> >> > > To: <hd...@hadoop.apache.org>
>> >> >> > > Cc: Zengqiang XU <xu...@gmail.com>
>> >> >> > >
>> >> >> > >
>> >> >> > > Hi everyone
>> >> >> > >
>> >> >> > > I have started a discussion about NameNode Fine-grained Locking
>> to
>> >> >> improve
>> >> >> > > performance of write operations in NameNode.
>> >> >> > >
>> >> >> > > I started this discussion again for serval main reasons:
>> >> >> > > 1. We have implemented it and gained nearly 7x performance
>> >> >> improvement in
>> >> >> > > our prod environment
>> >> >> > > 2. Many other companies made similar improvements based on their
>> >> >> internal
>> >> >> > > branch.
>> >> >> > > 3. This topic has been discussed for a long time, but still
>> without
>> >> >> any
>> >> >> > > results.
>> >> >> > >
>> >> >> > > I hope we can push this important improvement in the community
>> so
>> >> >> that all
>> >> >> > > end-users can enjoy this significant improvement.
>> >> >> > >
>> >> >> > > I'd really appreciate you can join in and work with me to push
>> this
>> >> >> > > feature forward.
>> >> >> > >
>> >> >> > > Thanks very much.
>> >> >> > >
>> >> >> > > Ticket: HDFS-17366 <
>> https://issues.apache.org/jira/browse/HDFS-17366>
>> >> >> > > Design: NameNode Fine-grained locking based on directory tree
>> >> >> > > <
>> >> >>
>> https://docs.google.com/document/d/1X499gHxT0WSU1fj8uo4RuF3GqKxWkWXznXx4tspTBLY/edit?usp=sharing
>> >> >> >
>> >> >> > >
>> >> >>
>> >> >>
>> ---------------------------------------------------------------------
>> >> >> To unsubscribe, e-mail: private-unsubscribe@hadoop.apache.org
>> >> >> For additional commands, e-mail: private-help@hadoop.apache.org
>> >> >>
>> >> >>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: hdfs-dev-unsubscribe@hadoop.apache.org
>> For additional commands, e-mail: hdfs-dev-help@hadoop.apache.org
>>
>>

Re: Discussion about NameNode Fine-grained locking

Posted by Ayush Saxena <ay...@gmail.com>.

Thanks everyone for chasing this. Great to see some momentum around FGL;
that should be a great improvement.

I have comments in two broad categories:
** About the process:*
I think in the above mails, there are mentions that phase one is complete
in a feature branch & we are gonna merge that to trunk. If I am catching it
right, then you can't hit the merge button like that. To merge a feature
branch, you need to call for a Vote specific to that branch & it requires 3
binding votes to merge, unlike any other code change, which requires 1. It
is there in our Bylaws.

So, do follow the process.

** About the feature itself:* (A very quick look at the doc and the Jira,
so please take it with a grain of salt)
* The Google Drive link that you folks shared as part of the first mail. I
don't have access to that. So, please open up the permissions for that doc
or share the new link
* Chasing the design doc present on the Jira
* I think we only have Phase-1 ready, so can you share some metrics just
for that? Perf improvements just with splitting the FS & BM Locks
* The memory implications of Phase-1? I don't think there should be any
major impact on the memory in case of just phase-1
* Regarding the snapshot stuff, you mentioned taking lock on the root
itself? Would just taking the lock on the snapshot root rather than the FS
root work?
* Secondly about the usage of Snapshot or Symlinks, I don't think we should
operate under the assumption that they aren't widely used; we might just
not know the folks who use them widely, or they are just users, not the
ones contributing. We can just accept for now that in those cases it isn't
optimised and we just lock the entire FS space, which it does even today,
so no regressions there.
* Regarding memory usage: Do you have some numbers on how much the memory
footprint increases?
* Under the Lock Pool: I think you are assuming there would be very few
inodes where lock would be required at any given time, so there won't be
too much heap consumption? I think you are compromising on the Horizontal
Scalability here. If your assumption doesn't hold true, under heavy
read load by concurrent clients accessing different inodes, the Namenode
will start giving memory troubles, and that would do more harm than good.
Anyway, Namenode heap is a way bigger problem than anything, so we should
be very careful about increasing load over there.
* For the Locks on the inodes: Do you plan to have locks for each inode?
Can we somehow limit that to the depth of the tree? Like currently we take
lock on the root; we could have a config which makes us take lock at
Level-2 or 3 (configurable). That might fetch some perf benefits and can be
used to control the memory usage as well?
* What is the cost of creating these inode locks? If the lock isn't already
cached it would incur some cost? Do you have some numbers around that? Say
I disable caching altogether & then let a test load run, what do the perf
numbers look like in that case?
* I think we need to limit the size of INodeLockPool, we can't let it grow
infinitely in case of heavy loads and we need to have some auto
throttling mechanism for it
* I didn't catch your Storage Policy problem. If I decode it right, the
problem is like the policy could be set on an ancestor node & the children
abide by that & this is the problem, if that is the case then isn't that
the case with ErasureCoding policies or even ACLs or so? Can you elaborate
a bit on that.
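
To illustrate the depth-limit suggestion in the list above (taking the
lock at a configurable level instead of on every inode), here is a small
sketch. The class and method names are hypothetical and the depth
parameter stands in for a config value that does not exist today; the idea
is simply that all paths sharing the same first maxDepth components map to
the same lock key.

```java
final class DepthLimitedLockKey {
  // Truncates an absolute path to at most maxDepth components, e.g.
  // ("/a/b/c/d", 2) gives "/a/b". With maxDepth == 0 everything maps to
  // "/", which corresponds to today's single lock on the root.
  static String lockKey(String path, int maxDepth) {
    StringBuilder key = new StringBuilder();
    int depth = 0;
    for (String component : path.split("/")) {
      if (component.isEmpty()) {
        continue; // skip the empty string before the leading slash
      }
      if (depth == maxDepth) {
        break;
      }
      key.append('/').append(component);
      depth++;
    }
    return key.length() == 0 ? "/" : key.toString();
  }
}
```

A smaller depth means fewer distinct locks and less memory, at the price
of more contention between operations under the same subtree.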


Anyway, regarding the Phase-1. If you share (the perf numbers with proper
details + Impact on memory if any) for just phase 1 & if they are good,
then if you call for a branch merge vote for Phase-1 FGL, you have my vote,
however you'll need to sway the rest of the folks on your own :-)

Good Luck, Nice Work Guys!!!

-Ayush


On Sun, 28 Apr 2024 at 18:32, Xiaoqiao He <he...@apache.org> wrote:

> Thanks ZanderXu and Hui Fei for your work on this feature. It will be
> a very helpful improvement for the HDFS module in the next journal.
>
> 1. If more review bandwidth is needed, I would be glad to help with
> reviews.
> 2. The design document is still missing detailed descriptions of some
> areas, such as snapshot, symbolic link, reserved, etc., as mentioned
> above. I think it will help newcomers who want to get involved if all
> corner cases are considered and described.
> 3. From Slack, I understand that we plan to check this into trunk at this
> phase. I am not sure it is the proper time: following the dev plan in the
> design document, there are two steps left to finish this feature, right?
> If so, I think we should postpone checking in until all the plans are
> ready. Considering that there have been many unfinished attempts at this
> feature in the past, I think postponing the check-in is the safer way.
> On the other hand, keeping a separate dev branch will involve more rebase
> cost, but I think that is not a difficult thing for you.
>
> Good luck and look forward to making that happen soon!
>
> Best Regards,
> - He Xiaoqiao

Re: Discussion about NameNode Fine-grained locking

Posted by Xiaoqiao He <he...@apache.org>.

Thanks ZanderXu and Hui Fei for your work on this feature. It will be
a very helpful improvement for the HDFS module in the next journal.

1. If more review bandwidth is needed, I would be glad to help with
reviews.
2. The design document is still missing detailed descriptions of some
areas, such as snapshot, symbolic link, reserved, etc., as mentioned
above. I think it will help newcomers who want to get involved if all
corner cases are considered and described.
3. From Slack, I understand that we plan to check this into trunk at this
phase. I am not sure it is the proper time: following the dev plan in the
design document, there are two steps left to finish this feature, right?
If so, I think we should postpone checking in until all the plans are
ready. Considering that there have been many unfinished attempts at this
feature in the past, I think postponing the check-in is the safer way.
On the other hand, keeping a separate dev branch will involve more rebase
cost, but I think that is not a difficult thing for you.

Good luck and look forward to making that happen soon!

Best Regards,
- He Xiaoqiao

On Fri, Apr 26, 2024 at 3:50 PM Hui Fei <fe...@gmail.com> wrote:
>
> Thanks for the interest and advice on this.
>
> Just would like to share some info here.
>
> ZanderXu leads this feature and has spent a lot of time on it. He is the
> main developer in stage 1. Yuanboliu and Kokonguyen191 also took some
> tasks. Other developers (slfan1989, haiyang1987, huangzhaobo99, RocMarshal,
> kokonguyen191) helped review PRs. (Forgive me if I missed someone.)
>
> Actually haiyang1987, Yuanboliu and Kokonguyen191 are also very familiar
> with this feature. We discussed many details offline.
>
> More people are welcome to join the development and review of
> stages 2 and 3.

Re: Discussion about NameNode Fine-grained locking

Posted by Hui Fei <fe...@gmail.com>.

Thanks for the interest and advice on this.

Just would like to share some info here.

ZanderXu leads this feature and has spent a lot of time on it. He is the
main developer in stage 1. Yuanboliu and Kokonguyen191 also took some
tasks. Other developers (slfan1989, haiyang1987, huangzhaobo99, RocMarshal,
kokonguyen191) helped review PRs. (Forgive me if I missed someone.)

Actually haiyang1987, Yuanboliu and Kokonguyen191 are also very familiar
with this feature. We discussed many details offline.

More people are welcome to join the development and review of
stages 2 and 3.


On Fri, 26 Apr 2024 at 14:56, Zengqiang XU <xu...@gmail.com> wrote:

> Thanks Shilun for your response:
>
> 1. This is a big and very useful feature, so it really needs more
> developers to get on board.
> 2. This fine-grained lock has already been implemented on internal branches
> by many companies, such as Meituan, Kuaishou, Bytedance, etc., and has
> benefited them. But it has not been contributed to the community for
> various reasons: there is a big difference between the version of the
> internal branch and the community trunk branch, the internal branch may
> ignore some functions to make FGL clear, and the contribution needs a lot
> of work and will take a long time. This means that the solution has already
> been proven in their prod environments. We have also run it in our
> prod environment and gained benefits, and we are willing to spend a
> lot of time contributing it to the community.
> 3. Regarding the benchmark testing, we don't need to pay too much attention
> to whether the performance is improved by 5 times, 10 times or 20 times,
> because there are too many factors that affect it.
> 4. As I described above, this solution is already being used by many
> companies. Right now, we just need to think about how to implement it with
> high quality and more comprehensively.
> 5. I firmly believe that all problems can be solved as long as the overall
> solution is right.
> 6. I can spend a lot of time leading the promotion of this entire feature
> and I hope more people can join us in promoting it.
> 7. You are always welcome to raise your concerns.
>
>
> Thanks Shilun again, I hope you can help review designs and PRs. Thanks

Re: Discussion about NameNode Fine-grained locking

Posted by ZanderXu <za...@apache.org>.

We have created an "hdfs-fgl" channel in Slack that you can join to
discuss FGL efficiently.

On Fri, 26 Apr 2024 at 14:56, Zengqiang XU <xu...@gmail.com>
wrote:

> Thanks Shilun for your response:
>
> 1. This is a big and very useful feature, so it really needs more
> developers to get on board.
> 2. This fine-grained lock has already been implemented on internal branches
> by many companies, such as Meituan, Kuaishou, Bytedance, etc., and has
> benefited them. But it has not been contributed to the community for
> various reasons: there is a big difference between the version of the
> internal branch and the community trunk branch, the internal branch may
> ignore some functions to make FGL clear, and the contribution needs a lot
> of work and will take a long time. This means that the solution has already
> been proven in their prod environments. We have also run it in our
> prod environment and gained benefits, and we are willing to spend a
> lot of time contributing it to the community.
> 3. Regarding the benchmark testing, we don't need to pay too much attention
> to whether the performance is improved by 5 times, 10 times or 20 times,
> because there are too many factors that affect it.
> 4. As I described above, this solution is already being used by many
> companies. Right now, we just need to think about how to implement it with
> high quality and more comprehensively.
> 5. I firmly believe that all problems can be solved as long as the overall
> solution is right.
> 6. I can spend a lot of time leading the promotion of this entire feature
> and I hope more people can join us in promoting it.
> 7. You are always welcome to raise your concerns.
>
>
> Thanks Shilun again, I hope you can help review designs and PRs. Thanks

Re: Discussion about NameNode Fine-grained locking

Posted by Zengqiang XU <xu...@gmail.com>.

Thanks Shilun for your response:

1. This is a big and very useful feature, so it really needs more
developers to get on board.
2. This fine-grained lock has already been implemented on internal branches
by many companies, such as Meituan, Kuaishou, Bytedance, etc., and has
benefited them. But it has not been contributed to the community for
various reasons: there is a big difference between the version of the
internal branch and the community trunk branch, the internal branch may
ignore some functions to make FGL clear, and the contribution needs a lot
of work and will take a long time. This means that the solution has already
been proven in their prod environments. We have also run it in our
prod environment and gained benefits, and we are willing to spend a
lot of time contributing it to the community.
3. Regarding the benchmark testing, we don't need to pay too much attention
to whether the performance is improved by 5 times, 10 times or 20 times,
because there are too many factors that affect it.
4. As I described above, this solution is already being used by many
companies. Right now, we just need to think about how to implement it with
high quality and more comprehensively.
5. I firmly believe that all problems can be solved as long as the overall
solution is right.
6. I can spend a lot of time leading the promotion of this entire feature
and I hope more people can join us in promoting it.
7. You are always welcome to raise your concerns.


Thanks Shilun again, I hope you can help review designs and PRs. Thanks

On Fri, 26 Apr 2024 at 08:00, slfan1989 <sl...@apache.org> wrote:

> Thank you for your hard work! This is a very meaningful improvement, and
> from the design document, we can see a significant increase in HDFS
> read/write throughput.
>
> I am happy to see the progress made on HDFS-17384.
>
> However, I still have some concerns, which roughly involve the following
> aspects:
>
> 1. While ZanderXu and Hui Fei have deep expertise in HDFS and are familiar
> with related development details, we still need more community members to
> review the code to ensure that the relevant upgrades meet expectations.
>
> 2. We need more details on benchmarks to ensure that test results can be
> reproduced and to allow more community members to participate in the testing
> process.
>
> Looking forward to everything going smoothly in the future.
>
> Best Regards,
> - Shilun Fan.
>
> On Wed, Apr 24, 2024 at 3:51 PM Xiaoqiao He <he...@apache.org> wrote:
>
>> cc private@h.a.o.
>>
>> On Wed, Apr 24, 2024 at 3:35 PM ZanderXu <za...@apache.org> wrote:
>> >
>> > Here are some summaries about the first phase:
>> > 1. There are no big changes in this phase
>> > 2. This phase just uses FS lock and BM lock to replace the original
>> global
>> > lock
>> > 3. It's useful to improve the performance, since some operations just
>> need
>> > to hold FS lock or BM lock instead of the global lock
>> > 4. This feature is turned off by default, you can enable it by setting
>> > dfs.namenode.lock.model.provider.class to
>> > org.apache.hadoop.hdfs.server.namenode.fgl.FineGrainedFSNamesystemLock
>> > 5. This phase is very important for the ongoing development of the entire
>> FGL
>> >
>> > Here I would like to express my special thanks to @kokonguyen191 and
>> > @yuanboliu for their contributions.  And you are also welcome to join us
>> > and complete it together.
>> >
>> >
>> > On Wed, 24 Apr 2024 at 14:54, ZanderXu <za...@apache.org> wrote:
>> >
>> > > Hi everyone
>> > >
>> > > All subtasks of the first phase of the FGL have been completed and I
>> plan
>> > > to merge them into the trunk and start the second phase based on the
>> trunk.
>> > >
>> > > Here is the PR that used to merge the first phases into trunk:
>> > > https://github.com/apache/hadoop/pull/6762
>> > > Here is the ticket: https://issues.apache.org/jira/browse/HDFS-17384
>> > >
>> > > I hope you can help to review this PR when you are available and give
>> some
>> > > ideas.
>> > >
>> > >
>> > > HDFS-17385 <https://issues.apache.org/jira/browse/HDFS-17385> is
>> used for
>> > > the second phase and I have created some subtasks to describe
>> solutions for
>> > > some problems, such as: snapshot, getListing, quota.
>> > > You are welcome to join us to complete it together.
>> > >
>> > >
>> > > ---------- Forwarded message ---------
>> > > From: Zengqiang XU <za...@apache.org>
>> > > Date: Fri, 2 Feb 2024 at 11:07
>> > > Subject: Discussion about NameNode Fine-grained locking
>> > > To: <hd...@hadoop.apache.org>
>> > > Cc: Zengqiang XU <xu...@gmail.com>
>> > >
>> > >
>> > > Hi everyone
>> > >
>> > > I have started a discussion about NameNode Fine-grained Locking to
>> improve
>> > > performance of write operations in NameNode.
>> > >
>> > > I started this discussion again for several main reasons:
>> > > 1. We have implemented it and gained nearly 7x performance
>> improvement in
>> > > our prod environment
>> > > 2. Many other companies made similar improvements based on their
>> internal
>> > > branch.
>> > > 3. This topic has been discussed for a long time, but still without
>> any
>> > > results.
>> > >
>> > > I hope we can push this important improvement in the community so
>> that all
>> > > end-users can enjoy this significant improvement.
>> > >
>> > > I'd really appreciate you can join in and work with me to push this
>> > > feature forward.
>> > >
>> > > Thanks very much.
>> > >
>> > > Ticket: HDFS-17366 <https://issues.apache.org/jira/browse/HDFS-17366>
>> > > Design: NameNode Fine-grained locking based on directory tree
>> > > <
>> https://docs.google.com/document/d/1X499gHxT0WSU1fj8uo4RuF3GqKxWkWXznXx4tspTBLY/edit?usp=sharing
>> >
>> > >
>>

Re: Discussion about NameNode Fine-grained locking

Posted by slfan1989 <sl...@apache.org>.

Thank you for your hard work! This is a very meaningful improvement, and
from the design document, we can see a significant increase in HDFS
read/write throughput.

I am happy to see the progress made on HDFS-17384.

However, I still have some concerns, which roughly involve the following
aspects:

1. While ZanderXu and Hui Fei have deep expertise in HDFS and are familiar
with the related development details, we still need more community members
to review the code to ensure that the relevant upgrades meet expectations.

2. We need more details on the benchmarks to ensure that test results can
be reproduced and to allow more community members to participate in the
testing process.

Looking forward to everything going smoothly in the future.

Best Regards,
- Shilun Fan.

On Wed, Apr 24, 2024 at 3:51 PM Xiaoqiao He <he...@apache.org> wrote:

> cc private@h.a.o.
>
> On Wed, Apr 24, 2024 at 3:35 PM ZanderXu <za...@apache.org> wrote:
> >
> > Here are some summaries about the first phase:
> > 1. There are no big changes in this phase
> > 2. This phase just uses FS lock and BM lock to replace the original
> global
> > lock
> > 3. It's useful to improve the performance, since some operations just
> need
> > to hold FS lock or BM lock instead of the global lock
> > 4. This feature is turned off by default, you can enable it by setting
> > dfs.namenode.lock.model.provider.class to
> > org.apache.hadoop.hdfs.server.namenode.fgl.FineGrainedFSNamesystemLock
> > 5. This phase is very important for the ongoing development of the entire
> FGL
> >
> > Here I would like to express my special thanks to @kokonguyen191 and
> > @yuanboliu for their contributions.  And you are also welcome to join us
> > and complete it together.
> >
> >
> > On Wed, 24 Apr 2024 at 14:54, ZanderXu <za...@apache.org> wrote:
> >
> > > Hi everyone
> > >
> > > All subtasks of the first phase of the FGL have been completed and I
> plan
> > > to merge them into the trunk and start the second phase based on the
> trunk.
> > >
> > > Here is the PR that used to merge the first phases into trunk:
> > > https://github.com/apache/hadoop/pull/6762
> > > Here is the ticket: https://issues.apache.org/jira/browse/HDFS-17384
> > >
> > > I hope you can help to review this PR when you are available and give
> some
> > > ideas.
> > >
> > >
> > > HDFS-17385 <https://issues.apache.org/jira/browse/HDFS-17385> is used
> for
> > > the second phase and I have created some subtasks to describe
> solutions for
> > > some problems, such as: snapshot, getListing, quota.
> > > You are welcome to join us to complete it together.
> > >
> > >
> > > ---------- Forwarded message ---------
> > > From: Zengqiang XU <za...@apache.org>
> > > Date: Fri, 2 Feb 2024 at 11:07
> > > Subject: Discussion about NameNode Fine-grained locking
> > > To: <hd...@hadoop.apache.org>
> > > Cc: Zengqiang XU <xu...@gmail.com>
> > >
> > >
> > > Hi everyone
> > >
> > > I have started a discussion about NameNode Fine-grained Locking to
> improve
> > > performance of write operations in NameNode.
> > >
> > > I started this discussion again for several main reasons:
> > > 1. We have implemented it and gained nearly 7x performance improvement
> in
> > > our prod environment
> > > 2. Many other companies made similar improvements based on their
> internal
> > > branch.
> > > 3. This topic has been discussed for a long time, but still without any
> > > results.
> > >
> > > I hope we can push this important improvement in the community so that
> all
> > > end-users can enjoy this significant improvement.
> > >
> > > I'd really appreciate you can join in and work with me to push this
> > > feature forward.
> > >
> > > Thanks very much.
> > >
> > > Ticket: HDFS-17366 <https://issues.apache.org/jira/browse/HDFS-17366>
> > > Design: NameNode Fine-grained locking based on directory tree
> > > <
> https://docs.google.com/document/d/1X499gHxT0WSU1fj8uo4RuF3GqKxWkWXznXx4tspTBLY/edit?usp=sharing
> >
> > >
>

Re: Discussion about NameNode Fine-grained locking

Posted by Xiaoqiao He <he...@apache.org>.

cc private@h.a.o.

On Wed, Apr 24, 2024 at 3:35 PM ZanderXu <za...@apache.org> wrote:
>
> Here are some summaries about the first phase:
> 1. There are no big changes in this phase
> 2. This phase just uses FS lock and BM lock to replace the original global
> lock
> 3. It's useful to improve the performance, since some operations just need
> to hold FS lock or BM lock instead of the global lock
> 4. This feature is turned off by default, you can enable it by setting
> dfs.namenode.lock.model.provider.class to
> org.apache.hadoop.hdfs.server.namenode.fgl.FineGrainedFSNamesystemLock
> 5. This phase is very important for the ongoing development of the entire FGL
>
> Here I would like to express my special thanks to @kokonguyen191 and
> @yuanboliu for their contributions.  And you are also welcome to join us
> and complete it together.
>
>
> On Wed, 24 Apr 2024 at 14:54, ZanderXu <za...@apache.org> wrote:
>
> > Hi everyone
> >
> > All subtasks of the first phase of the FGL have been completed and I plan
> > to merge them into the trunk and start the second phase based on the trunk.
> >
> > Here is the PR that used to merge the first phases into trunk:
> > https://github.com/apache/hadoop/pull/6762
> > Here is the ticket: https://issues.apache.org/jira/browse/HDFS-17384
> >
> > I hope you can help to review this PR when you are available and give some
> > ideas.
> >
> >
> > HDFS-17385 <https://issues.apache.org/jira/browse/HDFS-17385> is used for
> > the second phase and I have created some subtasks to describe solutions for
> > some problems, such as: snapshot, getListing, quota.
> > You are welcome to join us to complete it together.
> >
> >
> > ---------- Forwarded message ---------
> > From: Zengqiang XU <za...@apache.org>
> > Date: Fri, 2 Feb 2024 at 11:07
> > Subject: Discussion about NameNode Fine-grained locking
> > To: <hd...@hadoop.apache.org>
> > Cc: Zengqiang XU <xu...@gmail.com>
> >
> >
> > Hi everyone
> >
> > I have started a discussion about NameNode Fine-grained Locking to improve
> > performance of write operations in NameNode.
> >
> > I started this discussion again for several main reasons:
> > 1. We have implemented it and gained nearly 7x performance improvement in
> > our prod environment
> > 2. Many other companies made similar improvements based on their internal
> > branch.
> > 3. This topic has been discussed for a long time, but still without any
> > results.
> >
> > I hope we can push this important improvement in the community so that all
> > end-users can enjoy this significant improvement.
> >
> > I'd really appreciate you can join in and work with me to push this
> > feature forward.
> >
> > Thanks very much.
> >
> > Ticket: HDFS-17366 <https://issues.apache.org/jira/browse/HDFS-17366>
> > Design: NameNode Fine-grained locking based on directory tree
> > <https://docs.google.com/document/d/1X499gHxT0WSU1fj8uo4RuF3GqKxWkWXznXx4tspTBLY/edit?usp=sharing>
> >

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-help@hadoop.apache.org

Re: Discussion about NameNode Fine-grained locking

Posted by ZanderXu <za...@apache.org>.

Here is a summary of the first phase:
1. There are no big changes in this phase.
2. This phase just replaces the original global lock with an FS lock and a
BM lock.
3. This improves performance, since some operations only need to hold the
FS lock or the BM lock instead of the global lock.
4. This feature is turned off by default. You can enable it by setting
dfs.namenode.lock.model.provider.class to
org.apache.hadoop.hdfs.server.namenode.fgl.FineGrainedFSNamesystemLock
5. This phase is very important for the ongoing development of the entire
FGL.
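
For reference, the setting in item 4 might look like the fragment below.
This is only a sketch: the property name and class name are the ones given
above, while placing it in the NameNode's hdfs-site.xml (as is usual for
other dfs.namenode.* options) is an assumption, not something stated in
this thread.

```xml
<!-- Sketch only: assumes the property goes in the NameNode's hdfs-site.xml. -->
<configuration>
  <property>
    <!-- Property name and value as quoted in item 4 above. -->
    <name>dfs.namenode.lock.model.provider.class</name>
    <value>org.apache.hadoop.hdfs.server.namenode.fgl.FineGrainedFSNamesystemLock</value>
  </property>
</configuration>
```

Leaving the property unset should keep the default behavior, since the
feature is described above as turned off by default.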

Here I would like to express my special thanks to @kokonguyen191 and
@yuanboliu for their contributions. You are also welcome to join us and
complete it together.


On Wed, 24 Apr 2024 at 14:54, ZanderXu <za...@apache.org> wrote:

> Hi everyone
>
> All subtasks of the first phase of the FGL have been completed and I plan
> to merge them into the trunk and start the second phase based on the trunk.
>
> Here is the PR that used to merge the first phases into trunk:
> https://github.com/apache/hadoop/pull/6762
> Here is the ticket: https://issues.apache.org/jira/browse/HDFS-17384
>
> I hope you can help to review this PR when you are available and give some
> ideas.
>
>
> HDFS-17385 <https://issues.apache.org/jira/browse/HDFS-17385> is used for
> the second phase and I have created some subtasks to describe solutions for
> some problems, such as: snapshot, getListing, quota.
> You are welcome to join us to complete it together.
>
>
> ---------- Forwarded message ---------
> From: Zengqiang XU <za...@apache.org>
> Date: Fri, 2 Feb 2024 at 11:07
> Subject: Discussion about NameNode Fine-grained locking
> To: <hd...@hadoop.apache.org>
> Cc: Zengqiang XU <xu...@gmail.com>
>
>
> Hi everyone
>
> I have started a discussion about NameNode Fine-grained Locking to improve
> performance of write operations in NameNode.
>
> I started this discussion again for several main reasons:
> 1. We have implemented it and gained nearly 7x performance improvement in
> our prod environment
> 2. Many other companies made similar improvements based on their internal
> branch.
> 3. This topic has been discussed for a long time, but still without any
> results.
>
> I hope we can push this important improvement in the community so that all
> end-users can enjoy this significant improvement.
>
> I'd really appreciate you can join in and work with me to push this
> feature forward.
>
> Thanks very much.
>
> Ticket: HDFS-17366 <https://issues.apache.org/jira/browse/HDFS-17366>
> Design: NameNode Fine-grained locking based on directory tree
> <https://docs.google.com/document/d/1X499gHxT0WSU1fj8uo4RuF3GqKxWkWXznXx4tspTBLY/edit?usp=sharing>
>