Posted to hdfs-dev@hadoop.apache.org by Konstantin Shvachko <sh...@gmail.com> on 2018/12/06 01:27:33 UTC

[VOTE] Merge HDFS-12943 branch to trunk - Consistent Reads from Standby

Hi Hadoop developers,

I would like to propose to merge to trunk the feature branch HDFS-12943 for
Consistent Reads from Standby Node. The feature is intended to scale read
RPC workloads. On large clusters reads comprise 95% of all RPCs to the
NameNode. We should be able to accommodate higher overall RPC workloads (up
to 4x by some estimates) by adding multiple ObserverNodes.

The main functionality has been implemented; see the sub-tasks of HDFS-12943.
We followed up with the test plan. Testing was done on two independent
clusters (see HDFS-14058 and HDFS-14059) with security enabled.
We ran standard HDFS commands, MR jobs, admin commands including manual
failover.
We know of one cluster running this feature in production.

There are a few outstanding issues:
1. Need to provide proper documentation - a user guide for the new feature
2. Need to fix automatic failover with ZKFC. Currently it doesn't
know about ObserverNodes and tries to convert them to SBNs.
3. Scale testing and performance fine-tuning
4. As testing progresses, we continue fixing non-critical bugs like
HDFS-14116.

I attached a unified patch to the umbrella jira for the review and Jenkins
build.
Please vote on this thread. The vote will run for 7 days until Wed Dec 12.

Thanks,
--Konstantin

Re: [VOTE] Merge HDFS-12943 branch to trunk - Consistent Reads from Standby

Posted by Zhe Zhang <zh...@apache.org>.

+1 (binding)

Thanks Konstantin for leading the merge effort!

I worked very closely with Chen, Konstantin, and Erik in the testing stage
and I feel confident that the feature now delivers its designed
functionality and has proven to be stable.

Great teamwork with contributors from multiple companies!

On Wed, Dec 5, 2018 at 5:27 PM Konstantin Shvachko <sh...@gmail.com>
wrote:

> Hi Hadoop developers,
>
> I would like to propose to merge to trunk the feature branch HDFS-12943 for
> Consistent Reads from Standby Node. The feature is intended to scale read
> RPC workloads. On large clusters reads comprise 95% of all RPCs to the
> NameNode. We should be able to accommodate higher overall RPC workloads (up
> to 4x by some estimates) by adding multiple ObserverNodes.
>
> The main functionality has been implemented; see the sub-tasks of HDFS-12943.
> We followed up with the test plan. Testing was done on two independent
> clusters (see HDFS-14058 and HDFS-14059) with security enabled.
> We ran standard HDFS commands, MR jobs, admin commands including manual
> failover.
> We know of one cluster running this feature in production.
>
> There are a few outstanding issues:
> 1. Need to provide proper documentation - a user guide for the new feature
> 2. Need to fix automatic failover with ZKFC. Currently it doesn't
> know about ObserverNodes and tries to convert them to SBNs.
> 3. Scale testing and performance fine-tuning
> 4. As testing progresses, we continue fixing non-critical bugs like
> HDFS-14116.
>
> I attached a unified patch to the umbrella jira for the review and Jenkins
> build.
> Please vote on this thread. The vote will run for 7 days until Wed Dec 12.
>
> Thanks,
> --Konstantin
>
-- 
Zhe Zhang
Apache Hadoop Committer
http://zhe-thoughts.github.io/about/ | @oldcap

Re: [VOTE] Merge HDFS-12943 branch to trunk - Consistent Reads from Standby

Posted by Chen Liang <va...@gmail.com>.

Hi Daryn,

This is an interesting and valid point about the security implications of
the design.

The purpose of the alignment context is to allow clients and servers to sync
on their global state, so that when clients switch between ANN/SBN or
between SBNs, the reads are always consistent. One reason for doing this in
the RPC layer is that it is then decoupled from client logic. Handlers
reinserting the call into the queue is part of implementing the catch-up
logic in HDFS-13767: the standby waits until it has received all the
transactions it needs to catch up with the client's state.
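
The catch-up logic described above can be sketched as a toy simulation. This
is only an illustration in Python, not the HDFS-13767 code: the names Call,
Observer, client_state_id and applied_txid are hypothetical, and a real
handler runs concurrently inside the RPC server rather than in a single loop.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Call:
    name: str
    client_state_id: int  # last transaction ID the client has seen (assumed field)


class Observer:
    """Toy observer: serves a read only once it has caught up with the caller."""

    def __init__(self, applied_txid):
        self.applied_txid = applied_txid  # last transaction applied locally
        self.queue = deque()
        self.served = []

    def apply_edits(self, up_to_txid):
        # Tailing edits from the active node moves the local state forward.
        self.applied_txid = max(self.applied_txid, up_to_txid)

    def handle_one(self):
        """Process one queued call; return True if it was served."""
        call = self.queue.popleft()
        if call.client_state_id > self.applied_txid:
            # Not caught up yet: put the call back instead of answering stale.
            self.queue.append(call)
            return False
        self.served.append(call.name)
        return True
```

A call whose client_state_id is ahead of the observer keeps rotating through
the queue until apply_edits moves applied_txid far enough, which is exactly
the behaviour the livelock concern is about.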

By using RetriableExceptions, I assume you mean letting the client retry if
the server state is not ready? We did consider a similar approach, but that
introduces multiple RPC calls for a single operation, adding overhead to the
RPC queue, which is already often a bottleneck as we've seen. To this
extent, even with RetriableException, it appears to me a buggy client can
still hurt the NameNode, although in a different way.

I agree that calls can potentially get stuck in the queue for a long time,
which can cause serious issues. We do have plans to introduce logic that
makes an Observer reject client requests if it has fallen too far behind the
client's state; please see HDFS-13873. The Observer then simply rejects the
call and lets the client retry with other Observers or go straight to the ANN.
This would free the Observer from serving this call and thus limit how much
damage a malicious client can do to it. Secondly, reinserting into the queue
should by design only happen on Observer nodes, never on the ANN, so the
damage from potentially bad calls would not affect the ANN. This means that
even in the unlikely case that Observers were overloaded because of
buggy/malicious client calls, the rejection logic would make the clients all
end up talking to the ANN, which is still no worse than what we have today.
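
A minimal sketch of the rejection-and-fallback idea, again as a Python toy
and not the HDFS-13873 implementation. The exception name, the max_lag
threshold and the "observer-N" / "active" labels are assumptions made for
illustration only.

```python
class ObserverTooFarBehind(Exception):
    """Hypothetical rejection signal; not a real Hadoop exception name."""


def observer_accepts(applied_txid, client_state_id, max_lag):
    """Raise if the observer lags the client by more than max_lag transactions."""
    lag = client_state_id - applied_txid
    if lag > max_lag:
        raise ObserverTooFarBehind(f"lag of {lag} transactions")


def choose_node(observer_txids, client_state_id, max_lag):
    """Try each observer in order; fall back to the active node if all reject."""
    for index, applied_txid in enumerate(observer_txids):
        try:
            observer_accepts(applied_txid, client_state_id, max_lag)
            return f"observer-{index}"
        except ObserverTooFarBehind:
            continue  # this observer is too far behind, try the next one
    return "active"
```

With a bounded lag, a call either waits a short time on an Observer or is
pushed to the active node, so the worst case matches a cluster that has no
Observers at all.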

Thanks,
Chen

On Thu, Dec 6, 2018 at 11:23 AM Konstantin Shvachko <sh...@gmail.com> wrote:

> Hi Yongjun,
>
> Automatic failover certainly needs to be fixed (see HDFS-14130 and
> HDFS-13182), along with all other outstanding issues. We plan to continue
> this on trunk.
> The feature is usable now without these issues (see HDFS-14067).
> And we would like to get it in, so that people can have early access,
> and so that newly developed features are aware of this functionality.
> Let us know if you have other suggestions.
>
> Thanks,
> --Konstantin
>
> On Wed, Dec 5, 2018 at 11:24 PM Yongjun Zhang <yz...@cloudera.com> wrote:
>
> > Great work guys.
> >
> > Wonder if we can elaborate on the impact of not having #2 fixed, and why
> > #2 is not needed for the feature to be complete?
> > 2. Need to fix automatic failover with ZKFC. Currently it doesn't
> > know about ObserverNodes and tries to convert them to SBNs.
> >
> > Thanks.
> > --Yongjun
> >
> >
> > On Wed, Dec 5, 2018 at 5:27 PM Konstantin Shvachko
> > <shv.hadoop@gmail.com> wrote:
> >
> >> Hi Hadoop developers,
> >>
> >> I would like to propose to merge to trunk the feature branch HDFS-12943
> >> for Consistent Reads from Standby Node. The feature is intended to scale
> >> read RPC workloads. On large clusters reads comprise 95% of all RPCs to
> >> the NameNode. We should be able to accommodate higher overall RPC
> >> workloads (up to 4x by some estimates) by adding multiple ObserverNodes.
> >>
> >> The main functionality has been implemented; see the sub-tasks of
> >> HDFS-12943.
> >> We followed up with the test plan. Testing was done on two independent
> >> clusters (see HDFS-14058 and HDFS-14059) with security enabled.
> >> We ran standard HDFS commands, MR jobs, admin commands including manual
> >> failover.
> >> We know of one cluster running this feature in production.
> >>
> >> There are a few outstanding issues:
> >> 1. Need to provide proper documentation - a user guide for the new
> >> feature
> >> 2. Need to fix automatic failover with ZKFC. Currently it doesn't
> >> know about ObserverNodes and tries to convert them to SBNs.
> >> 3. Scale testing and performance fine-tuning
> >> 4. As testing progresses, we continue fixing non-critical bugs like
> >> HDFS-14116.
> >>
> >> I attached a unified patch to the umbrella jira for the review and
> >> Jenkins build.
> >> Please vote on this thread. The vote will run for 7 days until Wed Dec
> >> 12.
> >>
> >> Thanks,
> >> --Konstantin
> >>
> >
>

Re: [VOTE] Merge HDFS-12943 branch to trunk - Consistent Reads from Standby

Posted by Konstantin Shvachko <sh...@gmail.com>.

Hi Yongjun,

Automatic failover certainly needs to be fixed (see HDFS-14130 and
HDFS-13182), along with all other outstanding issues. We plan to continue
this on trunk.
The feature is usable now without these issues (see HDFS-14067).
And we would like to get it in, so that people can have early access,
and so that newly developed features are aware of this functionality.
Let us know if you have other suggestions.

Thanks,
--Konstantin

On Wed, Dec 5, 2018 at 11:24 PM Yongjun Zhang <yz...@cloudera.com> wrote:

> Great work guys.
>
> Wonder if we can elaborate on the impact of not having #2 fixed, and why
> #2 is not needed for the feature to be complete?
> 2. Need to fix automatic failover with ZKFC. Currently it doesn't
> know about ObserverNodes and tries to convert them to SBNs.
>
> Thanks.
> --Yongjun
>
>
> On Wed, Dec 5, 2018 at 5:27 PM Konstantin Shvachko <sh...@gmail.com>
> wrote:
>
>> Hi Hadoop developers,
>>
>> I would like to propose to merge to trunk the feature branch HDFS-12943
>> for Consistent Reads from Standby Node. The feature is intended to scale
>> read RPC workloads. On large clusters reads comprise 95% of all RPCs to
>> the NameNode. We should be able to accommodate higher overall RPC
>> workloads (up to 4x by some estimates) by adding multiple ObserverNodes.
>>
>> The main functionality has been implemented; see the sub-tasks of
>> HDFS-12943.
>> We followed up with the test plan. Testing was done on two independent
>> clusters (see HDFS-14058 and HDFS-14059) with security enabled.
>> We ran standard HDFS commands, MR jobs, admin commands including manual
>> failover.
>> We know of one cluster running this feature in production.
>>
>> There are a few outstanding issues:
>> 1. Need to provide proper documentation - a user guide for the new feature
>> 2. Need to fix automatic failover with ZKFC. Currently it doesn't
>> know about ObserverNodes and tries to convert them to SBNs.
>> 3. Scale testing and performance fine-tuning
>> 4. As testing progresses, we continue fixing non-critical bugs like
>> HDFS-14116.
>>
>> I attached a unified patch to the umbrella jira for the review and Jenkins
>> build.
>> Please vote on this thread. The vote will run for 7 days until Wed Dec 12.
>>
>> Thanks,
>> --Konstantin
>>
>

Re: [VOTE] Merge HDFS-12943 branch to trunk - Consistent Reads from Standby

Posted by Konstantin Shvachko <sh...@gmail.com>.

Hi Yongjun,

Good suggestion. This is essentially what HDFS-13873 is implementing to
mitigate the concern.

Thanks,
--Konstantin

On Wed, Dec 12, 2018 at 10:35 PM Yongjun Zhang <yz...@cloudera.com> wrote:

> Hi Konstantin,
>
> Thanks for addressing my other question about failover.
>
> Some thoughts to share about the suggestion Daryn made.  It seems we could
> try this: let the ObserverNode throw a RetriableException back to the client
> saying it has not yet reached the transaction ID needed to serve the client,
> maybe even including the transaction ID gap in the exception. Then, when the
> client receives the RetriableException, it can decide whether to send the
> request to the observer node again, or to the active NN when the gap is too
> big.
>
> Though saving another RPC would help the performance of the current
> implementation, I expect the above-mentioned exception to happen only
> infrequently, so the performance won't be too bad, plus the client has a
> chance to try the ANN when it knows that the observer is too far behind in
> the extreme case.
>
> I wonder how different the performance is between these two approaches in
> a cluster with a real workload.
>
> Comments?
>
> --Yongjun
>
> On Fri, Dec 7, 2018 at 4:10 PM Konstantin Shvachko <sh...@gmail.com>
> wrote:
>
>> Hi Daryn,
>>
>> Wanted to back up Chen's earlier response to your concerns about rotating
>> calls in the call queue.
>> Our design
>> 1. directly targets the livelock problem by rejecting calls on the
>> Observer that are not likely to be answered in a timely manner: HDFS-13873.
>> 2. The call queue rotation is only done on Observers, and never on the
>> active NN, so it stays free of attacks like the one you suggest.
>>
>> If this is a satisfactory mitigation for the problem could you please
>> reconsider your -1, so that people could continue voting on this thread.
>>
>> Thanks,
>> --Konst
>>
>> On Thu, Dec 6, 2018 at 10:38 AM Daryn Sharp <da...@oath.com> wrote:
>>
>> > -1 pending additional info.  After a cursory scan, I have serious
>> > concerns regarding the design.  This seems like a feature that should
>> > have been purely implemented in hdfs w/o touching the common IPC layer.
>> >
>> > The biggest issue is the alignment context.  Its purpose appears to be
>> > for allowing handlers to reinsert calls back into the call queue.
>> > That's completely unacceptable.  A buggy or malicious client can easily
>> > cause livelock in the IPC layer with handlers only looping on calls that
>> > never satisfy the condition.  Why is this not implemented via
>> > RetriableExceptions?
>> >
>> > On Thu, Dec 6, 2018 at 1:24 AM Yongjun Zhang
>> > <yz...@cloudera.com.invalid> wrote:
>> >
>> >> Great work guys.
>> >>
>> >> Wonder if we can elaborate on the impact of not having #2 fixed, and
>> >> why #2 is not needed for the feature to be complete?
>> >> 2. Need to fix automatic failover with ZKFC. Currently it doesn't
>> >> know about ObserverNodes and tries to convert them to SBNs.
>> >>
>> >> Thanks.
>> >> --Yongjun
>> >>
>> >>
>> >> On Wed, Dec 5, 2018 at 5:27 PM Konstantin Shvachko
>> >> <shv.hadoop@gmail.com> wrote:
>> >>
>> >> > Hi Hadoop developers,
>> >> >
>> >> > I would like to propose to merge to trunk the feature branch
>> >> > HDFS-12943 for Consistent Reads from Standby Node. The feature is
>> >> > intended to scale read RPC workloads. On large clusters reads
>> >> > comprise 95% of all RPCs to the NameNode. We should be able to
>> >> > accommodate higher overall RPC workloads (up to 4x by some
>> >> > estimates) by adding multiple ObserverNodes.
>> >> >
>> >> > The main functionality has been implemented; see the sub-tasks of
>> >> > HDFS-12943.
>> >> > We followed up with the test plan. Testing was done on two
>> >> > independent clusters (see HDFS-14058 and HDFS-14059) with security
>> >> > enabled.
>> >> > We ran standard HDFS commands, MR jobs, admin commands including
>> >> > manual failover.
>> >> > We know of one cluster running this feature in production.
>> >> >
>> >> > There are a few outstanding issues:
>> >> > 1. Need to provide proper documentation - a user guide for the new
>> >> > feature
>> >> > 2. Need to fix automatic failover with ZKFC. Currently it doesn't
>> >> > know about ObserverNodes and tries to convert them to SBNs.
>> >> > 3. Scale testing and performance fine-tuning
>> >> > 4. As testing progresses, we continue fixing non-critical bugs like
>> >> > HDFS-14116.
>> >> >
>> >> > I attached a unified patch to the umbrella jira for the review and
>> >> > Jenkins build.
>> >> > Please vote on this thread. The vote will run for 7 days until Wed
>> >> > Dec 12.
>> >> >
>> >> > Thanks,
>> >> > --Konstantin
>> >> >
>> >>
>> >
>> >
>> > --
>> >
>> > Daryn
>> >
>>
>

Re: [VOTE] Merge HDFS-12943 branch to trunk - Consistent Reads from Standby

Posted by Konstantin Shvachko <sh...@gmail.com>.

Hi Yongjun,

Good suggestion. This is essentially what HDFS-13873 is implementing to
mitigate the concern.

Thanks,
--Konstantin

On Wed, Dec 12, 2018 at 10:35 PM Yongjun Zhang <yz...@cloudera.com> wrote:

> Hi Konstantin,
>
> Thanks for addressing my other question about failover.
>
> Some thoughts to share about the suggestion Daryn made.  Seems we could try
> this: let the ObserverNode throw a RetriableException back to the client
> saying it has not yet reached the transaction ID needed to serve the client,
> maybe even include the transaction ID gap information in the exception.
> Then, when the client receives the RetriableException, it can decide whether
> to send the request to the observer node again, or to the active NN when
> the gap is too big.
>
> Though saving another RPC would help the performance with the current
> implementation, I expect the above-mentioned exception to happen only
> infrequently, so the performance won't be too bad, plus the client has a
> chance to try the ANN when it knows that the observer is too far behind in
> the extreme case.
>
> I wonder how different the performance is between these two approaches in
> a cluster with a real workload.
>
> Comments?
>
> --Yongjun
>
> On Fri, Dec 7, 2018 at 4:10 PM Konstantin Shvachko <sh...@gmail.com>
> wrote:
>
>> Hi Daryn,
>>
>> Wanted to back up Chen's earlier response to your concerns about rotating
>> calls in the call queue.
>> Our design
>> 1. directly targets the livelock problem by rejecting calls on the
>> Observer that are not likely to be responded to in a timely manner:
>> HDFS-13873.
>> 2. The call queue rotation is only done on Observers, and never on the
>> active NN, so it stays free of attacks like you suggest.
>>
>> If this is a satisfactory mitigation for the problem, could you please
>> reconsider your -1, so that people could continue voting on this thread?
>>
>> Thanks,
>> --Konst
>>
>> On Thu, Dec 6, 2018 at 10:38 AM Daryn Sharp <da...@oath.com> wrote:
>>
>> > -1 pending additional info.  After a cursory scan, I have serious
>> > concerns regarding the design.  This seems like a feature that should
>> > have been purely implemented in hdfs w/o touching the common IPC layer.
>> >
>> > The biggest issue is the alignment context.  Its purpose appears to be
>> > to allow handlers to reinsert calls back into the call queue.  That's
>> > completely unacceptable.  A buggy or malicious client can easily cause
>> > livelock in the IPC layer with handlers only looping on calls that never
>> > satisfy the condition.  Why is this not implemented via
>> > RetriableExceptions?
>> >
>> > On Thu, Dec 6, 2018 at 1:24 AM Yongjun Zhang
>> <yz...@cloudera.com.invalid>
>> > wrote:
>> >
>> >> Great work guys.
>> >>
>> >> Wonder if we can elaborate on the impact of not having #2 fixed, and
>> >> why #2 is not needed for the feature to be complete?
>> >> 2. Need to fix automatic failover with ZKFC. Currently it doesn't
>> >> know about ObserverNodes and tries to convert them to SBNs.
>> >>
>> >> Thanks.
>> >> --Yongjun
>> >>
>> >>
>> >> On Wed, Dec 5, 2018 at 5:27 PM Konstantin Shvachko <
>> shv.hadoop@gmail.com>
>> >> wrote:
>> >>
>> >> > Hi Hadoop developers,
>> >> >
>> >> > I would like to propose to merge to trunk the feature branch
>> HDFS-12943
>> >> for
>> >> > Consistent Reads from Standby Node. The feature is intended to scale
>> >> read
>> >> > RPC workloads. On large clusters reads comprise 95% of all RPCs to
>> the
>> >> > NameNode. We should be able to accommodate higher overall RPC
>> workloads
>> >> (up
>> >> > to 4x by some estimates) by adding multiple ObserverNodes.
>> >> >
>> >> > The main functionality has been implemented see sub-tasks of
>> HDFS-12943.
>> >> > We followed up with the test plan. Testing was done on two
>> independent
>> >> > clusters (see HDFS-14058 and HDFS-14059) with security enabled.
>> >> > We ran standard HDFS commands, MR jobs, admin commands including
>> manual
>> >> > failover.
>> >> > We know of one cluster running this feature in production.
>> >> >
>> >> > There are a few outstanding issues:
>> >> > 1. Need to provide proper documentation - a user guide for the new
>> >> feature
>> >> > 2. Need to fix automatic failover with ZKFC. Currently it does not
>> >> doesn't
>> >> > know about ObserverNodes trying to convert them to SBNs.
>> >> > 3. Scale testing and performance fine-tuning
>> >> > 4. As testing progresses, we continue fixing non-critical bugs like
>> >> > HDFS-14116.
>> >> >
>> >> > I attached a unified patch to the umbrella jira for the review and
>> >> Jenkins
>> >> > build.
>> >> > Please vote on this thread. The vote will run for 7 days until Wed
>> Dec
>> >> 12.
>> >> >
>> >> > Thanks,
>> >> > --Konstantin
>> >> >
>> >>
>> >
>> >
>> > --
>> >
>> > Daryn
>> >
>>
>

Re: [VOTE] Merge HDFS-12943 branch to trunk - Consistent Reads from Standby

Posted by Yongjun Zhang <yz...@cloudera.com.INVALID>.

Hi Konstantin,

Thanks for addressing my other question about failover.

Some thoughts to share about the suggestion Daryn made.  Seems we could try
this: let the ObserverNode throw a RetriableException back to the client
saying it has not yet reached the transaction ID needed to serve the client,
maybe even include the transaction ID gap information in the exception.
Then, when the client receives the RetriableException, it can decide whether
to send the request to the observer node again, or to the active NN when
the gap is too big.
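
To make the idea concrete, here is a rough Java sketch of the exchange. It
is only an illustration under stated assumptions: ObserverBehindException is
a hypothetical stand-in for a RetriableException that carries the gap (it is
not an existing Hadoop class), and checkCaughtUp, nextTarget and the maxGap
threshold are made-up names, not anything in the HDFS-12943 branch.

```java
import java.io.IOException;

public class ObserverRetrySketch {

  /** Where the client should send the read next. */
  enum Target { OBSERVER, ACTIVE }

  /**
   * Hypothetical stand-in for a RetriableException that also carries the
   * transaction ID gap. Not an existing Hadoop class.
   */
  static class ObserverBehindException extends IOException {
    private final long txidGap;

    ObserverBehindException(long clientTxid, long observerTxid) {
      super("Observer at txid " + observerTxid
          + " has not reached client txid " + clientTxid);
      this.txidGap = clientTxid - observerTxid;
    }

    long getTxidGap() {
      return txidGap;
    }
  }

  /** Observer side: refuse the read if the observer is behind the client. */
  static void checkCaughtUp(long clientTxid, long observerTxid)
      throws ObserverBehindException {
    if (observerTxid < clientTxid) {
      throw new ObserverBehindException(clientTxid, observerTxid);
    }
  }

  /** Client side: retry the observer for a small gap, else use the active NN. */
  static Target nextTarget(ObserverBehindException e, long maxGap) {
    return e.getTxidGap() > maxGap ? Target.ACTIVE : Target.OBSERVER;
  }

  public static void main(String[] args) {
    long maxGap = 1000L; // assumed threshold, would need tuning on a real cluster
    try {
      checkCaughtUp(5000L, 4990L);
      System.out.println("observer can serve the read");
    } catch (ObserverBehindException e) {
      System.out.println(e.getMessage() + " -> " + nextTarget(e, maxGap));
    }
  }
}
```

The point of putting the gap into the exception is that the retry decision
moves to the client, so the server never has to hold or requeue the call.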

Though saving another RPC would help the performance with the current
implementation, I expect the above-mentioned exception to happen only
infrequently, so the performance won't be too bad, plus the client has a
chance to try the ANN when it knows that the observer is too far behind in
the extreme case.

I wonder how different the performance is between these two approaches in
a cluster with a real workload.

Comments?

--Yongjun

On Fri, Dec 7, 2018 at 4:10 PM Konstantin Shvachko <sh...@gmail.com>
wrote:

> Hi Daryn,
>
> Wanted to back up Chen's earlier response to your concerns about rotating
> calls in the call queue.
> Our design
> 1. directly targets the livelock problem by rejecting calls on the Observer
> that are not likely to be responded to in a timely manner: HDFS-13873.
> 2. The call queue rotation is only done on Observers, and never on the
> active NN, so it stays free of attacks like you suggest.
>
> If this is a satisfactory mitigation for the problem, could you please
> reconsider your -1, so that people could continue voting on this thread?
>
> Thanks,
> --Konst
>
> On Thu, Dec 6, 2018 at 10:38 AM Daryn Sharp <da...@oath.com> wrote:
>
> > -1 pending additional info.  After a cursory scan, I have serious
> > concerns regarding the design.  This seems like a feature that should
> > have been purely implemented in hdfs w/o touching the common IPC layer.
> >
> > The biggest issue is the alignment context.  Its purpose appears to be
> > to allow handlers to reinsert calls back into the call queue.  That's
> > completely unacceptable.  A buggy or malicious client can easily cause
> > livelock in the IPC layer with handlers only looping on calls that never
> > satisfy the condition.  Why is this not implemented via
> > RetriableExceptions?
> >
> > On Thu, Dec 6, 2018 at 1:24 AM Yongjun Zhang <yzhang@cloudera.com.invalid>
> > wrote:
> >
> >> Great work guys.
> >>
> >> Wonder if we can elaborate on the impact of not having #2 fixed, and
> >> why #2 is not needed for the feature to be complete?
> >> 2. Need to fix automatic failover with ZKFC. Currently it doesn't
> >> know about ObserverNodes and tries to convert them to SBNs.
> >>
> >> Thanks.
> >> --Yongjun
> >>
> >>
> >> On Wed, Dec 5, 2018 at 5:27 PM Konstantin Shvachko <shv.hadoop@gmail.com>
> >> wrote:
> >>
> >> > Hi Hadoop developers,
> >> >
> >> > I would like to propose to merge to trunk the feature branch HDFS-12943
> >> > for Consistent Reads from Standby Node. The feature is intended to scale
> >> > read RPC workloads. On large clusters reads comprise 95% of all RPCs to
> >> > the NameNode. We should be able to accommodate higher overall RPC
> >> > workloads (up to 4x by some estimates) by adding multiple ObserverNodes.
> >> >
> >> > The main functionality has been implemented; see sub-tasks of HDFS-12943.
> >> > We followed up with the test plan. Testing was done on two independent
> >> > clusters (see HDFS-14058 and HDFS-14059) with security enabled.
> >> > We ran standard HDFS commands, MR jobs, admin commands including manual
> >> > failover.
> >> > We know of one cluster running this feature in production.
> >> >
> >> > There are a few outstanding issues:
> >> > 1. Need to provide proper documentation - a user guide for the new
> >> > feature
> >> > 2. Need to fix automatic failover with ZKFC. Currently it doesn't
> >> > know about ObserverNodes and tries to convert them to SBNs.
> >> > 3. Scale testing and performance fine-tuning
> >> > 4. As testing progresses, we continue fixing non-critical bugs like
> >> > HDFS-14116.
> >> >
> >> > I attached a unified patch to the umbrella jira for the review and
> >> > Jenkins build.
> >> > Please vote on this thread. The vote will run for 7 days until Wed Dec
> >> > 12.
> >> >
> >> > Thanks,
> >> > --Konstantin
> >> >
> >>
> >
> >
> > --
> >
> > Daryn
> >
>

Re: [Result] [VOTE] Merge HDFS-12943 branch to trunk - Consistent Reads from Standby

Posted by Vinod Kumar Vavilapalli <vi...@apache.org>.

Agree, it isn't productive this way.

I can't seem to find it, but was there a DISCUSS thread for this branch-merge? I usually recommend addressing issues on a DISCUSS thread instead of fighting things over a VOTE.

+Vinod

> On Dec 13, 2018, at 10:09 AM, Konstantin Shvachko <sh...@gmail.com> wrote:
> 
> This vote failed due to Daryn Sharp's veto.
> The concern is being addressed by HDFS-13873. I will start a new vote once
> this is committed.
> 
> Note for Daryn. Your non-responsive handling of the veto sets a bad
> precedent and is a bad example of communication on the lists from a
> respected member of this community. Please check your availability for
> followup discussions if you choose to get involved with important decisions.
> 
> On Fri, Dec 7, 2018 at 4:10 PM Konstantin Shvachko <sh...@gmail.com>
> wrote:
> 
>> Hi Daryn,
>> 
>> Wanted to back up Chen's earlier response to your concerns about rotating
>> calls in the call queue.
>> Our design
>> 1. directly targets the livelock problem by rejecting calls on the
>> Observer that are not likely to be responded to in a timely manner:
>> HDFS-13873.
>> 2. The call queue rotation is only done on Observers, and never on the
>> active NN, so it stays free of attacks like you suggest.
>> 
>> If this is a satisfactory mitigation for the problem, could you please
>> reconsider your -1, so that people could continue voting on this thread?
>> 
>> Thanks,
>> --Konst
>> 
>> On Thu, Dec 6, 2018 at 10:38 AM Daryn Sharp <da...@oath.com> wrote:
>> 
>>> -1 pending additional info.  After a cursory scan, I have serious
>>> concerns regarding the design.  This seems like a feature that should have
>>> been purely implemented in hdfs w/o touching the common IPC layer.
>>> 
>>> The biggest issue is the alignment context.  Its purpose appears to be
>>> to allow handlers to reinsert calls back into the call queue.  That's
>>> completely unacceptable.  A buggy or malicious client can easily cause
>>> livelock in the IPC layer with handlers only looping on calls that never
>>> satisfy the condition.  Why is this not implemented via RetriableExceptions?
>>> 
>>> On Thu, Dec 6, 2018 at 1:24 AM Yongjun Zhang <yz...@cloudera.com.invalid>
>>> wrote:
>>> 
>>>> Great work guys.
>>>> 
>>>> Wonder if we can elaborate on the impact of not having #2 fixed, and
>>>> why #2 is not needed for the feature to be complete?
>>>> 2. Need to fix automatic failover with ZKFC. Currently it doesn't
>>>> know about ObserverNodes and tries to convert them to SBNs.
>>>> 
>>>> Thanks.
>>>> --Yongjun
>>>> 
>>>> 
>>>> On Wed, Dec 5, 2018 at 5:27 PM Konstantin Shvachko <shv.hadoop@gmail.com>
>>>> wrote:
>>>> 
>>>>> Hi Hadoop developers,
>>>>> 
>>>>> I would like to propose to merge to trunk the feature branch HDFS-12943
>>>>> for Consistent Reads from Standby Node. The feature is intended to scale
>>>>> read RPC workloads. On large clusters reads comprise 95% of all RPCs to
>>>>> the NameNode. We should be able to accommodate higher overall RPC
>>>>> workloads (up to 4x by some estimates) by adding multiple ObserverNodes.
>>>>> 
>>>>> The main functionality has been implemented; see sub-tasks of HDFS-12943.
>>>>> We followed up with the test plan. Testing was done on two independent
>>>>> clusters (see HDFS-14058 and HDFS-14059) with security enabled.
>>>>> We ran standard HDFS commands, MR jobs, admin commands including manual
>>>>> failover.
>>>>> We know of one cluster running this feature in production.
>>>>> 
>>>>> There are a few outstanding issues:
>>>>> 1. Need to provide proper documentation - a user guide for the new
>>>>> feature
>>>>> 2. Need to fix automatic failover with ZKFC. Currently it doesn't
>>>>> know about ObserverNodes and tries to convert them to SBNs.
>>>>> 3. Scale testing and performance fine-tuning
>>>>> 4. As testing progresses, we continue fixing non-critical bugs like
>>>>> HDFS-14116.
>>>>> 
>>>>> I attached a unified patch to the umbrella jira for the review and
>>>>> Jenkins build.
>>>>> Please vote on this thread. The vote will run for 7 days until Wed Dec
>>>>> 12.
>>>>> 
>>>>> Thanks,
>>>>> --Konstantin
>>>>> 
>>>> 
>>> 
>>> 
>>> --
>>> 
>>> Daryn
>>> 
>> 



Re: [Result] [VOTE] Merge HDFS-12943 branch to trunk - Consistent Reads from Standby

Posted by Vinod Kumar Vavilapalli <vi...@apache.org>.

Agree, it isn't productive this way.

I can't seem to find it, but was there a DISCUSS thread for this branch-merge? I usually recommend addressing issues on a DISCUSS thread instead of fighting things over a VOTE.

+Vinod

> On Dec 13, 2018, at 10:09 AM, Konstantin Shvachko <sh...@gmail.com> wrote:
> 
> This vote failed due to Daryn Sharp's veto.
> The concern is being addressed by HDFS-13873. I will start a new vote once
> this is committed.
> 
> Note for Daryn. Your non-responsive handling of the veto makes a bad
> precedence and is a bad example of communication on the lists from a
> respected member of this community. Please check your availability for
> followup discussions if you choose to get involved with important decisions.
> 
> On Fri, Dec 7, 2018 at 4:10 PM Konstantin Shvachko <sh...@gmail.com>
> wrote:
> 
>> Hi Daryn,
>> 
>> Wanted to backup Chen's earlier response to your concerns about rotating
>> calls in the call queue.
>> Our design
>> 1. targets directly the livelock problem by rejecting calls on the
>> Observer that are not likely to be responded in timely matter: HDFS-13873.
>> 2. The call queue rotation is only done on Observers, and never on the
>> active NN, so it stays free of attacks like you suggest.
>> 
>> If this is a satisfactory mitigation for the problem could you please
>> reconsider your -1, so that people could continue voting on this thread.
>> 
>> Thanks,
>> --Konst
>> 
>> On Thu, Dec 6, 2018 at 10:38 AM Daryn Sharp <da...@oath.com> wrote:
>> 
>>> -1 pending additional info.  After a cursory scan, I have serious
>>> concerns regarding the design.  This seems like a feature that should have
>>> been purely implemented in hdfs w/o touching the common IPC layer.
>>> 
>>> The biggest issue in the alignment context.  It's purpose appears to be
>>> for allowing handlers to reinsert calls back into the call queue.  That's
>>> completely unacceptable.  A buggy or malicious client can easily cause
>>> livelock in the IPC layer with handlers only looping on calls that never
>>> satisfy the condition.  Why is this not implemented via RetriableExceptions?
>>> 
>>> On Thu, Dec 6, 2018 at 1:24 AM Yongjun Zhang <yz...@cloudera.com.invalid>
>>> wrote:
>>> 
>>>> Great work guys.
>>>> 
>>>> Wonder if we can elaborate what's impact of not having #2 fixed, and why
>>>> #2
>>>> is not needed for the feature to complete?
>>>> 2. Need to fix automatic failover with ZKFC. Currently it does not
>>>> doesn't
>>>> know about ObserverNodes trying to convert them to SBNs.
>>>> 
>>>> Thanks.
>>>> --Yongjun
>>>> 
>>>> 
>>>> On Wed, Dec 5, 2018 at 5:27 PM Konstantin Shvachko <shv.hadoop@gmail.com
>>>>> 
>>>> wrote:
>>>> 
>>>>> Hi Hadoop developers,
>>>>> 
>>>>> I would like to propose to merge to trunk the feature branch
>>>> HDFS-12943 for
>>>>> Consistent Reads from Standby Node. The feature is intended to scale
>>>> read
>>>>> RPC workloads. On large clusters reads comprise 95% of all RPCs to the
>>>>> NameNode. We should be able to accommodate higher overall RPC
>>>> workloads (up
>>>>> to 4x by some estimates) by adding multiple ObserverNodes.
>>>>> 
>>>>> The main functionality has been implemented see sub-tasks of
>>>> HDFS-12943.
>>>>> We followed up with the test plan. Testing was done on two independent
>>>>> clusters (see HDFS-14058 and HDFS-14059) with security enabled.
>>>>> We ran standard HDFS commands, MR jobs, admin commands including manual
>>>>> failover.
>>>>> We know of one cluster running this feature in production.
>>>>> 
>>>>> There are a few outstanding issues:
>>>>> 1. Need to provide proper documentation - a user guide for the new
>>>> feature
>>>>> 2. Need to fix automatic failover with ZKFC. Currently it does not
>>>> doesn't
>>>>> know about ObserverNodes trying to convert them to SBNs.
>>>>> 3. Scale testing and performance fine-tuning
>>>>> 4. As testing progresses, we continue fixing non-critical bugs like
>>>>> HDFS-14116.
>>>>> 
>>>>> I attached a unified patch to the umbrella jira for the review and
>>>> Jenkins
>>>>> build.
>>>>> Please vote on this thread. The vote will run for 7 days until Wed Dec
>>>> 12.
>>>>> 
>>>>> Thanks,
>>>>> --Konstantin
>>>>> 
>>>> 
>>> 
>>> 
>>> --
>>> 
>>> Daryn
>>> 
>> 


---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-help@hadoop.apache.org

Re: [Result] [VOTE] Merge HDFS-12943 branch to trunk - Consistent Reads from Standby

Posted by Vinod Kumar Vavilapalli <vi...@apache.org>.

Agree, it isn't productive this way.

I can't seem to find it, but was there a DISCUSS thread for this branch-merge? I usually recommend addressing issues on a DISCUSS thread instead of fighting things over a VOTE.

+Vinod

> On Dec 13, 2018, at 10:09 AM, Konstantin Shvachko <sh...@gmail.com> wrote:
> 
> This vote failed due to Daryn Sharp's veto.
> The concern is being addressed by HDFS-13873. I will start a new vote once
> this is committed.
> 
> Note for Daryn. Your non-responsive handling of the veto makes a bad
> precedence and is a bad example of communication on the lists from a
> respected member of this community. Please check your availability for
> followup discussions if you choose to get involved with important decisions.
> 
> On Fri, Dec 7, 2018 at 4:10 PM Konstantin Shvachko <sh...@gmail.com>
> wrote:
> 
>> Hi Daryn,
>> 
>> Wanted to backup Chen's earlier response to your concerns about rotating
>> calls in the call queue.
>> Our design
>> 1. targets directly the livelock problem by rejecting calls on the
>> Observer that are not likely to be responded in timely matter: HDFS-13873.
>> 2. The call queue rotation is only done on Observers, and never on the
>> active NN, so it stays free of attacks like you suggest.
>> 
>> If this is a satisfactory mitigation for the problem could you please
>> reconsider your -1, so that people could continue voting on this thread.
>> 
>> Thanks,
>> --Konst
>> 
>> On Thu, Dec 6, 2018 at 10:38 AM Daryn Sharp <da...@oath.com> wrote:
>> 
>>> -1 pending additional info.  After a cursory scan, I have serious
>>> concerns regarding the design.  This seems like a feature that should have
>>> been purely implemented in hdfs w/o touching the common IPC layer.
>>> 
>>> The biggest issue in the alignment context.  It's purpose appears to be
>>> for allowing handlers to reinsert calls back into the call queue.  That's
>>> completely unacceptable.  A buggy or malicious client can easily cause
>>> livelock in the IPC layer with handlers only looping on calls that never
>>> satisfy the condition.  Why is this not implemented via RetriableExceptions?
>>> 
>>> On Thu, Dec 6, 2018 at 1:24 AM Yongjun Zhang <yz...@cloudera.com.invalid>
>>> wrote:
>>> 
>>>> Great work guys.
>>>> 
>>>> Wonder if we can elaborate what's impact of not having #2 fixed, and why
>>>> #2
>>>> is not needed for the feature to complete?
>>>> 2. Need to fix automatic failover with ZKFC. Currently it does not
>>>> doesn't
>>>> know about ObserverNodes trying to convert them to SBNs.
>>>> 
>>>> Thanks.
>>>> --Yongjun
>>>> 
>>>> 
>>>> On Wed, Dec 5, 2018 at 5:27 PM Konstantin Shvachko <shv.hadoop@gmail.com
>>>>> 
>>>> wrote:
>>>> 
>>>>> Hi Hadoop developers,
>>>>> 
>>>>> I would like to propose to merge to trunk the feature branch
>>>> HDFS-12943 for
>>>>> Consistent Reads from Standby Node. The feature is intended to scale
>>>> read
>>>>> RPC workloads. On large clusters reads comprise 95% of all RPCs to the
>>>>> NameNode. We should be able to accommodate higher overall RPC
>>>> workloads (up
>>>>> to 4x by some estimates) by adding multiple ObserverNodes.
>>>>> 
>>>>> The main functionality has been implemented see sub-tasks of
>>>> HDFS-12943.
>>>>> We followed up with the test plan. Testing was done on two independent
>>>>> clusters (see HDFS-14058 and HDFS-14059) with security enabled.
>>>>> We ran standard HDFS commands, MR jobs, admin commands including manual
>>>>> failover.
>>>>> We know of one cluster running this feature in production.
>>>>> 
>>>>> There are a few outstanding issues:
>>>>> 1. Need to provide proper documentation - a user guide for the new
>>>> feature
>>>>> 2. Need to fix automatic failover with ZKFC. Currently it does not
>>>> doesn't
>>>>> know about ObserverNodes trying to convert them to SBNs.
>>>>> 3. Scale testing and performance fine-tuning
>>>>> 4. As testing progresses, we continue fixing non-critical bugs like
>>>>> HDFS-14116.
>>>>> 
>>>>> I attached a unified patch to the umbrella jira for the review and
>>>> Jenkins
>>>>> build.
>>>>> Please vote on this thread. The vote will run for 7 days until Wed Dec
>>>> 12.
>>>>> 
>>>>> Thanks,
>>>>> --Konstantin
>>>>> 
>>>> 
>>> 
>>> 
>>> --
>>> 
>>> Daryn
>>> 
>> 


---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-dev-help@hadoop.apache.org

Re: [Result] [VOTE] Merge HDFS-12943 branch to trunk - Consistent Reads from Standby

Posted by Vinod Kumar Vavilapalli <vi...@apache.org>.

Agree, it isn't productive this way.

I can't seem to find it, but was there a DISCUSS thread for this branch-merge? I usually recommend addressing issues on a DISCUSS thread instead of fighting things over a VOTE.

+Vinod

> On Dec 13, 2018, at 10:09 AM, Konstantin Shvachko <sh...@gmail.com> wrote:
> 
> This vote failed due to Daryn Sharp's veto.
> The concern is being addressed by HDFS-13873. I will start a new vote once
> this is committed.
> 
> Note for Daryn. Your non-responsive handling of the veto makes a bad
> precedence and is a bad example of communication on the lists from a
> respected member of this community. Please check your availability for
> followup discussions if you choose to get involved with important decisions.
> 
> On Fri, Dec 7, 2018 at 4:10 PM Konstantin Shvachko <sh...@gmail.com>
> wrote:
> 
>> Hi Daryn,
>> 
>> Wanted to backup Chen's earlier response to your concerns about rotating
>> calls in the call queue.
>> Our design
>> 1. targets directly the livelock problem by rejecting calls on the
>> Observer that are not likely to be responded in timely matter: HDFS-13873.
>> 2. The call queue rotation is only done on Observers, and never on the
>> active NN, so it stays free of attacks like you suggest.
>> 
>> If this is a satisfactory mitigation for the problem could you please
>> reconsider your -1, so that people could continue voting on this thread.
>> 
>> Thanks,
>> --Konst
>> 
>> On Thu, Dec 6, 2018 at 10:38 AM Daryn Sharp <da...@oath.com> wrote:
>> 
>>> -1 pending additional info.  After a cursory scan, I have serious
>>> concerns regarding the design.  This seems like a feature that should have
>>> been purely implemented in hdfs w/o touching the common IPC layer.
>>> 
>>> The biggest issue in the alignment context.  It's purpose appears to be
>>> for allowing handlers to reinsert calls back into the call queue.  That's
>>> completely unacceptable.  A buggy or malicious client can easily cause
>>> livelock in the IPC layer with handlers only looping on calls that never
>>> satisfy the condition.  Why is this not implemented via RetriableExceptions?
>>> 
>>> On Thu, Dec 6, 2018 at 1:24 AM Yongjun Zhang <yz...@cloudera.com.invalid>
>>> wrote:
>>> 
>>>> Great work guys.
>>>> 
>>>> Wonder if we can elaborate what's impact of not having #2 fixed, and why
>>>> #2
>>>> is not needed for the feature to complete?
>>>> 2. Need to fix automatic failover with ZKFC. Currently it does not
>>>> doesn't
>>>> know about ObserverNodes trying to convert them to SBNs.
>>>> 
>>>> Thanks.
>>>> --Yongjun
>>>> 
>>>> 
>>>> On Wed, Dec 5, 2018 at 5:27 PM Konstantin Shvachko <shv.hadoop@gmail.com
>>>>> 
>>>> wrote:
>>>> 
>>>>> Hi Hadoop developers,
>>>>> 
>>>>> I would like to propose to merge to trunk the feature branch
>>>> HDFS-12943 for
>>>>> Consistent Reads from Standby Node. The feature is intended to scale
>>>> read
>>>>> RPC workloads. On large clusters reads comprise 95% of all RPCs to the
>>>>> NameNode. We should be able to accommodate higher overall RPC
>>>> workloads (up
>>>>> to 4x by some estimates) by adding multiple ObserverNodes.
>>>>> 
>>>>> The main functionality has been implemented see sub-tasks of
>>>> HDFS-12943.
>>>>> We followed up with the test plan. Testing was done on two independent
>>>>> clusters (see HDFS-14058 and HDFS-14059) with security enabled.
>>>>> We ran standard HDFS commands, MR jobs, admin commands including manual
>>>>> failover.
>>>>> We know of one cluster running this feature in production.
>>>>> 
>>>>> There are a few outstanding issues:
>>>>> 1. Need to provide proper documentation - a user guide for the new
>>>> feature
>>>>> 2. Need to fix automatic failover with ZKFC. Currently it does not
>>>> doesn't
>>>>> know about ObserverNodes trying to convert them to SBNs.
>>>>> 3. Scale testing and performance fine-tuning
>>>>> 4. As testing progresses, we continue fixing non-critical bugs like
>>>>> HDFS-14116.
>>>>> 
>>>>> I attached a unified patch to the umbrella jira for the review and
>>>> Jenkins
>>>>> build.
>>>>> Please vote on this thread. The vote will run for 7 days until Wed Dec
>>>> 12.
>>>>> 
>>>>> Thanks,
>>>>> --Konstantin
>>>>> 
>>>> 
>>> 
>>> 
>>> --
>>> 
>>> Daryn
>>> 
>> 



[Result] [VOTE] Merge HDFS-12943 branch to trunk - Consistent Reads from Standby

Posted by Konstantin Shvachko <sh...@gmail.com>.

This vote failed due to Daryn Sharp's veto.
The concern is being addressed by HDFS-13873. I will start a new vote once
this is committed.

Note for Daryn: your non-responsive handling of the veto sets a bad
precedent and is a bad example of communication on the lists from a
respected member of this community. Please check your availability for
follow-up discussions if you choose to get involved with important decisions.

On Fri, Dec 7, 2018 at 4:10 PM Konstantin Shvachko <sh...@gmail.com>
wrote:

> Hi Daryn,
>
> Wanted to backup Chen's earlier response to your concerns about rotating
> calls in the call queue.
> Our design
> 1. targets directly the livelock problem by rejecting calls on the
> Observer that are not likely to be responded in timely matter: HDFS-13873.
> 2. The call queue rotation is only done on Observers, and never on the
> active NN, so it stays free of attacks like you suggest.
>
> If this is a satisfactory mitigation for the problem could you please
> reconsider your -1, so that people could continue voting on this thread.
>
> Thanks,
> --Konst
>
> On Thu, Dec 6, 2018 at 10:38 AM Daryn Sharp <da...@oath.com> wrote:
>
>> -1 pending additional info.  After a cursory scan, I have serious
>> concerns regarding the design.  This seems like a feature that should have
>> been purely implemented in hdfs w/o touching the common IPC layer.
>>
>> The biggest issue in the alignment context.  It's purpose appears to be
>> for allowing handlers to reinsert calls back into the call queue.  That's
>> completely unacceptable.  A buggy or malicious client can easily cause
>> livelock in the IPC layer with handlers only looping on calls that never
>> satisfy the condition.  Why is this not implemented via RetriableExceptions?
>>
>> On Thu, Dec 6, 2018 at 1:24 AM Yongjun Zhang <yz...@cloudera.com.invalid>
>> wrote:
>>
>>> Great work guys.
>>>
>>> Wonder if we can elaborate what's impact of not having #2 fixed, and why
>>> #2
>>> is not needed for the feature to complete?
>>> 2. Need to fix automatic failover with ZKFC. Currently it does not
>>> doesn't
>>> know about ObserverNodes trying to convert them to SBNs.
>>>
>>> Thanks.
>>> --Yongjun
>>>
>>>
>>> On Wed, Dec 5, 2018 at 5:27 PM Konstantin Shvachko <shv.hadoop@gmail.com
>>> >
>>> wrote:
>>>
>>> > Hi Hadoop developers,
>>> >
>>> > I would like to propose to merge to trunk the feature branch
>>> HDFS-12943 for
>>> > Consistent Reads from Standby Node. The feature is intended to scale
>>> read
>>> > RPC workloads. On large clusters reads comprise 95% of all RPCs to the
>>> > NameNode. We should be able to accommodate higher overall RPC
>>> workloads (up
>>> > to 4x by some estimates) by adding multiple ObserverNodes.
>>> >
>>> > The main functionality has been implemented see sub-tasks of
>>> HDFS-12943.
>>> > We followed up with the test plan. Testing was done on two independent
>>> > clusters (see HDFS-14058 and HDFS-14059) with security enabled.
>>> > We ran standard HDFS commands, MR jobs, admin commands including manual
>>> > failover.
>>> > We know of one cluster running this feature in production.
>>> >
>>> > There are a few outstanding issues:
>>> > 1. Need to provide proper documentation - a user guide for the new
>>> feature
>>> > 2. Need to fix automatic failover with ZKFC. Currently it does not
>>> doesn't
>>> > know about ObserverNodes trying to convert them to SBNs.
>>> > 3. Scale testing and performance fine-tuning
>>> > 4. As testing progresses, we continue fixing non-critical bugs like
>>> > HDFS-14116.
>>> >
>>> > I attached a unified patch to the umbrella jira for the review and
>>> Jenkins
>>> > build.
>>> > Please vote on this thread. The vote will run for 7 days until Wed Dec
>>> 12.
>>> >
>>> > Thanks,
>>> > --Konstantin
>>> >
>>>
>>
>>
>> --
>>
>> Daryn
>>
>

Re: [VOTE] Merge HDFS-12943 branch to trunk - Consistent Reads from Standby

Posted by Yongjun Zhang <yz...@cloudera.com.INVALID>.

Hi Konstantin,

Thanks for addressing my other question about failover.

Some thoughts on the suggestion Daryn made. It seems we could try this: let
the ObserverNode throw a RetriableException back to the client saying that
it has not yet reached the transaction ID needed to serve the client, maybe
even including the transaction ID gap in the exception. When the client
receives the RetriableException, it can decide whether to send the request
to the ObserverNode again, or to the active NN when the gap is too big.
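
The idea in the paragraph above can be sketched roughly as follows. This is
only an illustration under stated assumptions, not code from the HDFS-12943
branch: the class ObserverRetrySketch, the exception ObserverBehindException,
its txIdGap field, and the maxGap threshold are all hypothetical names made up
for this example.

```java
import java.io.IOException;

/** Hypothetical sketch; none of these names come from the HDFS-12943 patch. */
final class ObserverRetrySketch {

  /** Stand-in for a retriable exception that also reports how far behind the Observer is. */
  static final class ObserverBehindException extends IOException {
    final long txIdGap;

    ObserverBehindException(long clientTxId, long observerTxId) {
      super("Observer is at txid " + observerTxId + ", client needs " + clientTxId);
      this.txIdGap = clientTxId - observerTxId;
    }
  }

  /** Where the client sends its next attempt. */
  enum Target { OBSERVER, ACTIVE }

  /** Observer side: serve the read only if the namespace has caught up to the client. */
  static String readOnObserver(long clientTxId, long observerTxId)
      throws ObserverBehindException {
    if (observerTxId < clientTxId) {
      throw new ObserverBehindException(clientTxId, observerTxId);
    }
    return "served at txid " + observerTxId;
  }

  /** Client side: a small gap means retry the Observer, a large gap means go to the active NN. */
  static Target nextTarget(ObserverBehindException e, long maxGap) {
    return e.txIdGap > maxGap ? Target.ACTIVE : Target.OBSERVER;
  }

  public static void main(String[] args) {
    long maxGap = 1_000; // assumed threshold, purely illustrative
    try {
      System.out.println(readOnObserver(5_000, 4_990));
    } catch (ObserverBehindException e) {
      System.out.println(e.getMessage() + " -> next target: " + nextTarget(e, maxGap));
    }
  }
}
```

The point of carrying the gap in the exception is that the routing decision
moves to the client, so the server never has to hold or requeue the call.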

Though the current implementation saves an extra RPC, which helps
performance, I expect the exception mentioned above to happen only
infrequently, so the performance should not be too bad. In addition, the
client gets a chance to try the ANN when it knows that the observer is too
far behind in the extreme case.

I wonder how different the performance of these two approaches is in a
cluster with a real workload.

Comments?

--Yongjun

On Fri, Dec 7, 2018 at 4:10 PM Konstantin Shvachko <sh...@gmail.com>
wrote:

> Hi Daryn,
>
> Wanted to backup Chen's earlier response to your concerns about rotating
> calls in the call queue.
> Our design
> 1. targets directly the livelock problem by rejecting calls on the Observer
> that are not likely to be responded in timely matter: HDFS-13873.
> 2. The call queue rotation is only done on Observers, and never on the
> active NN, so it stays free of attacks like you suggest.
>
> If this is a satisfactory mitigation for the problem could you please
> reconsider your -1, so that people could continue voting on this thread.
>
> Thanks,
> --Konst
>
> On Thu, Dec 6, 2018 at 10:38 AM Daryn Sharp <da...@oath.com> wrote:
>
> > -1 pending additional info.  After a cursory scan, I have serious
> concerns
> > regarding the design.  This seems like a feature that should have been
> > purely implemented in hdfs w/o touching the common IPC layer.
> >
> > The biggest issue in the alignment context.  It's purpose appears to be
> > for allowing handlers to reinsert calls back into the call queue.  That's
> > completely unacceptable.  A buggy or malicious client can easily cause
> > livelock in the IPC layer with handlers only looping on calls that never
> > satisfy the condition.  Why is this not implemented via
> RetriableExceptions?
> >
> > On Thu, Dec 6, 2018 at 1:24 AM Yongjun Zhang <yzhang@cloudera.com.invalid
> >
> > wrote:
> >
> >> Great work guys.
> >>
> >> Wonder if we can elaborate what's impact of not having #2 fixed, and why
> >> #2
> >> is not needed for the feature to complete?
> >> 2. Need to fix automatic failover with ZKFC. Currently it does not
> doesn't
> >> know about ObserverNodes trying to convert them to SBNs.
> >>
> >> Thanks.
> >> --Yongjun
> >>
> >>
> >> On Wed, Dec 5, 2018 at 5:27 PM Konstantin Shvachko <
> shv.hadoop@gmail.com>
> >> wrote:
> >>
> >> > Hi Hadoop developers,
> >> >
> >> > I would like to propose to merge to trunk the feature branch
> HDFS-12943
> >> for
> >> > Consistent Reads from Standby Node. The feature is intended to scale
> >> read
> >> > RPC workloads. On large clusters reads comprise 95% of all RPCs to the
> >> > NameNode. We should be able to accommodate higher overall RPC
> workloads
> >> (up
> >> > to 4x by some estimates) by adding multiple ObserverNodes.
> >> >
> >> > The main functionality has been implemented see sub-tasks of
> HDFS-12943.
> >> > We followed up with the test plan. Testing was done on two independent
> >> > clusters (see HDFS-14058 and HDFS-14059) with security enabled.
> >> > We ran standard HDFS commands, MR jobs, admin commands including
> manual
> >> > failover.
> >> > We know of one cluster running this feature in production.
> >> >
> >> > There are a few outstanding issues:
> >> > 1. Need to provide proper documentation - a user guide for the new
> >> feature
> >> > 2. Need to fix automatic failover with ZKFC. Currently it does not
> >> doesn't
> >> > know about ObserverNodes trying to convert them to SBNs.
> >> > 3. Scale testing and performance fine-tuning
> >> > 4. As testing progresses, we continue fixing non-critical bugs like
> >> > HDFS-14116.
> >> >
> >> > I attached a unified patch to the umbrella jira for the review and
> >> Jenkins
> >> > build.
> >> > Please vote on this thread. The vote will run for 7 days until Wed Dec
> >> 12.
> >> >
> >> > Thanks,
> >> > --Konstantin
> >> >
> >>
> >
> >
> > --
> >
> > Daryn
> >
>

Re: [VOTE] Merge HDFS-12943 branch to trunk - Consistent Reads from Standby

Posted by Konstantin Shvachko <sh...@gmail.com>.

Hi Daryn,

I wanted to back up Chen's earlier response to your concerns about rotating
calls in the call queue.
Our design
1. directly targets the livelock problem by rejecting calls on the Observer
that are not likely to be responded to in a timely manner: HDFS-13873.
2. The call queue rotation is only done on Observers, and never on the
active NN, so it stays free of attacks like the one you suggest.
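
The two points above can be read as a three-way decision per incoming call.
The sketch below is only an illustration of that reading, not the HDFS-13873
implementation: the class ObserverCallTriageSketch, the Action enum, and the
maxLag threshold are hypothetical names and values assumed for this example.

```java
/** Hypothetical sketch; names and threshold are assumptions, not the HDFS-13873 code. */
final class ObserverCallTriageSketch {

  /** What the server does with an incoming call. */
  enum Action { PROCESS, REQUEUE, REJECT }

  /**
   * @param isObserver    true if this NameNode is an Observer
   * @param clientStateId last transaction ID the client has seen
   * @param serverStateId last transaction ID this NameNode has applied
   * @param maxLag        assumed limit on how far behind the server may be
   *                      for the call to still be worth keeping in the queue
   */
  static Action triage(boolean isObserver, long clientStateId, long serverStateId, long maxLag) {
    if (!isObserver) {
      return Action.PROCESS; // point 2: the active NN never rotates calls
    }
    if (clientStateId <= serverStateId) {
      return Action.PROCESS; // the Observer has already caught up to the client
    }
    if (clientStateId - serverStateId > maxLag) {
      return Action.REJECT;  // point 1: unlikely to be answered in time, so fail fast
    }
    return Action.REQUEUE;   // close enough: put the call back and retry it shortly
  }

  public static void main(String[] args) {
    long maxLag = 1_000; // assumed threshold, purely illustrative
    System.out.println(triage(true, 5_000, 4_990, maxLag));
    System.out.println(triage(true, 50_000, 4_990, maxLag));
    System.out.println(triage(false, 50_000, 4_990, maxLag));
  }
}
```

Under this reading, a client that sends a state ID far ahead of the Observer
gets rejected rather than requeued, which bounds how long any call can rotate.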

If this is a satisfactory mitigation for the problem, could you please
reconsider your -1, so that people can continue voting on this thread?

Thanks,
--Konst

On Thu, Dec 6, 2018 at 10:38 AM Daryn Sharp <da...@oath.com> wrote:

> -1 pending additional info.  After a cursory scan, I have serious concerns
> regarding the design.  This seems like a feature that should have been
> purely implemented in hdfs w/o touching the common IPC layer.
>
> The biggest issue in the alignment context.  It's purpose appears to be
> for allowing handlers to reinsert calls back into the call queue.  That's
> completely unacceptable.  A buggy or malicious client can easily cause
> livelock in the IPC layer with handlers only looping on calls that never
> satisfy the condition.  Why is this not implemented via RetriableExceptions?
>
> On Thu, Dec 6, 2018 at 1:24 AM Yongjun Zhang <yz...@cloudera.com.invalid>
> wrote:
>
>> Great work guys.
>>
>> Wonder if we can elaborate what's impact of not having #2 fixed, and why
>> #2
>> is not needed for the feature to complete?
>> 2. Need to fix automatic failover with ZKFC. Currently it does not doesn't
>> know about ObserverNodes trying to convert them to SBNs.
>>
>> Thanks.
>> --Yongjun
>>
>>
>> On Wed, Dec 5, 2018 at 5:27 PM Konstantin Shvachko <sh...@gmail.com>
>> wrote:
>>
>> > Hi Hadoop developers,
>> >
>> > I would like to propose to merge to trunk the feature branch HDFS-12943
>> for
>> > Consistent Reads from Standby Node. The feature is intended to scale
>> read
>> > RPC workloads. On large clusters reads comprise 95% of all RPCs to the
>> > NameNode. We should be able to accommodate higher overall RPC workloads
>> (up
>> > to 4x by some estimates) by adding multiple ObserverNodes.
>> >
>> > The main functionality has been implemented see sub-tasks of HDFS-12943.
>> > We followed up with the test plan. Testing was done on two independent
>> > clusters (see HDFS-14058 and HDFS-14059) with security enabled.
>> > We ran standard HDFS commands, MR jobs, admin commands including manual
>> > failover.
>> > We know of one cluster running this feature in production.
>> >
>> > There are a few outstanding issues:
>> > 1. Need to provide proper documentation - a user guide for the new
>> feature
>> > 2. Need to fix automatic failover with ZKFC. Currently it does not
>> doesn't
>> > know about ObserverNodes trying to convert them to SBNs.
>> > 3. Scale testing and performance fine-tuning
>> > 4. As testing progresses, we continue fixing non-critical bugs like
>> > HDFS-14116.
>> >
>> > I attached a unified patch to the umbrella jira for the review and
>> Jenkins
>> > build.
>> > Please vote on this thread. The vote will run for 7 days until Wed Dec
>> 12.
>> >
>> > Thanks,
>> > --Konstantin
>> >
>>
>
>
> --
>
> Daryn
>

Re: [VOTE] Merge HDFS-12943 branch to trunk - Consistent Reads from Standby

Posted by Anu Engineer <ae...@hortonworks.com>.

Hi Daryn,

I have just started reading the patch, so my apologies if the answer to my question is already somewhere in the patch.

Are you concerned that FSEditLock is taken in GlobalStateIdContext on the server side, and worried that a malicious or careless client would
cause this lock to be held for a long time?

How do retriable exceptions help? Wouldn’t the system eventually hold the lock similarly?

I am asking to understand this better so that I get a better sense when I am reading the code.

Thanks
Anu


On 12/6/18, 10:38 AM, "Daryn Sharp" <da...@oath.com.INVALID> wrote:

    -1 pending additional info.  After a cursory scan, I have serious concerns
    regarding the design.  This seems like a feature that should have been
    purely implemented in hdfs w/o touching the common IPC layer.
    
    The biggest issue in the alignment context.  It's purpose appears to be for
    allowing handlers to reinsert calls back into the call queue.  That's
    completely unacceptable.  A buggy or malicious client can easily cause
    livelock in the IPC layer with handlers only looping on calls that never
    satisfy the condition.  Why is this not implemented via RetriableExceptions?
    
    On Thu, Dec 6, 2018 at 1:24 AM Yongjun Zhang <yz...@cloudera.com.invalid>
    wrote:
    
    > Great work guys.
    >
    > Wonder if we can elaborate what's impact of not having #2 fixed, and why #2
    > is not needed for the feature to complete?
    > 2. Need to fix automatic failover with ZKFC. Currently it does not doesn't
    > know about ObserverNodes trying to convert them to SBNs.
    >
    > Thanks.
    > --Yongjun
    >
    >
    > On Wed, Dec 5, 2018 at 5:27 PM Konstantin Shvachko <sh...@gmail.com>
    > wrote:
    >
    > > Hi Hadoop developers,
    > >
    > > I would like to propose to merge to trunk the feature branch HDFS-12943
    > for
    > > Consistent Reads from Standby Node. The feature is intended to scale read
    > > RPC workloads. On large clusters reads comprise 95% of all RPCs to the
    > > NameNode. We should be able to accommodate higher overall RPC workloads
    > (up
    > > to 4x by some estimates) by adding multiple ObserverNodes.
    > >
    > > The main functionality has been implemented see sub-tasks of HDFS-12943.
    > > We followed up with the test plan. Testing was done on two independent
    > > clusters (see HDFS-14058 and HDFS-14059) with security enabled.
    > > We ran standard HDFS commands, MR jobs, admin commands including manual
    > > failover.
    > > We know of one cluster running this feature in production.
    > >
    > > There are a few outstanding issues:
    > > 1. Need to provide proper documentation - a user guide for the new
    > feature
    > > 2. Need to fix automatic failover with ZKFC. Currently it does not
    > doesn't
    > > know about ObserverNodes trying to convert them to SBNs.
    > > 3. Scale testing and performance fine-tuning
    > > 4. As testing progresses, we continue fixing non-critical bugs like
    > > HDFS-14116.
    > >
    > > I attached a unified patch to the umbrella jira for the review and
    > Jenkins
    > > build.
    > > Please vote on this thread. The vote will run for 7 days until Wed Dec
    > 12.
    > >
    > > Thanks,
    > > --Konstantin
    > >
    >
    
    
    -- 
    
    Daryn

Re: [VOTE] Merge HDFS-12943 branch to trunk - Consistent Reads from Standby

Posted by Konstantin Shvachko <sh...@gmail.com>.

Hi Daryn,

Wanted to back up Chen's earlier response to your concerns about rotating
calls in the call queue.
Our design
1. directly targets the livelock problem by rejecting calls on the Observer
that are not likely to be responded to in a timely manner: HDFS-13873.
2. The call queue rotation is only done on Observers, and never on the
active NN, so the active NN stays free of attacks like the one you suggest.
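
The two points above can be pictured with a minimal sketch. It is only an illustration under stated assumptions, not the code in the HDFS-12943 branch or in HDFS-13873: the class name, the `decide` method, the `Decision` enum and the `MAX_LAG` threshold are all hypothetical. It assumes each call carries the state id the client has already seen, and that the Observer compares it with the state id it has applied so far.

```java
public class ObserverQueueSketch {
    /** What a handler does with a call it has just taken off the queue. */
    enum Decision { SERVE, REQUEUE, REJECT }

    // Hypothetical limit on how far behind the Observer may be before it
    // refuses a call outright instead of holding on to it.
    static final long MAX_LAG = 100;

    /**
     * Decide what to do with a read call on an Observer.
     * clientStateId: state id the client has already seen (assumed field).
     * observerStateId: state id the Observer has applied so far.
     */
    static Decision decide(long clientStateId, long observerStateId) {
        if (clientStateId <= observerStateId) {
            return Decision.SERVE;      // Observer is caught up for this client
        }
        if (clientStateId - observerStateId > MAX_LAG) {
            return Decision.REJECT;     // too far behind: do not keep the call around
        }
        return Decision.REQUEUE;        // slightly behind: put the call back in the queue
    }

    public static void main(String[] args) {
        long observerStateId = 10;
        System.out.println(decide(5, observerStateId));
        System.out.println(decide(12, observerStateId));
        System.out.println(decide(1_000_000, observerStateId));
    }
}
```

In this sketch a call with an implausibly large state id is rejected on first sight, so it cannot circulate in the queue indefinitely; only calls within the hypothetical lag limit are rotated.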

If this is a satisfactory mitigation for the problem, could you please
reconsider your -1, so that people can continue voting on this thread?

Thanks,
--Konst

On Thu, Dec 6, 2018 at 10:38 AM Daryn Sharp <da...@oath.com> wrote:

> -1 pending additional info.  After a cursory scan, I have serious concerns
> regarding the design.  This seems like a feature that should have been
> purely implemented in hdfs w/o touching the common IPC layer.
>
> The biggest issue in the alignment context.  It's purpose appears to be
> for allowing handlers to reinsert calls back into the call queue.  That's
> completely unacceptable.  A buggy or malicious client can easily cause
> livelock in the IPC layer with handlers only looping on calls that never
> satisfy the condition.  Why is this not implemented via RetriableExceptions?
>
> On Thu, Dec 6, 2018 at 1:24 AM Yongjun Zhang <yz...@cloudera.com.invalid>
> wrote:
>
>> Great work guys.
>>
>> Wonder if we can elaborate what's impact of not having #2 fixed, and why
>> #2
>> is not needed for the feature to complete?
>> 2. Need to fix automatic failover with ZKFC. Currently it does not doesn't
>> know about ObserverNodes trying to convert them to SBNs.
>>
>> Thanks.
>> --Yongjun
>>
>>
>> On Wed, Dec 5, 2018 at 5:27 PM Konstantin Shvachko <sh...@gmail.com>
>> wrote:
>>
>> > Hi Hadoop developers,
>> >
>> > I would like to propose to merge to trunk the feature branch HDFS-12943
>> for
>> > Consistent Reads from Standby Node. The feature is intended to scale
>> read
>> > RPC workloads. On large clusters reads comprise 95% of all RPCs to the
>> > NameNode. We should be able to accommodate higher overall RPC workloads
>> (up
>> > to 4x by some estimates) by adding multiple ObserverNodes.
>> >
>> > The main functionality has been implemented see sub-tasks of HDFS-12943.
>> > We followed up with the test plan. Testing was done on two independent
>> > clusters (see HDFS-14058 and HDFS-14059) with security enabled.
>> > We ran standard HDFS commands, MR jobs, admin commands including manual
>> > failover.
>> > We know of one cluster running this feature in production.
>> >
>> > There are a few outstanding issues:
>> > 1. Need to provide proper documentation - a user guide for the new
>> feature
>> > 2. Need to fix automatic failover with ZKFC. Currently it does not
>> doesn't
>> > know about ObserverNodes trying to convert them to SBNs.
>> > 3. Scale testing and performance fine-tuning
>> > 4. As testing progresses, we continue fixing non-critical bugs like
>> > HDFS-14116.
>> >
>> > I attached a unified patch to the umbrella jira for the review and
>> Jenkins
>> > build.
>> > Please vote on this thread. The vote will run for 7 days until Wed Dec
>> 12.
>> >
>> > Thanks,
>> > --Konstantin
>> >
>>
>
>
> --
>
> Daryn
>

Re: [VOTE] Merge HDFS-12943 branch to trunk - Consistent Reads from Standby

Posted by Daryn Sharp <da...@oath.com.INVALID>.

-1 pending additional info.  After a cursory scan, I have serious concerns
regarding the design.  This seems like a feature that should have been
purely implemented in hdfs w/o touching the common IPC layer.

The biggest issue is the alignment context.  Its purpose appears to be for
allowing handlers to reinsert calls back into the call queue.  That's
completely unacceptable.  A buggy or malicious client can easily cause
livelock in the IPC layer with handlers only looping on calls that never
satisfy the condition.  Why is this not implemented via RetriableExceptions?
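
The concern can be shown with a toy model. This is a minimal sketch under stated assumptions, not Hadoop's IPC code: the queue holds only the state ids that clients claim to have seen, and `runWithRequeue` and `runWithRetry` are hypothetical names. The first re-inserts calls whose condition is not met; the second drops them from the queue, standing in for answering with a retriable error and letting the client come back later.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

public class RequeueVsRetrySketch {
    /**
     * Handlers re-insert calls whose state id is ahead of the server.
     * Returns how many calls are still queued after `steps` handler iterations.
     */
    static int runWithRequeue(Deque<Long> queue, long serverStateId, int steps) {
        for (int i = 0; i < steps && !queue.isEmpty(); i++) {
            long clientStateId = queue.pollFirst();
            if (clientStateId > serverStateId) {
                queue.addLast(clientStateId); // handler cycle spent, call is back in the queue
            }
            // otherwise the call is served and leaves the queue
        }
        return queue.size();
    }

    /**
     * Handlers answer unsatisfied calls with a retriable error instead.
     * The call leaves the queue either way; the client is expected to retry.
     */
    static int runWithRetry(Deque<Long> queue, long serverStateId, int steps) {
        for (int i = 0; i < steps && !queue.isEmpty(); i++) {
            queue.pollFirst(); // served if satisfied, rejected otherwise; both free the slot
        }
        return queue.size();
    }

    public static void main(String[] args) {
        // One honest call and two calls claiming a state id the server never reaches.
        List<Long> calls = List.of(5L, 1_000_000L, 1_000_000L);
        long serverStateId = 10;

        System.out.println("requeue, still queued: "
            + runWithRequeue(new ArrayDeque<>(calls), serverStateId, 100));
        System.out.println("retry, still queued: "
            + runWithRetry(new ArrayDeque<>(calls), serverStateId, 100));
    }
}
```

In the re-queue variant the two unsatisfiable calls are still in the queue after 100 handler iterations, and every iteration past the first was spent on them. In the retry variant the queue is empty after three iterations.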

On Thu, Dec 6, 2018 at 1:24 AM Yongjun Zhang <yz...@cloudera.com.invalid>
wrote:

> Great work guys.
>
> Wonder if we can elaborate what's impact of not having #2 fixed, and why #2
> is not needed for the feature to complete?
> 2. Need to fix automatic failover with ZKFC. Currently it does not doesn't
> know about ObserverNodes trying to convert them to SBNs.
>
> Thanks.
> --Yongjun
>
>
> On Wed, Dec 5, 2018 at 5:27 PM Konstantin Shvachko <sh...@gmail.com>
> wrote:
>
> > Hi Hadoop developers,
> >
> > I would like to propose to merge to trunk the feature branch HDFS-12943
> for
> > Consistent Reads from Standby Node. The feature is intended to scale read
> > RPC workloads. On large clusters reads comprise 95% of all RPCs to the
> > NameNode. We should be able to accommodate higher overall RPC workloads
> (up
> > to 4x by some estimates) by adding multiple ObserverNodes.
> >
> > The main functionality has been implemented see sub-tasks of HDFS-12943.
> > We followed up with the test plan. Testing was done on two independent
> > clusters (see HDFS-14058 and HDFS-14059) with security enabled.
> > We ran standard HDFS commands, MR jobs, admin commands including manual
> > failover.
> > We know of one cluster running this feature in production.
> >
> > There are a few outstanding issues:
> > 1. Need to provide proper documentation - a user guide for the new
> feature
> > 2. Need to fix automatic failover with ZKFC. Currently it does not
> doesn't
> > know about ObserverNodes trying to convert them to SBNs.
> > 3. Scale testing and performance fine-tuning
> > 4. As testing progresses, we continue fixing non-critical bugs like
> > HDFS-14116.
> >
> > I attached a unified patch to the umbrella jira for the review and
> Jenkins
> > build.
> > Please vote on this thread. The vote will run for 7 days until Wed Dec
> 12.
> >
> > Thanks,
> > --Konstantin
> >
>


-- 

Daryn

Re: [VOTE] Merge HDFS-12943 branch to trunk - Consistent Reads from Standby

Posted by Konstantin Shvachko <sh...@gmail.com>.

Hi Yongjun,

Automatic failover certainly needs to be fixed (see HDFS-14130 and HDFS-13182),
along with all the other outstanding issues. We plan to continue this work on trunk.
The feature is usable now without these fixes (see HDFS-14067).
And we would like to get it in, so that people can have early access,
and so that newly developed features are aware of this functionality.
Let us know if you have other suggestions.

Thanks,
--Konstantin

On Wed, Dec 5, 2018 at 11:24 PM Yongjun Zhang <yz...@cloudera.com> wrote:

> Great work guys.
>
> Wonder if we can elaborate what's impact of not having #2 fixed, and why
> #2 is not needed for the feature to complete?
> 2. Need to fix automatic failover with ZKFC. Currently it does not doesn't
> know about ObserverNodes trying to convert them to SBNs.
>
> Thanks.
> --Yongjun
>
>
> On Wed, Dec 5, 2018 at 5:27 PM Konstantin Shvachko <sh...@gmail.com>
> wrote:
>
>> Hi Hadoop developers,
>>
>> I would like to propose to merge to trunk the feature branch HDFS-12943
>> for
>> Consistent Reads from Standby Node. The feature is intended to scale read
>> RPC workloads. On large clusters reads comprise 95% of all RPCs to the
>> NameNode. We should be able to accommodate higher overall RPC workloads
>> (up
>> to 4x by some estimates) by adding multiple ObserverNodes.
>>
>> The main functionality has been implemented see sub-tasks of HDFS-12943.
>> We followed up with the test plan. Testing was done on two independent
>> clusters (see HDFS-14058 and HDFS-14059) with security enabled.
>> We ran standard HDFS commands, MR jobs, admin commands including manual
>> failover.
>> We know of one cluster running this feature in production.
>>
>> There are a few outstanding issues:
>> 1. Need to provide proper documentation - a user guide for the new feature
>> 2. Need to fix automatic failover with ZKFC. Currently it does not doesn't
>> know about ObserverNodes trying to convert them to SBNs.
>> 3. Scale testing and performance fine-tuning
>> 4. As testing progresses, we continue fixing non-critical bugs like
>> HDFS-14116.
>>
>> I attached a unified patch to the umbrella jira for the review and Jenkins
>> build.
>> Please vote on this thread. The vote will run for 7 days until Wed Dec 12.
>>
>> Thanks,
>> --Konstantin
>>
>

Re: [VOTE] Merge HDFS-12943 branch to trunk - Consistent Reads from Standby

Posted by Konstantin Shvachko <sh...@gmail.com>.

Hi Yongjun,

Automatic failover certainly needs to be fixed (see HDFS-14130 and HDFS-13182),
along with all the other outstanding issues. We plan to continue this work on
trunk. The feature is usable now even with these issues open (see HDFS-14067).
We would like to get it in so that people can have early access,
and so that newly developed features are aware of this functionality.
Let us know if you have other suggestions.

Thanks,
--Konstantin

On Wed, Dec 5, 2018 at 11:24 PM Yongjun Zhang <yz...@cloudera.com> wrote:

> Great work guys.
>
> Wonder if we can elaborate what's impact of not having #2 fixed, and why
> #2 is not needed for the feature to complete?
> 2. Need to fix automatic failover with ZKFC. Currently it does not doesn't
> know about ObserverNodes trying to convert them to SBNs.
>
> Thanks.
> --Yongjun
>
>
> On Wed, Dec 5, 2018 at 5:27 PM Konstantin Shvachko <sh...@gmail.com>
> wrote:
>
>> Hi Hadoop developers,
>>
>> I would like to propose to merge to trunk the feature branch HDFS-12943 for
>> Consistent Reads from Standby Node. The feature is intended to scale read
>> RPC workloads. On large clusters reads comprise 95% of all RPCs to the
>> NameNode. We should be able to accommodate higher overall RPC workloads (up
>> to 4x by some estimates) by adding multiple ObserverNodes.
>>
>> The main functionality has been implemented see sub-tasks of HDFS-12943.
>> We followed up with the test plan. Testing was done on two independent
>> clusters (see HDFS-14058 and HDFS-14059) with security enabled.
>> We ran standard HDFS commands, MR jobs, admin commands including manual
>> failover.
>> We know of one cluster running this feature in production.
>>
>> There are a few outstanding issues:
>> 1. Need to provide proper documentation - a user guide for the new feature
>> 2. Need to fix automatic failover with ZKFC. Currently it does not doesn't
>> know about ObserverNodes trying to convert them to SBNs.
>> 3. Scale testing and performance fine-tuning
>> 4. As testing progresses, we continue fixing non-critical bugs like
>> HDFS-14116.
>>
>> I attached a unified patch to the umbrella jira for the review and Jenkins
>> build.
>> Please vote on this thread. The vote will run for 7 days until Wed Dec 12.
>>
>> Thanks,
>> --Konstantin
>>
>

Re: [VOTE] Merge HDFS-12943 branch to trunk - Consistent Reads from Standby

Posted by Yongjun Zhang <yz...@cloudera.com.INVALID>.

Great work guys.

I wonder if we can elaborate on the impact of not having #2 fixed, and why #2
is not needed for the feature to be complete?
2. Need to fix automatic failover with ZKFC. Currently it doesn't
know about ObserverNodes and tries to convert them to SBNs.

Thanks.
--Yongjun


On Wed, Dec 5, 2018 at 5:27 PM Konstantin Shvachko <sh...@gmail.com>
wrote:

> Hi Hadoop developers,
>
> I would like to propose to merge to trunk the feature branch HDFS-12943 for
> Consistent Reads from Standby Node. The feature is intended to scale read
> RPC workloads. On large clusters reads comprise 95% of all RPCs to the
> NameNode. We should be able to accommodate higher overall RPC workloads (up
> to 4x by some estimates) by adding multiple ObserverNodes.
>
> The main functionality has been implemented see sub-tasks of HDFS-12943.
> We followed up with the test plan. Testing was done on two independent
> clusters (see HDFS-14058 and HDFS-14059) with security enabled.
> We ran standard HDFS commands, MR jobs, admin commands including manual
> failover.
> We know of one cluster running this feature in production.
>
> There are a few outstanding issues:
> 1. Need to provide proper documentation - a user guide for the new feature
> 2. Need to fix automatic failover with ZKFC. Currently it does not doesn't
> know about ObserverNodes trying to convert them to SBNs.
> 3. Scale testing and performance fine-tuning
> 4. As testing progresses, we continue fixing non-critical bugs like
> HDFS-14116.
>
> I attached a unified patch to the umbrella jira for the review and Jenkins
> build.
> Please vote on this thread. The vote will run for 7 days until Wed Dec 12.
>
> Thanks,
> --Konstantin
>
