You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hadoop.apache.org by Vinayakumar B <vi...@apache.org> on 2022/03/02 20:49:27 UTC

Re: [DISCUSS] Race condition in ProtobufRpcEngine2

Call.toString() was used only while sending the response, in the response
processor thread. At this point, header would be already read and
processed. So no chance of race conditions.

This race condition was not introduced by the HADOOP-17046
<https://issues.apache.org/jira/browse/HADOOP-17046>, instead existing code
was never intended to be used for logging purposes by multiple threads.
Though the locking mechanism proposed in HADOOP-18143
<https://issues.apache.org/jira/browse/HADOOP-18143> solves the problems,
it adds overhead in both memory as well as performance.

This is a critical path for RPC processing, and adding a
locking/synchronization in this place would affect the overall performance
(may not be huge).

I would advise to remove logging and locking both altogether and leave the
code as is in Reader thread.

If logging is required, then add it in the response processor thread or the
handler threads.

-Vinay


On Mon, Feb 28, 2022 at 10:02 PM Ayush Saxena <ay...@gmail.com> wrote:

> Hey Andras,
> I had a quick look at HADOOP-18143, the methods in question in
> ProtobufRpcEngine2 are identical to the ones in ProtobufRpcEngine. So, I am
> not very sure how the  race condition doesn't happen in  ProtobufRpcEngine.
> I have to debug and spend some more time, considering that I have
> reverted HADOOP-18082 for now to unblock YARN. Though the issue would still
> be there as you said, but will give us some time to analyse.
>
> Thanks
> -Ayush
>
> On Mon, 28 Feb 2022 at 21:26, Gyori Andras <ga...@cloudera.com.invalid>
> wrote:
>
>> Hey everyone!
>>
>> We have started seeing test failures in YARN PRs for a while. We have
>> identified the problematic commit, which is HADOOP-18082
>> <https://issues.apache.org/jira/browse/HADOOP-18082>, however, this
>> change
>> just revealed the race condition lying in ProtobufRpcEngine2 introduced in
>> HADOOP-17046 <https://issues.apache.org/jira/browse/HADOOP-17046>. We
>> have
>> also fixed the underlying issue via a locking mechanism, presented in
>> HADOOP-18143 <https://issues.apache.org/jira/browse/HADOOP-18143>, but
>> since it is out of our area of expertise, we can neither verify nor
>> guarantee that it will not cause some subtle issues in the RPC system.
>> As we think it is a core part of Hadoop, we would use feedback from
>> someone
>> who is proficient in this part.
>>
>> Regards:
>> Andras
>>
>