Posted to dev@thrift.apache.org by "Antoine Pitrou (Jira)" <ji...@apache.org> on 2021/09/16 15:31:00 UTC
[jira] [Updated] (THRIFT-5464) [C++] maxMessageSize possibly not correctly observed in TBufferBase
[ https://issues.apache.org/jira/browse/THRIFT-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Antoine Pitrou updated THRIFT-5464:
-----------------------------------
Description:
First: apologies if this is a false alarm, since I'm going by my reading of the C++ library source code.
To try to understand whether the new MaxMessageSize setting is important for our (Apache Parquet) use case, I tried to go through the C++ library source code to understand how it's used exactly. (see the message I posted in THRIFT-5237)
My understanding is that there are two main facilities for checking against the max message size:
* {{TTransport::countConsumedMessageBytes(numBytes)}} raises if {{numBytes}} is greater than the remaining message size, otherwise decrements the remaining message size by {{numBytes}}
* {{TTransport::checkReadBytesAvailable(numBytes)}} also raises if {{numBytes}} is greater than the remaining message size, but _doesn't_ otherwise update the remaining message size
In {{TBufferBase::read}}, the internal buffer pointer is bumped by {{len}} bytes; _however_, {{checkReadBytesAvailable}} is called and not {{countConsumedMessageBytes}}. This means that multiple calls to {{TBufferBase::read}} will iterate through buffer memory but never update the remaining message size. In the end, the max message size limit is never upheld, except if a single read is larger than that size.
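To make the difference concrete, here is a minimal stand-alone sketch. It is a simplified model written for illustration, not the Thrift source: the class {{MessageBudget}} and the helpers {{readsPassWithCheckOnly}}, {{readsPassWithCounting}} and {{demonstrate}} are hypothetical names, and only the two method names are borrowed from the description above. It assumes my reading of the two checks is correct.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical model of the two checks described above (NOT the Thrift code).
class MessageBudget {
public:
  explicit MessageBudget(std::size_t maxMessageSize) : remaining_(maxMessageSize) {}

  // Raises if numBytes exceeds the remaining size; never modifies it.
  void checkReadBytesAvailable(std::size_t numBytes) const {
    if (numBytes > remaining_) {
      throw std::runtime_error("max message size reached");
    }
  }

  // Raises if numBytes exceeds the remaining size; otherwise decrements it.
  void countConsumedMessageBytes(std::size_t numBytes) {
    if (numBytes > remaining_) {
      throw std::runtime_error("max message size reached");
    }
    remaining_ -= numBytes;
  }

  std::size_t remaining() const { return remaining_; }

private:
  std::size_t remaining_;
};

// Simulates `count` reads of `len` bytes that only call the non-consuming check.
// Returns true if every read was allowed.
bool readsPassWithCheckOnly(MessageBudget& budget, std::size_t len, int count) {
  try {
    for (int i = 0; i < count; ++i) {
      budget.checkReadBytesAvailable(len);
    }
  } catch (const std::runtime_error&) {
    return false;
  }
  return true;
}

// Simulates `count` reads of `len` bytes that account for the consumed bytes.
// Returns true if every read was allowed.
bool readsPassWithCounting(MessageBudget& budget, std::size_t len, int count) {
  try {
    for (int i = 0; i < count; ++i) {
      budget.countConsumedMessageBytes(len);
    }
  } catch (const std::runtime_error&) {
    return false;
  }
  return true;
}

// Three 60-byte reads against a 100-byte limit (180 bytes in total).
void demonstrate() {
  MessageBudget checkedOnly(100);
  assert(readsPassWithCheckOnly(checkedOnly, 60, 3));  // limit never enforced
  assert(checkedOnly.remaining() == 100);              // budget untouched

  MessageBudget counted(100);
  assert(!readsPassWithCounting(counted, 60, 3));      // second read is rejected
  assert(counted.remaining() == 40);                   // only the first read counted
}
```

In this model, the check-only path lets all 180 bytes through a 100-byte limit, which is the behaviour I suspect in {{TBufferBase::read}}; only a single read larger than 100 bytes would be rejected.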
As a side note, a quick grep through the {{lib/cpp/test}} directory seems to suggest that the max message size limits are not tested anywhere, but I may be mistaken.
was:
First: apologies if this is a false alarm, since I'm going by my reading of the C++ library source code.
To try to understand whether the new MaxMessageSize setting is important for our (Apache Parquet) use case, I tried to go through the C++ library source code to understand how it's used exactly. (see the message I posted in THRIFT-5237)
My understanding is that there are two main facilities for checking against the max message size:
* {{TTransport::countConsumedMessageBytes(numBytes)}} raises if {{numBytes}} is greater than the remaining message size, otherwise decrements the remaining message size by {{numBytes}}
* {{TTransport::checkReadBytesAvailable}} also raises if {{numBytes}} is greater than the remaining message size, but _doesn't_ otherwise update the remaining message size
> [C++] maxMessageSize possibly not correctly observed in TBufferBase
> -------------------------------------------------------------------
>
> Key: THRIFT-5464
> URL: https://issues.apache.org/jira/browse/THRIFT-5464
> Project: Thrift
> Issue Type: Bug
> Components: C++ - Library
> Affects Versions: 0.14.2
> Reporter: Antoine Pitrou
> Priority: Major
>
> First: apologies if this is a false alarm, since I'm going by my reading of the C++ library source code.
> To try to understand whether the new MaxMessageSize setting is important for our (Apache Parquet) use case, I tried to go through the C++ library source code to understand how it's used exactly. (see the message I posted in THRIFT-5237)
> My understanding is that there are two main facilities for checking against the max message size:
> * {{TTransport::countConsumedMessageBytes(numBytes)}} raises if {{numBytes}} is greater than the remaining message size, otherwise decrements the remaining message size by {{numBytes}}
> * {{TTransport::checkReadBytesAvailable(numBytes)}} also raises if {{numBytes}} is greater than the remaining message size, but _doesn't_ otherwise update the remaining message size
> In {{TBufferBase::read}}, the internal buffer pointer is bumped by {{len}} bytes; _however_, {{checkReadBytesAvailable}} is called and not {{countConsumedMessageBytes}}. This means that multiple calls to {{TBufferBase::read}} will iterate through buffer memory but never update the remaining message size. In the end, the max message size limit is never upheld, except if a single read is larger than that size.
> As a side note, a quick grep through the {{lib/cpp/test}} directory seems to suggest that the max message size limits are not tested anywhere, but I may be mistaken.