Posted to dev@thrift.apache.org by "James E. King III (JIRA)" <ji...@apache.org> on 2018/07/05 11:45:00 UTC

[jira] [Updated] (THRIFT-4591) Incompatibility using non-blocking server and frame transport on C++ side?

     [ https://issues.apache.org/jira/browse/THRIFT-4591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James E. King III updated THRIFT-4591:
--------------------------------------
    Fix Version/s:     (was: 0.12.0)

> Incompatibility using non-blocking server and frame transport on C++ side?
> --------------------------------------------------------------------------
>
>                 Key: THRIFT-4591
>                 URL: https://issues.apache.org/jira/browse/THRIFT-4591
>             Project: Thrift
>          Issue Type: Bug
>          Components: C++ - Library
>    Affects Versions: 0.11.0
>            Reporter: allen_lee
>            Assignee: James E. King III
>            Priority: Blocker
>         Attachments: 9090.pcap, 9090_1.pcap
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> (jking): C++ TFramedTransport reads the frame size and then attempts to read the message.  If it receives only part of the message, it returns the partial read, and the upper layer cannot decode the message.  When read is called again, it tries to read a frame size again, but at that point it may be in the middle of a message payload that the underlying transport had not yet received.  It's amazing to see this in code that's been around so long!
> Original Bug report:
> 1) implement a Thrift server with TNonblockingServer in C++;
> 2) implement a Thrift client with the Lua library and choose the framed transport;
> 3) calling the remote interface fails and prints "TTransportException:0: Default (unknown)", and the server shows a "TConnection::workSocket(): THRIFT_EAGAIN (unavailable resources)" error;
> 4) investigating the fault with tcpdump: attachment 9090.pcap shows that the frame message does not contain the frame size field; the correct case in attachment 9090_1.pcap shows that the frame message contains 4 bytes (00 00 00 25) before the protocol id field;
> 5) digging into the fault to find the root cause, I found a fault in the TFramedTransport:flush function in the TFramedTransport.lua file. The original implementation is:
> -----
> function TFramedTransport:flush()
>   if self.doWrite == false then
>     return self.trans:flush()
>   end
>   -- If the write fails we still want wBuf to be clear
>   local tmp = self.wBuf
>   self.wBuf = ''
>   local frame_len_buf = libluabpack.bpack("i", string.len(tmp))
>   self.trans:write(frame_len_buf)
>   self.trans:write(tmp)
>   self.trans:flush()
> end
> -----
> which sends the frame size field and the rest of the message content independently.
>  ----------------------
> (jking) Analysis of original report: the proposed fix makes the sender send once, but it shouldn't matter whether the size is sent separately from the payload.  The root cause is in the receiver, in this case the C++ library.  This issue may not be limited to the C++ implementation, so we need a test that inserts a pause between sending a frame size and sending the payload to see what happens on all the implementations.
> We're not going to merge the Lua client fix, as it doubles the memory requirements to send, despite reducing the write() count from 2 to 1.
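
The receiver-side behaviour described above can be illustrated with a minimal C++ sketch. It is not the Thrift library code and not the eventual fix: `ChunkedSource`, `readAll`, `encodeFrame` and `readFrame` are hypothetical names invented for this example. `ChunkedSource` stands in for a socket that hands back one partial chunk per read, which approximates a pause between the frame size and the payload. The sketch assumes a blocking-style loop for simplicity; a non-blocking server would more likely keep the partial header or payload and resume on the next readiness event.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a socket: every read() returns at most one
// queued chunk, so a single frame can arrive in several pieces.
class ChunkedSource {
public:
  explicit ChunkedSource(std::vector<std::string> chunks)
    : chunks_(std::move(chunks)) {}

  // Copies up to len bytes from the current chunk; returns 0 when drained.
  std::size_t read(std::uint8_t* buf, std::size_t len) {
    while (index_ < chunks_.size() && offset_ == chunks_[index_].size()) {
      ++index_;
      offset_ = 0;
    }
    if (index_ == chunks_.size()) return 0;
    const std::string& chunk = chunks_[index_];
    const std::size_t n = std::min(len, chunk.size() - offset_);
    std::copy_n(chunk.data() + offset_, n, buf);
    offset_ += n;
    return n;
  }

private:
  std::vector<std::string> chunks_;
  std::size_t index_ = 0;
  std::size_t offset_ = 0;
};

// Keeps reading until exactly len bytes have arrived, so a short read is
// never mistaken for a complete header or payload.
void readAll(ChunkedSource& src, std::uint8_t* buf, std::size_t len) {
  std::size_t have = 0;
  while (have < len) {
    const std::size_t got = src.read(buf + have, len - have);
    if (got == 0) throw std::runtime_error("stream ended inside a frame");
    have += got;
  }
  assert(have == len);
}

// Prefixes the payload with a 4-byte big-endian size, e.g. 00 00 00 25
// for a 37-byte payload.
std::string encodeFrame(const std::string& payload) {
  const std::uint32_t n = static_cast<std::uint32_t>(payload.size());
  std::string frame;
  for (int shift = 24; shift >= 0; shift -= 8) {
    frame.push_back(static_cast<char>((n >> shift) & 0xff));
  }
  frame += payload;
  return frame;
}

// Reads one whole frame: the complete size field first, then the
// complete payload, regardless of how the bytes were chunked.
std::string readFrame(ChunkedSource& src) {
  std::uint8_t hdr[4];
  readAll(src, hdr, sizeof(hdr));
  const std::uint32_t size = (static_cast<std::uint32_t>(hdr[0]) << 24) |
                             (static_cast<std::uint32_t>(hdr[1]) << 16) |
                             (static_cast<std::uint32_t>(hdr[2]) << 8) |
                             static_cast<std::uint32_t>(hdr[3]);
  std::string payload(size, '\0');
  if (size > 0) {
    readAll(src, reinterpret_cast<std::uint8_t*>(&payload[0]), size);
  }
  return payload;
}
```

Feeding `readFrame` a `ChunkedSource` whose first chunk holds only the 4 size bytes and whose later chunks hold the payload gives the same result as a source that holds the whole frame in one chunk. That is the property the proposed pause test would check on each implementation.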



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)