You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@thrift.apache.org by "Bryan Duxbury (Commented) (JIRA)" <ji...@apache.org> on 2011/11/07 16:50:51 UTC

[jira] [Commented] (THRIFT-1423) Binary transport break up by overlapping of TFrame headers in scribe.AsyncClient

    [ https://issues.apache.org/jira/browse/THRIFT-1423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13145570#comment-13145570 ] 

Bryan Duxbury commented on THRIFT-1423:
---------------------------------------

The AsyncClient is not designed to be used in a multithreaded manner. You need to keep a pool of AsyncClients (and more importantly, their underlying transports) if you want to make more than one call at a time.
                
> Binary transport break up by overlapping of TFrame headers in scribe.AsyncClient
> --------------------------------------------------------------------------------
>
>                 Key: THRIFT-1423
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1423
>             Project: Thrift
>          Issue Type: Bug
>          Components: Java - Library
>    Affects Versions: 0.5, 0.6, 0.6.1, 0.7
>         Environment: Oracle Java JDK 1.6.0_27-b07, Ubuntu Server, Amazon EC2 large.instance
> Scribe server on local host using libthrift 0.5, TNonBlockingServer, TFramedTransport
> Java client on localhost with libthrift 0.5, scribe.AsyncClient, 1 long-lived connection. Client is multithreaded and has persistent connection to scribe.
>            Reporter: Ivan Prisyazhniy
>              Labels: async, client, java, nio
>
> I have got strange behavior of async client in libthrift. Building up described configuration on ec2, i have got a lot of moments when thrift server says that it has broken transport. I thought it is scribe problem. But digging inside of tcpdump i have found one strange thing with thrift async client (scribe.AsyncClient).
> There is no doubt that async client support for multi-threaded access. Yep, it is implemented as single threaded fst. But it has concurrent incoming pending calls queue as you know.
> So, what i have found and that is the source of connections transports breaking off:
> 17:10:17.815425 IP (tos 0x0, ttl 64, id 38164, offset 0, flags [DF], proto TCP (6), length 56)
> localhost.39412 > localhost.1463: Flags [P.], cksum 0xfe2c (incorrect -> 0x1371), seq 1110:1114, ack 337, win 257, options [nop,nop,TS val 34123260 ecr 34122685], length 4
> 0x0000: 4500 0038 9514 4000 4006 a7a9 7f00 0001
> 0x0010: 7f00 0001 99f4 05b7 7417 1499 747b 6961
> 0x0020: 8018 0101 fe2c 0000 0101 080a 0208 adfc
> 0x0030: 0208 abbd 0000 003a
> 17:10:17.815534 IP (tos 0x0, ttl 64, id 38165, offset 0, flags [DF], proto TCP (6), length 56)
> localhost.39412 > localhost.1463: Flags [P.], cksum 0xfe2c (incorrect -> 0x134b), seq 1114:1118, ack 337, win 257, options [nop,nop,TS val 34123260 ecr 34122685], length 4
> 0x0000: 4500 0038 9515 4000 4006 a7a8 7f00 0001
> 0x0010: 7f00 0001 99f4 05b7 7417 149d 747b 6961
> 0x0020: 8018 0101 fe2c 0000 0101 080a 0208 adfc
> 0x0030: 0208 abbd 0000 005c
> 17:10:17.815648 IP (tos 0x0, ttl 64, id 60631, offset 0, flags [DF], proto TCP (6), length 52)
> localhost.1463 > localhost.39412: Flags [.], cksum 0xfe28 (incorrect -> 0x1171), seq 337, ack 1118, win 256, options [nop,nop,TS val 34123260 ecr 34123260], length 0
> 0x0000: 4500 0034 ecd7 4000 4006 4fea 7f00 0001
> 0x0010: 7f00 0001 05b7 99f4 747b 6961 7417 14a1
> 0x0020: 8010 0100 fe28 0000 0101 080a 0208 adfc
> 0x0030: 0208 adfc
> 17:10:17.815658 IP (tos 0x0, ttl 64, id 38166, offset 0, flags [DF], proto TCP (6), length 144)
> localhost.39412 > localhost.1463: Flags [P.], cksum 0xfe84 (incorrect -> 0xe7af), seq 1118:1210, ack 337, win 257, options [nop,nop,TS val 34123260 ecr 34123260], length 92
> 0x0000: 4500 0090 9516 4000 4006 a74f 7f00 0001
> 0x0010: 7f00 0001 99f4 05b7 7417 14a1 747b 6961
> 0x0020: 8018 0101 fe84 0000 0101 080a 0208 adfc
> 0x0030: 0208 adfc 8001 0001 0000 0003 4c6f 6700
> 0x0040: 0000 000f 0001 0c00 0000 010b 0001 0000
> 0x0050: 0005 6974 656d 730b 0002 0000 0030 2863
> 0x0060: 6c75 622e 7573 652c 2036 3031 322c 2030
> 0x0070: 2c20 302c 2032 3830 2c20 302c 2074 7275
> 0x0080: 652c 2031 3332 3035 3838 3631 3729 0000
> 17:10:17.815971 IP (tos 0x0, ttl 64, id 60632, offset 0, flags [DF], proto TCP (6), length 52)
> localhost.1463 > localhost.39412: Flags [R.], cksum 0xfe28 (incorrect -> 0x1111), seq 337, ack 1210, win 256, options [nop,nop,TS val 34123260 ecr 34123260], length 0
> 0x0000: 4500 0034 ecd8 4000 4006 4fe9 7f00 0001
> 0x0010: 7f00 0001 05b7 99f4 747b 6961 7417 14fd
> 0x0020: 8014 0100 fe28 0000 0101 080a 0208 adfc
> 0x0030: 0208 adfc
> There are two sequentional tframedtransport headers/packets in the dump. then ack from scribe. then psh of Log_call. it is clearly ok that scribe said it is broken transport caused by tframes are mixed. 1 packet is TFrame data size for message = 3a, 2 packet is TFrame data size = 5c. Third is ACK. Forth is Log_call length of 5c. Fifth packed is ACK with RST from scribe cause it found error in binary stream. There is no Log_call packet of length 3a between two first packets. Two sequentional tframe headers are in separate packets from it's data only because of TCP_NODELAY and separated channel.write(TFrameSizeBuffer) and channel.write(TFrameDataBuffer).
> Why there are separated states and buffers to write BODY_SIZE and BODY_DATA in the client (TAsyncMethodCall)? Why there are separated states for READS? The only thing the BODY_SIZE is for, is emulation of TFramedTransport, but why it writes it's header asynchronously and separatelly from BODY_DATA? Sync version of client writes TFrameHeader and it's data in one single TCP packet, not as async client. Any way, this follows transport to mess it's data.
> Am i right this is bug of implementation of tframed support in async client?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira