Posted to dev@nuttx.apache.org by Petro Karashchenko <pe...@gmail.com> on 2022/07/20 12:16:39 UTC

[NuttX] CONFIG_NET_TCP_WRITE_BUFFERS=y deadlock issue

Hello team,

Recently I've been using NuttX on a SAMv7 based board with Ethernet. I've
tried to use the FTP server for remote access to the SD card. During
implementation I faced https://github.com/apache/incubator-nuttx/issues/5973
and did some further analysis.

When CONFIG_NET_TCP_WRITE_BUFFERS is enabled, the TCP/IP stack allocates its
buffers from the IOB pool. A throttle mechanism is applied to the send path,
but not to the recv path, of the TCP/IP stack.

The situation is the following:
The host sends files over a TCP stream, and the SD card write speed is
slower than the TCP send rate. At some point the TCP zero window is reached
and the upload data rate is limited by the device's TCP recv -> SD card store
rate. Eventually the system runs out of IOBs (actually I have
CONFIG_IOB_THROTTLE=8, so 8 IOB buffers are still available) and the device
wants to send the FTP response message. At this point the deadlock happens.
The send can't allocate a buffer because all buffers are used by the TCP
readahead to hold incoming data, and that data is never read because the FTP
state machine is blocked on send and can't reach the recv point (a snake
eating its own tail). I was able to overcome this situation by limiting the
RX side of the socket by configuring CONFIG_NET_RECV_BUFSIZE. Also I have
"CONFIG_SYSLOG_BUFFER=y", so in some cases not only the network subsystem
hangs, but the rest of the system is affected because .
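
The circular wait described above can be sketched with a small toy model.
This is only an illustration under stated assumptions, not NuttX code: the
struct, the function names, and the pool size are hypothetical, the receive
cap is counted in buffers rather than bytes, and the receive path is made to
stop at the throttle reserve purely to reproduce the reported state in which
8 buffers remain free.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the shared pool; this is not the NuttX IOB API. */

struct model
{
  int nfree;      /* free buffers in the shared pool */
  int throttle;   /* reserve that throttled callers may not use */
  int readahead;  /* buffers holding received but unread data */
};

/* A throttled allocation succeeds only above the reserve. */

static bool send_can_alloc(const struct model *m)
{
  return m->nfree > m->throttle;
}

/* The peer keeps sending: each segment moves one buffer from the free
 * list to the readahead chain. rx_cap limits readahead (0 = no limit).
 * ASSUMPTION: RX stops at the throttle reserve, only to reproduce the
 * reported state in which 8 buffers remain free.
 */

static void rx_until_stalled(struct model *m, int rx_cap)
{
  while (m->nfree > m->throttle &&
         (rx_cap == 0 || m->readahead < rx_cap))
    {
      m->nfree--;
      m->readahead++;
    }
}

/* The application must send its response before it calls recv again.
 * Returns false when it is stuck in send and never reaches recv.
 */

static bool app_step(struct model *m)
{
  if (!send_can_alloc(m))
    {
      return false;
    }

  m->nfree--;                 /* send takes one buffer ... */
  m->nfree++;                 /* ... and releases it once ACKed */
  m->nfree += m->readahead;   /* recv drains the readahead chain */
  m->readahead = 0;
  assert(m->nfree >= 0);
  return true;
}
```

With an arbitrary pool of 40 buffers and a throttle of 8, calling
rx_until_stalled(&m, 0) leaves 8 buffers free and app_step() then returns
false, which is the deadlock. With a receive cap of 16 buffers the same
sequence leaves 24 buffers free and app_step() succeeds, which mirrors the
effect of limiting the RX side of the socket.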

But I want to highlight the essential problems of the current TCP buffered
design:
1. The buffered TCP mode uses the same pool for RX and TX packets, so
excessive IOB usage on the RX side affects TX (and the pool is also shared
with other parts of the system, so other IOB users are affected).
2. The system can't recover even if the TCP connection is closed, because
the IOB allocation wait does not know anything about the socket state and
has no reliable way to interrupt the waiter to check whether the
connection is still valid.
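
To make point 2 concrete, here is a minimal sketch of what a wait that
re-checks the connection state could look like. Everything in it is
hypothetical (the types, the function name, and the tick-based bound are
assumptions made for illustration, not an existing NuttX interface), and the
actual sleeping is left out so that the logic stays visible.

```c
#include <stdbool.h>

/* Hypothetical types for illustration; not the NuttX API. */

enum alloc_result
{
  ALLOC_OK,
  ALLOC_CONN_CLOSED,
  ALLOC_TIMEOUT
};

struct conn_state
{
  bool open;      /* false once the connection has been closed */
};

struct buf_pool
{
  int nfree;      /* free buffers in the shared pool */
  int throttle;   /* reserve that throttled callers may not use */
};

/* Try to get one buffer for a send on connection c, giving up when the
 * connection is closed or after max_ticks attempts. A real version
 * would sleep between attempts until a buffer is freed or the tick
 * expires; that part is omitted in this sketch.
 */

static enum alloc_result alloc_for_conn(struct buf_pool *p,
                                        const struct conn_state *c,
                                        int max_ticks)
{
  for (int tick = 0; tick < max_ticks; tick++)
    {
      if (!c->open)
        {
          return ALLOC_CONN_CLOSED;   /* waiter can bail out */
        }

      if (p->nfree > p->throttle)
        {
          p->nfree--;
          return ALLOC_OK;
        }
    }

  return ALLOC_TIMEOUT;
}
```

The point of the sketch is only that the waiter gets a chance to observe the
socket state on every wake-up, so a closed connection ends the wait instead
of leaving the caller blocked forever.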

I would like to get some thoughts about how to redesign the TCP buffered
mode to get more reliable operation.

Best regards,
Petro

Re: [NuttX] CONFIG_NET_TCP_WRITE_BUFFERS=y deadlock issue

Posted by Petro Karashchenko <pe...@gmail.com>.

Hello,

The FTP response actually ends up as a send call on the TCP socket, so those
are actually equivalent.

The problem is not ACK related. It is rather that almost all of the TCP data
(up to 65K in theory, but less in practice) can be buffered in the readahead
IOB chain with all ACKs in place, waiting for the application to call recv
to read that data and transfer it to a file on the SD card.
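
To get a feeling for the scale, the number of buffers such a readahead chain
can pin is a simple round-up division. The helper below is only an
illustration; the 196-byte buffer size used in the note after it is an
assumed value, not a figure from my configuration, and per-buffer overhead is
ignored, so the result is a lower bound.

```c
#include <stddef.h>

/* Buffers needed to hold `bytes` of payload, rounding up.
 * Hypothetical helper for illustration; bufsize must be non-zero.
 */

static size_t iobs_needed(size_t bytes, size_t bufsize)
{
  return (bytes + bufsize - 1) / bufsize;
}
```

For example, iobs_needed(65535, 196) gives 335, so with an assumed 196-byte
buffer a single connection could in theory hold several hundred buffers in
readahead, far more than a small pool provides.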

Best regards,
Petro

On Wed, Jul 20, 2022, 9:03 PM Fotis Panagiotopoulos <f....@gmail.com>
wrote:

> Hi Petro,
>
> > ... At
> > some point of time the system runs out of IOBs (actually I have
> > CONFIG_IOB_THROTTLE=8, so 8 IOB buffers are still available) and the
> > device wants to send the FTP response message.
>
> Do you refer to an actual FTP response here, or to a TCP level response?
> Is the problem caused by the TCP ACK's on the received packets?

Re: [NuttX] CONFIG_NET_TCP_WRITE_BUFFERS=y deadlock issue

Posted by Fotis Panagiotopoulos <f....@gmail.com>.

Hi Petro,

> ... At
> some point of time the system runs out of IOBs (actually I have
> CONFIG_IOB_THROTTLE=8, so 8 IOB buffers are still available) and the
> device wants to send the FTP response message.

Do you refer to an actual FTP response here, or to a TCP level response?
Is the problem caused by the TCP ACK's on the received packets?

