Posted to dev@nuttx.apache.org by Oleg <ev...@gmail.com> on 2022/08/04 08:07:45 UTC

Potential UDP packets loss, how to debug?

Hi all,

I'm working with a custom STM32F767-based board running a fork of the PX4
project on nuttx-10.0.0+, and I'm uploading data (~900 KB) to the board
using mavlink_ftp.

When MAVLink runs over a serial port everything is fine and a long transfer
proceeds without interruption, but when MAVLink runs over a UDP link the
FTP file transfer sometimes stumbles: the PC doesn't receive an
acknowledgment, waits for the timeout, retries the request, and then the
transfer continues normally for a while until the next stumble.

According to the mavlink_ftp debug output, at that moment the board doesn't
receive the last FTP MAVLink packet, so it doesn't send an ack. I've checked
with Wireshark: the PC definitely sends the UDP packet out, but the board
doesn't receive it.
I can reproduce this issue easily, but I haven't noticed any regularity in
the amount of data transferred between losses.

When I add DEBUG_FEATURES, DEBUG_ERROR, DEBUG_NET and DEBUG_NET_ERROR to the
config, I can't reproduce the issue: there is no loss in the UDP transfer.

MAVLink FTP upload is quite simple: each request is followed by an
acknowledgement and there is no burst sending, so, as I see it, the issue
should not be related to buffer overflow.
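
To look for regularity between stumbles, the one-request-one-ack exchange
can be logged on the PC side. This is a minimal sketch in Python, not the
mavlink_ftp implementation: `exchange(seq, chunk)` is a hypothetical callable
that sends one request and returns True when the ack arrives before the
timeout, and the function names are made up for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def stop_and_wait(
    chunks: Sequence[bytes],
    exchange: Callable[[int, bytes], bool],
    max_retries: int = 5,
) -> List[Tuple[int, int, int]]:
    """Send chunks one at a time and record every stumble.

    Returns a list of (sequence number, byte offset, retries needed)
    for each chunk that was not acknowledged on the first attempt.
    """
    stumbles = []
    offset = 0
    for seq, chunk in enumerate(chunks):
        retries = 0
        # Repeat the same request until it is acknowledged.
        while not exchange(seq, chunk):
            retries += 1
            if retries > max_retries:
                raise TimeoutError(f"chunk {seq} at offset {offset} never acked")
        if retries:
            stumbles.append((seq, offset, retries))
        offset += len(chunk)
    return stumbles


def gaps_between(stumbles: List[Tuple[int, int, int]]) -> List[int]:
    """Byte distance between consecutive stumbles, to spot a pattern."""
    offsets = [offset for _, offset, _ in stumbles]
    return [b - a for a, b in zip(offsets, offsets[1:])]
```

If the gaps returned by `gaps_between` cluster around a fixed value, that
would hint at a periodic cause rather than random loss.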

Any thoughts on how to debug this? Or maybe someone remembers potentially
related fixes in UDP/Ethernet in recent NuttX? Thanks in advance for your
help!

---
With best regards, Oleg.

Re: Potential UDP packets loss, how to debug?

Posted by Alan Carvalho de Assis <ac...@gmail.com>.
Hi Oleg,

I suggest you increase it (NET_ETH_PKTSIZE) to at least 1500 bytes. I don't
remember all the configs, but I think you will find some examples inside
boards/.

BR,

Alan


Re: Potential UDP packets loss, how to debug?

Posted by Oleg <ev...@gmail.com>.
Petro, many thanks for the fix
(https://github.com/apache/incubator-nuttx/pull/6831)! It helps in my case.


Re: Potential UDP packets loss, how to debug?

Posted by Petro Karashchenko <pe...@gmail.com>.
Hi,

Please try https://github.com/apache/incubator-nuttx/pull/6831
Unfortunately I'm not able to test it since I do not have a suitable board.

Best regards,
Petro


Re: Potential UDP packets loss, how to debug?

Posted by Oleg <ev...@gmail.com>.
Hi all again,

Thanks, all, for the feedback. I've recovered from COVID and can continue
debugging this issue.

Alan, could you please clarify which settings you suggested tuning? I don't
see many options, especially for net buffers.
NET_ETH_PKTSIZE is at its default for a build without IPv6: 590. According
to Wireshark, the MAVLink FTP data packets are 308 bytes long.

Petro, I have tried with and without CONFIG_NET_UDP_WRITE_BUFFERS; it
didn't help.
Also, as far as I can see, this config option relates to outgoing UDP/IP
packets, from the board to the PC, but in my case it is the UDP packets from
the PC to the board that get lost.

Gregory, yes, I fully understand that packet loss is normal UDP behavior,
and MAVLink FTP in PX4 is indeed designed to handle packet loss, but I agree
with Petro that the probability of losing a UDP packet on the wire is
pretty low, and I'm quite sure that this is not what happens in my current,
simplest environment.

I've enabled NET_STATISTICS and NETDEV_STATISTICS (thanks for the advice!)
and, though not on the first try, reproduced the issue. Packets are dropped
because of a wrong checksum. I set a breakpoint where the drop statistics
are incremented and found that part of the data in such a packet differs
from what was sent: 32 bytes are wrong, and I found those same bytes at the
same offset in the packet received ten packets earlier. This count of ten is
repeatable; only the offset of the 32 bytes differs between tests (but it is
always a multiple of 32: 0x60, 0x80, etc.).
[image: изображение.png]
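
The comparison described above can be repeated offline once the sent
payloads (for example from a Wireshark capture) and the received buffer
(dumped from the debugger) are available as bytes. This is a minimal sketch
in Python; the function names are made up for illustration, and offsets are
relative to the start of the buffers passed in, so they only reflect
cache-line alignment if the dump starts at an aligned buffer address.

```python
CACHE_LINE = 32  # bytes; matches the 32-byte granularity seen in the dumps


def corrupted_lines(sent: bytes, received: bytes, line: int = CACHE_LINE):
    """Return start offsets of line-sized blocks where the buffers differ."""
    length = min(len(sent), len(received))
    return [
        start
        for start in range(0, length, line)
        if sent[start:start + line] != received[start:start + line]
    ]


def stale_source(received: bytes, history, offset: int, line: int = CACHE_LINE):
    """Find how many packets ago the bad block was valid data.

    `history` holds earlier payloads as sent, newest first. Returns 1 for
    the previous packet, 2 for the one before, and so on, or None if the
    block does not match any earlier packet at the same offset.
    """
    block = received[offset:offset + line]
    for age, old in enumerate(history, start=1):
        if old[offset:offset + line] == block:
            return age
    return None
```

A repeatable age from `stale_source` (ten in the tests above) together with
offsets that are always multiples of 32 is the pattern that points away from
the wire and towards stale data in memory.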


So it looks like the problem is related to the dcache. This issue reminds me
very much of the random FAT file system failures I asked about some time
ago. Those were fixed in
https://github.com/apache/incubator-nuttx/pull/4872 (and probably also the
related https://github.com/apache/incubator-nuttx/pull/5062).
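
For context, the data cache of the Cortex-M7 core in the STM32F767 works in
32-byte lines, which fits the multiple-of-32 offsets above. This is a
minimal arithmetic sketch in Python of which cache lines a DMA buffer
touches and whether two buffers share a line. The base address, the
back-to-back layout and the helper names are assumptions for illustration;
the actual driver buffer layout may differ.

```python
CACHE_LINE = 32  # Cortex-M7 data cache line size in bytes


def align_down(addr: int, line: int = CACHE_LINE) -> int:
    """Round an address down to the start of its cache line."""
    return addr & ~(line - 1)


def align_up(addr: int, line: int = CACHE_LINE) -> int:
    """Round an address up to the next cache line boundary."""
    return (addr + line - 1) & ~(line - 1)


def line_span(addr: int, size: int, line: int = CACHE_LINE) -> range:
    """Start addresses of every cache line covered by [addr, addr + size)."""
    return range(align_down(addr, line), align_up(addr + size, line), line)


def share_a_line(addr_a: int, size_a: int, addr_b: int, size_b: int) -> bool:
    """True if the two buffers have at least one cache line in common."""
    return bool(set(line_span(addr_a, size_a)) & set(line_span(addr_b, size_b)))


# Hypothetical layout: two 590-byte buffers (the NET_ETH_PKTSIZE default).
BASE = 0x20020000
# Bytes left over in the last cache line of a 590-byte buffer.
print(590 % CACHE_LINE)
# Packed back to back: does the boundary line belong to both buffers?
print(share_a_line(BASE, 590, BASE + 590, 590))
# Second buffer moved to the next line boundary instead.
print(share_a_line(BASE, 590, BASE + align_up(590), 590))
```

When a line is shared, a clean or invalidate issued for one buffer also
touches the neighbour's bytes, which is one plausible way for a stale
32-byte block to show up in a received packet.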

I'm not very familiar with the dcache, but I will try to investigate
further. Suggestions on how to debug dcache issues would be much
appreciated.

---
With best regards, Oleg.



Re: Potential UDP packets loss, how to debug?

Posted by Gregory Nutt <sp...@gmail.com>.
On 8/4/2022 7:40 AM, Petro Karashchenko wrote:
> Hi,
>
> Yes, the packet lost on UDP is a normal situation, but... If the board is
> directly connected to the PC, then from my experience the probability to
> lose a UDP packet in the wire is pretty low. So only loss may be done in
> the stack and statistics should identify this easily.
>
> Best regards,
> Petro
It is good to debug in both environments.  The simplest environment is
best for initial testing, but an application must also be tested in a
fully loaded, busy network environment and must survive significant packet
loss.  Packet loss should affect only efficiency and should not bring
the application down.

Re: Potential UDP packets loss, how to debug?

Posted by Petro Karashchenko <pe...@gmail.com>.
Hi,

Yes, packet loss on UDP is a normal situation, but... If the board is
directly connected to the PC, then in my experience the probability of
losing a UDP packet on the wire is pretty low. So the loss can only be
happening in the stack, and the statistics should identify this easily.

Best regards,
Petro


Re: Potential UDP packets loss, how to debug?

Posted by Gregory Nutt <sp...@gmail.com>.
Packet loss is normal UDP behavior (unless it is excessive).  If you use
UDP, your application design must handle occasional packet loss gracefully.

Buffering issues are the most common cause of major packet loss.  That
would make sense in your case because enabling DEBUG has the effect of
slowing down packet traffic.


Re: Potential UDP packets loss, how to debug?

Posted by Gregory Nutt <sp...@gmail.com>.
Packet loss is normal UDP behavior (unless it is excessive).  If you use
UDP, your application design must handle occasional packet loss gracefully.

In order to debug this further, you will need to enable network statistics.

config NET_STATISTICS
        bool "Collect network statistics"
        default n
        ---help---
                Network layer statistics on or off

config NETDEV_STATISTICS
        bool "Network device driver statistics"
        depends on NET_STATISTICS && ARCH_HAVE_NETDEV_STATISTICS
        ---help---
                Enable to collect statistics from the network drivers (if supported
                by the network driver).

You can view the network statistics using ifconfig under NSH.  This should
pinpoint where the packet is lost in the software or driver.  It will
not detect inherent network losses such as collisions.
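
One way to use these counters is to snapshot them before and after a
transfer and look only at what changed. This is a minimal sketch in Python,
assuming the values have already been copied by hand from the ifconfig
output into dictionaries. The counter names and values below are
placeholders for illustration, not the actual NuttX field names.

```python
from typing import Dict


def counter_delta(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Return only the counters whose value changed between snapshots."""
    return {
        name: after[name] - before.get(name, 0)
        for name in after
        if after[name] != before.get(name, 0)
    }


# Placeholder names and values, for illustration only.
before = {"udp_received": 1200, "udp_dropped": 0, "udp_checksum_errors": 0}
after = {"udp_received": 4150, "udp_dropped": 3, "udp_checksum_errors": 3}

for name, delta in sorted(counter_delta(before, after).items()):
    print(f"{name}: +{delta}")
```

A drop counter that moves in step with the observed stumbles narrows the
search to the layer that owns that counter.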



Re: Potential UDP packets loss, how to debug?

Posted by Petro Karashchenko <pe...@gmail.com>.
Hello Oleg,

Can you try testing with "CONFIG_NET_UDP_WRITE_BUFFERS=y" vs
"CONFIG_NET_UDP_WRITE_BUFFERS=n"? This will definitely impact the transfer
speed, but at least we will get an idea of whether buffer allocation
affects the upload process.

Best regards,
Petro


Re: Potential UDP packets loss, how to debug?

Posted by Alan Carvalho de Assis <ac...@gmail.com>.
Hi Oleg,

Did you try tuning the configs? Try increasing the buffer sizes, etc.

BR,

Alan
