You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@nuttx.apache.org by GitBox <gi...@apache.org> on 2022/01/17 21:30:55 UTC

[GitHub] [incubator-nuttx] a-lunev opened a new pull request #5252: net/tcp/tcp_send*: reliably obtain the TCP connection pointer in TCP event handlers

a-lunev opened a new pull request #5252:
URL: https://github.com/apache/incubator-nuttx/pull/5252


   ## Summary
   
   Do not use pvconn argument to get the TCP connection pointer because pvconn is normally NULL for some events like NETDEV_DOWN. Instead, the TCP connection pointer can be reliably obtained from the corresponding TCP socket.
   
   This PR is related to PR #2257.
   
   ## Impact
   
   TCP
   
   ## Testing
   
   Build NuttX:
   ```
   $ ./tools/configure.sh -l sim:tcpblaster
   $ make menuconfig
   (enable CONFIG_DEBUG_ASSERTIONS,
   enable / disable CONFIG_NETUTILS_NETCAT_SENDFILE,
   enable / disable CONFIG_NET_TCP_WRITE_BUFFERS)
   $ make
   ```
   Enable TUN/TAP on Linux host:
   ```
   $ sudo setcap cap_net_admin+ep ./nuttx
   $ sudo ./tools/simhostroute.sh wlan0 on
   ```
   Run netcat server on Linux host:
   `$ netcat -l -p 31337`
   
   Run NuttX on Linux host:
   ```
   $ ./nuttx
   NuttShell (NSH) NuttX-10.2.0
   nsh> ifconfig eth0 10.0.1.2
   nsh> ifup eth0
   ifup eth0...OK
   nsh> dd if=/dev/zero of=/tmp/test.bin count=1000
   nsh> netcat LINUX_HOST_IP_ADDRESS 31337 /tmp/test.bin &
   nsh> ifdown eth0
   sendfile_eventhandler: WARNING: Lost connection
   ifdown eth0...OK
   nsh> sendfile error: Error 128
   sendfile error
   nsh>
   nsh> poweroff
   ```
   
   Disable TUN/TAP on Linux host:
   `$ sudo ./tools/simhostroute.sh wlan0 off`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-nuttx] yamt commented on a change in pull request #5252: net/tcp/tcp_send*: reliably obtain the TCP connection pointer in TCP event handlers

Posted by GitBox <gi...@apache.org>.
yamt commented on a change in pull request #5252:
URL: https://github.com/apache/incubator-nuttx/pull/5252#discussion_r791448848



##########
File path: net/tcp/tcp_send_buffered.c
##########
@@ -355,10 +355,39 @@ static uint16_t psock_send_eventhandler(FAR struct net_driver_s *dev,
                                         FAR void *pvconn, FAR void *pvpriv,
                                         uint16_t flags)
 {
-  FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+  /* FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+   *
+   * Do not use pvconn argument to get the TCP connection pointer (the above
+   * commented line) because pvconn is normally NULL for some events like
+   * NETDEV_DOWN. Instead, the TCP connection pointer can be reliably
+   * obtained from the corresponding TCP socket.
+   */
+
   FAR struct socket *psock = (FAR struct socket *)pvpriv;
+  FAR struct tcp_conn_s *conn;
   bool rexmit = false;
 
+  DEBUGASSERT(psock != NULL);
+
+  /* Get the TCP connection pointer reliably from
+   * the corresponding TCP socket.
+   */
+
+  conn = psock->s_conn;
+  DEBUGASSERT(conn != NULL);
+
+  /* The TCP socket is connected and, hence, should be bound to a device.
+   * Make sure that the polling device is the one that we are bound to.
+   */
+
+  DEBUGASSERT(conn->dev != NULL);

Review comment:
       this crashed in our private ci.
   ```
   Program received signal SIGSEGV, Segmentation fault.
   0x000000000053ff81 in psock_send_eventhandler (dev=0x59ebe0 <g_sim_dev>, pvconn=
   0x59ef00 <g_tcp_connections>, pvpriv=0x7f93505f0c70, flags=16) at tcp/tcp_send_b
   uffered.c:383
   rax            0x10                16
   rbx            0x0                 0
   rcx            0x10                16
   rdx            0x7f93505f0c70      140270685326448
   rsi            0x59ef00            5893888
   rdi            0x59ebe0            5893088
   rbp            0x0                 0x0
   rsp            0x7f93505c8560      0x7f93505c8560
   r8             0x10                16
   r9             0x27d729d3b3fa6400  2870809276706350080
   r10            0x8                 8
   r11            0x246               582
   r12            0x0                 0
   r13            0x0                 0
   r14            0x0                 0
   r15            0x0                 0
   rip            0x53ff81            0x53ff81 <psock_send_eventhandler+193>
   eflags         0x10202             [ IF RF ]
   cs             0x33                51
   ss             0x2b                43
   ds             0x0                 0
   es             0x0                 0
   fs             0x0                 0
   gs             0x0                 0
   k0             0x0                 0
   k1             0x0                 0
   k2             0x0                 0
   k3             0x0                 0
   k4             0x0                 0
   k5             0x0                 0
   k6             0x0                 0
   k7             0x0                 0
   #0  0x000000000053ff81 in psock_send_eventhandler (dev=0x59ebe0 <g_sim_dev>, pvconn=0x59ef00 <g_tcp_connections>, pvpriv=0x7f93505f0c70, flags=16) at tcp/tcp_send_buffered.c:383
   #1  0x000000000043294f in devif_conn_event (dev=0x59ebe0 <g_sim_dev>, pvconn=0x59ef00 <g_tcp_connections>, flags=16, list=0x59ff00 <g_cbprealloc+672>) at devif/devif_callback.c:510
   #2  0x000000000043b9f1 in tcp_callback (dev=0x59ebe0 <g_sim_dev>, conn=0x59ef00 <g_tcp_connections>, flags=16) at tcp/tcp_callback.c:169
   #3  0x0000000000438c3f in tcp_timer (dev=0x59ebe0 <g_sim_dev>, conn=0x59ef00 <g_tcp_connections>, hsec=0) at tcp/tcp_timer.c:518
   #4  0x0000000000432224 in devif_poll_tcp_timer (dev=0x59ebe0 <g_sim_dev>, callback=0x4156b0 <netdriver_txpoll>, hsec=0) at devif/devif_poll.c:652
   #5  0x0000000000432136 in devif_timer (dev=0x59ebe0 <g_sim_dev>, delay=0, callback=0x4156b0 <netdriver_txpoll>) at devif/devif_poll.c:862
   #6  0x00000000004157ae in netdriver_txavail_work (arg=0x59ebe0 <g_sim_dev>) at sim/up_netdriver.c:313
   #7  0x0000000000404344 in work_thread (argc=2, argv=0x7f93505b7820) at wqueue/kwork_thread.c:174
   #8  0x0000000000404054 in nxtask_start () at task/task_start.c:125
   #9  0xaaaaaaaaaaaaaaaa in ?? ()
   #10 0xaaaaaaaaaaaaaaaa in ?? ()
   #11 0x8001104000000140 in ?? ()
   #12 0x00007f93505dbaf0 in ?? ()
   #13 0x0000000000000000 in ?? ()
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-nuttx] yamt commented on a change in pull request #5252: net/tcp/tcp_send*: reliably obtain the TCP connection pointer in TCP event handlers

Posted by GitBox <gi...@apache.org>.
yamt commented on a change in pull request #5252:
URL: https://github.com/apache/incubator-nuttx/pull/5252#discussion_r794141039



##########
File path: net/tcp/tcp_send_buffered.c
##########
@@ -355,10 +355,39 @@ static uint16_t psock_send_eventhandler(FAR struct net_driver_s *dev,
                                         FAR void *pvconn, FAR void *pvpriv,
                                         uint16_t flags)
 {
-  FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+  /* FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+   *
+   * Do not use pvconn argument to get the TCP connection pointer (the above
+   * commented line) because pvconn is normally NULL for some events like
+   * NETDEV_DOWN. Instead, the TCP connection pointer can be reliably
+   * obtained from the corresponding TCP socket.
+   */
+
   FAR struct socket *psock = (FAR struct socket *)pvpriv;
+  FAR struct tcp_conn_s *conn;
   bool rexmit = false;
 
+  DEBUGASSERT(psock != NULL);
+
+  /* Get the TCP connection pointer reliably from
+   * the corresponding TCP socket.
+   */
+
+  conn = psock->s_conn;
+  DEBUGASSERT(conn != NULL);
+
+  /* The TCP socket is connected and, hence, should be bound to a device.
+   * Make sure that the polling device is the one that we are bound to.
+   */
+
+  DEBUGASSERT(conn->dev != NULL);

Review comment:
       i plan to enable CONFIG_DEBUG_NET_ERROR and CONFIG_DEBUG_NET_WARN.
   
   i don't think i will enable CONFIG_DEBUG_NET_INFO. (i tried and it seemed too much)
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-nuttx] xiaoxiang781216 commented on a change in pull request #5252: net/tcp/tcp_send*: reliably obtain the TCP connection pointer in TCP event handlers

Posted by GitBox <gi...@apache.org>.
xiaoxiang781216 commented on a change in pull request #5252:
URL: https://github.com/apache/incubator-nuttx/pull/5252#discussion_r786390907



##########
File path: net/tcp/tcp_send_buffered.c
##########
@@ -355,10 +355,38 @@ static uint16_t psock_send_eventhandler(FAR struct net_driver_s *dev,
                                         FAR void *pvconn, FAR void *pvpriv,
                                         uint16_t flags)
 {
-  FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+  /* FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+   *
+   * Do not use pvconn argument to get the TCP connection pointer (the above
+   * commented line) because pvconn is normally NULL for some events like
+   * NETDEV_DOWN. Instead, the TCP connection pointer can be reliably
+   * obtained from the corresponding TCP socket.
+   */
+
   FAR struct socket *psock = (FAR struct socket *)pvpriv;
+  DEBUGASSERT(psock != NULL);

Review comment:
       need move DEBUGASSERT after all local variable declaration to conform C89. Please fix other place too.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-nuttx] a-lunev commented on a change in pull request #5252: net/tcp/tcp_send*: reliably obtain the TCP connection pointer in TCP event handlers

Posted by GitBox <gi...@apache.org>.
a-lunev commented on a change in pull request #5252:
URL: https://github.com/apache/incubator-nuttx/pull/5252#discussion_r794077739



##########
File path: net/tcp/tcp_send_buffered.c
##########
@@ -355,10 +355,39 @@ static uint16_t psock_send_eventhandler(FAR struct net_driver_s *dev,
                                         FAR void *pvconn, FAR void *pvpriv,
                                         uint16_t flags)
 {
-  FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+  /* FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+   *
+   * Do not use pvconn argument to get the TCP connection pointer (the above
+   * commented line) because pvconn is normally NULL for some events like
+   * NETDEV_DOWN. Instead, the TCP connection pointer can be reliably
+   * obtained from the corresponding TCP socket.
+   */
+
   FAR struct socket *psock = (FAR struct socket *)pvpriv;
+  FAR struct tcp_conn_s *conn;
   bool rexmit = false;
 
+  DEBUGASSERT(psock != NULL);
+
+  /* Get the TCP connection pointer reliably from
+   * the corresponding TCP socket.
+   */
+
+  conn = psock->s_conn;
+  DEBUGASSERT(conn != NULL);
+
+  /* The TCP socket is connected and, hence, should be bound to a device.
+   * Make sure that the polling device is the one that we are bound to.
+   */
+
+  DEBUGASSERT(conn->dev != NULL);

Review comment:
       Yes, I would like to use ninfo(). I forgot to ask about CONFIG_DEBUG_NET_INFO, however now I see it is disabled.
   Please let me know can you enable CONFIG_DEBUG_NET_INFO in your config? Alternatively, I could use e.g. nerr() instead of ninfo(), and put nerr() inside of if () condition. Then CONFIG_DEBUG_NET_ERROR needs to be enabled.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-nuttx] yamt commented on a change in pull request #5252: net/tcp/tcp_send*: reliably obtain the TCP connection pointer in TCP event handlers

Posted by GitBox <gi...@apache.org>.
yamt commented on a change in pull request #5252:
URL: https://github.com/apache/incubator-nuttx/pull/5252#discussion_r794054289



##########
File path: net/tcp/tcp_send_buffered.c
##########
@@ -355,10 +355,39 @@ static uint16_t psock_send_eventhandler(FAR struct net_driver_s *dev,
                                         FAR void *pvconn, FAR void *pvpriv,
                                         uint16_t flags)
 {
-  FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+  /* FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+   *
+   * Do not use pvconn argument to get the TCP connection pointer (the above
+   * commented line) because pvconn is normally NULL for some events like
+   * NETDEV_DOWN. Instead, the TCP connection pointer can be reliably
+   * obtained from the corresponding TCP socket.
+   */
+
   FAR struct socket *psock = (FAR struct socket *)pvpriv;
+  FAR struct tcp_conn_s *conn;
   bool rexmit = false;
 
+  DEBUGASSERT(psock != NULL);
+
+  /* Get the TCP connection pointer reliably from
+   * the corresponding TCP socket.
+   */
+
+  conn = psock->s_conn;
+  DEBUGASSERT(conn != NULL);
+
+  /* The TCP socket is connected and, hence, should be bound to a device.
+   * Make sure that the polling device is the one that we are bound to.
+   */
+
+  DEBUGASSERT(conn->dev != NULL);

Review comment:
       > > it rarely happens in our CI. actually i have noticed it only once.
   > 
   > Do you have CONFIG_DEBUG_INFO=y in your CI config? Beside the debug assert I am going to add logging some variables that may help.
   
   the current setting is like the following.
   i can change them unless too verbose.
   i guess you want to use ninfo right?
   ```
   spacetanuki% grep CONFIG_DEBUG test-config
   CONFIG_DEBUG_ALERT=y
   CONFIG_DEBUG_FEATURES=y
   CONFIG_DEBUG_ERROR=y
   CONFIG_DEBUG_WARN=y
   CONFIG_DEBUG_INFO=y
   CONFIG_DEBUG_ASSERTIONS=y
   CONFIG_DEBUG_BINFMT=y
   CONFIG_DEBUG_BINFMT_ERROR=y
   CONFIG_DEBUG_BINFMT_WARN=y
   # CONFIG_DEBUG_BINFMT_INFO is not set
   # CONFIG_DEBUG_FS is not set
   # CONFIG_DEBUG_GRAPHICS is not set
   # CONFIG_DEBUG_LIB is not set
   # CONFIG_DEBUG_MM is not set
   # CONFIG_DEBUG_NET is not set
   # CONFIG_DEBUG_POWER is not set
   # CONFIG_DEBUG_BATTERY is not set
   # CONFIG_DEBUG_SCHED is not set
   # CONFIG_DEBUG_IRQ is not set
   # CONFIG_DEBUG_GPIO is not set
   # CONFIG_DEBUG_RTC is not set
   # CONFIG_DEBUG_TIMER is not set
   # CONFIG_DEBUG_TCBINFO is not set
   CONFIG_DEBUG_SYMBOLS=y
   CONFIG_DEBUG_NOOPT=y
   # CONFIG_DEBUG_FULLOPT is not set
   spacetanuki% 
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-nuttx] a-lunev commented on a change in pull request #5252: net/tcp/tcp_send*: reliably obtain the TCP connection pointer in TCP event handlers

Posted by GitBox <gi...@apache.org>.
a-lunev commented on a change in pull request #5252:
URL: https://github.com/apache/incubator-nuttx/pull/5252#discussion_r791664829



##########
File path: net/tcp/tcp_send_buffered.c
##########
@@ -355,10 +355,39 @@ static uint16_t psock_send_eventhandler(FAR struct net_driver_s *dev,
                                         FAR void *pvconn, FAR void *pvpriv,
                                         uint16_t flags)
 {
-  FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+  /* FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+   *
+   * Do not use pvconn argument to get the TCP connection pointer (the above
+   * commented line) because pvconn is normally NULL for some events like
+   * NETDEV_DOWN. Instead, the TCP connection pointer can be reliably
+   * obtained from the corresponding TCP socket.
+   */
+
   FAR struct socket *psock = (FAR struct socket *)pvpriv;
+  FAR struct tcp_conn_s *conn;
   bool rexmit = false;
 
+  DEBUGASSERT(psock != NULL);
+
+  /* Get the TCP connection pointer reliably from
+   * the corresponding TCP socket.
+   */
+
+  conn = psock->s_conn;
+  DEBUGASSERT(conn != NULL);
+
+  /* The TCP socket is connected and, hence, should be bound to a device.
+   * Make sure that the polling device is the one that we are bound to.
+   */
+
+  DEBUGASSERT(conn->dev != NULL);

Review comment:
       @yamt I have noticed your message. I am sorry about the issue. Please let me know if it is urgent for you or I have time until say tomorrow for the investigation.

##########
File path: net/tcp/tcp_send_buffered.c
##########
@@ -355,10 +355,39 @@ static uint16_t psock_send_eventhandler(FAR struct net_driver_s *dev,
                                         FAR void *pvconn, FAR void *pvpriv,
                                         uint16_t flags)
 {
-  FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+  /* FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+   *
+   * Do not use pvconn argument to get the TCP connection pointer (the above
+   * commented line) because pvconn is normally NULL for some events like
+   * NETDEV_DOWN. Instead, the TCP connection pointer can be reliably
+   * obtained from the corresponding TCP socket.
+   */
+
   FAR struct socket *psock = (FAR struct socket *)pvpriv;
+  FAR struct tcp_conn_s *conn;
   bool rexmit = false;
 
+  DEBUGASSERT(psock != NULL);
+
+  /* Get the TCP connection pointer reliably from
+   * the corresponding TCP socket.
+   */
+
+  conn = psock->s_conn;
+  DEBUGASSERT(conn != NULL);
+
+  /* The TCP socket is connected and, hence, should be bound to a device.
+   * Make sure that the polling device is the one that we are bound to.
+   */
+
+  DEBUGASSERT(conn->dev != NULL);

Review comment:
       @yamt preliminary I have analyzed the backtrace vs the source code. So far I am trying to understand the root cause. Could you please say what namely config file you are using? What program are you running (server / client mode)? I would like to reproduce the issue on my side.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-nuttx] a-lunev commented on a change in pull request #5252: net/tcp/tcp_send*: reliably obtain the TCP connection pointer in TCP event handlers

Posted by GitBox <gi...@apache.org>.
a-lunev commented on a change in pull request #5252:
URL: https://github.com/apache/incubator-nuttx/pull/5252#discussion_r792317926



##########
File path: net/tcp/tcp_send_buffered.c
##########
@@ -355,10 +355,39 @@ static uint16_t psock_send_eventhandler(FAR struct net_driver_s *dev,
                                         FAR void *pvconn, FAR void *pvpriv,
                                         uint16_t flags)
 {
-  FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+  /* FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+   *
+   * Do not use pvconn argument to get the TCP connection pointer (the above
+   * commented line) because pvconn is normally NULL for some events like
+   * NETDEV_DOWN. Instead, the TCP connection pointer can be reliably
+   * obtained from the corresponding TCP socket.
+   */
+
   FAR struct socket *psock = (FAR struct socket *)pvpriv;
+  FAR struct tcp_conn_s *conn;
   bool rexmit = false;
 
+  DEBUGASSERT(psock != NULL);
+
+  /* Get the TCP connection pointer reliably from
+   * the corresponding TCP socket.
+   */
+
+  conn = psock->s_conn;
+  DEBUGASSERT(conn != NULL);
+
+  /* The TCP socket is connected and, hence, should be bound to a device.
+   * Make sure that the polling device is the one that we are bound to.
+   */
+
+  DEBUGASSERT(conn->dev != NULL);

Review comment:
       @yamt thank you for the details!
   
   Yesterday I was thinking it may be because the callback is incorrectly invoked from another connection that does not match the socket, or vice versa.
   Could you also add this debug assert in your copy (before line 383) just for debug?
   `DEBUGASSERT(pvconn == conn || pvconn == NULL);`
   I wonder if it catches something in your test (before `DEBUGASSERT(conn->dev != NULL);` is caught).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-nuttx] yamt commented on a change in pull request #5252: net/tcp/tcp_send*: reliably obtain the TCP connection pointer in TCP event handlers

Posted by GitBox <gi...@apache.org>.
yamt commented on a change in pull request #5252:
URL: https://github.com/apache/incubator-nuttx/pull/5252#discussion_r793195538



##########
File path: net/tcp/tcp_send_buffered.c
##########
@@ -355,10 +355,39 @@ static uint16_t psock_send_eventhandler(FAR struct net_driver_s *dev,
                                         FAR void *pvconn, FAR void *pvpriv,
                                         uint16_t flags)
 {
-  FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+  /* FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+   *
+   * Do not use pvconn argument to get the TCP connection pointer (the above
+   * commented line) because pvconn is normally NULL for some events like
+   * NETDEV_DOWN. Instead, the TCP connection pointer can be reliably
+   * obtained from the corresponding TCP socket.
+   */
+
   FAR struct socket *psock = (FAR struct socket *)pvpriv;
+  FAR struct tcp_conn_s *conn;
   bool rexmit = false;
 
+  DEBUGASSERT(psock != NULL);
+
+  /* Get the TCP connection pointer reliably from
+   * the corresponding TCP socket.
+   */
+
+  conn = psock->s_conn;
+  DEBUGASSERT(conn != NULL);
+
+  /* The TCP socket is connected and, hence, should be bound to a device.
+   * Make sure that the polling device is the one that we are bound to.
+   */
+
+  DEBUGASSERT(conn->dev != NULL);

Review comment:
       > I think it is reasonable to add this debug assert soon to the code (possibly more complex condition with a better debug). Let's investigate the discovered issue first.
   
   it rarely happens in our CI. actually i have noticed it only once.
   i don't want to introduce a CI job dedicated to this investigation.
   if you think the assertions make sense, just add them.
   our CI will consume it sooner or later. (it's usually a few days behind nuttx master)
   
   > have you tried to revert PR #5252 locally?
   
   no. it isn't reproducible enough to do such bisecting.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-nuttx] yamt commented on a change in pull request #5252: net/tcp/tcp_send*: reliably obtain the TCP connection pointer in TCP event handlers

Posted by GitBox <gi...@apache.org>.
yamt commented on a change in pull request #5252:
URL: https://github.com/apache/incubator-nuttx/pull/5252#discussion_r792395337



##########
File path: net/tcp/tcp_send_buffered.c
##########
@@ -355,10 +355,39 @@ static uint16_t psock_send_eventhandler(FAR struct net_driver_s *dev,
                                         FAR void *pvconn, FAR void *pvpriv,
                                         uint16_t flags)
 {
-  FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+  /* FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+   *
+   * Do not use pvconn argument to get the TCP connection pointer (the above
+   * commented line) because pvconn is normally NULL for some events like
+   * NETDEV_DOWN. Instead, the TCP connection pointer can be reliably
+   * obtained from the corresponding TCP socket.
+   */
+
   FAR struct socket *psock = (FAR struct socket *)pvpriv;
+  FAR struct tcp_conn_s *conn;
   bool rexmit = false;
 
+  DEBUGASSERT(psock != NULL);
+
+  /* Get the TCP connection pointer reliably from
+   * the corresponding TCP socket.
+   */
+
+  conn = psock->s_conn;
+  DEBUGASSERT(conn != NULL);
+
+  /* The TCP socket is connected and, hence, should be bound to a device.
+   * Make sure that the polling device is the one that we are bound to.
+   */
+
+  DEBUGASSERT(conn->dev != NULL);

Review comment:
       >  Could you also add this debug assert in your copy (before line 383) just for debug?
   
   i will consider. but it isn't that reproducible anyway.
   if they are expected to be always true, why don't you just add them?
   
   btw, what's the motivation of this PR?
   it seems that it's proven the old way was actually more reliable than the new way...




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-nuttx] a-lunev commented on a change in pull request #5252: net/tcp/tcp_send*: reliably obtain the TCP connection pointer in TCP event handlers

Posted by GitBox <gi...@apache.org>.
a-lunev commented on a change in pull request #5252:
URL: https://github.com/apache/incubator-nuttx/pull/5252#discussion_r793127298



##########
File path: net/tcp/tcp_send_buffered.c
##########
@@ -355,10 +355,39 @@ static uint16_t psock_send_eventhandler(FAR struct net_driver_s *dev,
                                         FAR void *pvconn, FAR void *pvpriv,
                                         uint16_t flags)
 {
-  FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+  /* FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+   *
+   * Do not use pvconn argument to get the TCP connection pointer (the above
+   * commented line) because pvconn is normally NULL for some events like
+   * NETDEV_DOWN. Instead, the TCP connection pointer can be reliably
+   * obtained from the corresponding TCP socket.
+   */
+
   FAR struct socket *psock = (FAR struct socket *)pvpriv;
+  FAR struct tcp_conn_s *conn;
   bool rexmit = false;
 
+  DEBUGASSERT(psock != NULL);
+
+  /* Get the TCP connection pointer reliably from
+   * the corresponding TCP socket.
+   */
+
+  conn = psock->s_conn;
+  DEBUGASSERT(conn != NULL);
+
+  /* The TCP socket is connected and, hence, should be bound to a device.
+   * Make sure that the polling device is the one that we are bound to.
+   */
+
+  DEBUGASSERT(conn->dev != NULL);

Review comment:
       > if they are expected to be always true, why don't you just add them?
   
   I think it is reasonable to add this debug assert soon to the code (possibly more complex condition with a better debug). Let's investigate the discovered issue first.
   
   > btw, what's the motivation of this PR?
   > it seems that it's proven the old way was actually more reliable than the new way...
   
   The motivation of PR #5252 is that pvconn argument can be sometimes NULL. At least it is known now for NETDEV_DOWN event. As I understand, we can consider this behavior was done by design (not a bug). However, inside of the callback handler it is not good that we can not rely on pvconn argument w/o dependence on other conditions (it was even not documented). I think it gets more complex and error-prone during further code changes. Moreover, as it turned out, before PR #5252 NuttX still crashed in tcp send unbuffered mode in case of NETDEV_DOWN event (when network interface went down during TCP sending) because tcp_send_unbuffered.c checked conditions in the following order: (flags & TCP_ACKDATA) -> else if (flags & TCP_REXMIT) -> else if (flags & TCP_DISCONN_EVENTS). This order has never changed in tcp_send_unbuffered.c (and this original order looks the best concerning slightly better performance at high network traffic). The order was previously changed (to workaround pvconn=NUL
 L) only in tcp_send_buffered and tcp_sendfile callback functions.
   Thus I had idea that if we can not rely on pvconn argument, then alternatively the connection pointer can be obtained from the corresponding socket structure through pvpriv argument of the callback function (PR #5252 implements the idea). And then I have recovered the original condition check order in tcp_sendfile (#5262) towards tcp_send_unbuffered vs tcp_sendfile code unification (today these two are already unified vs each other for the most).
   
   As neither me nor you currently do not know the root cause of your discovered case, I think we should investigate the issue and understand the root cause to make right decision.
   Preliminary to me it seems that `DEBUGASSERT(conn->dev != NULL);` has helped to catch an abnormal behavior. If `DEBUGASSERT(pvconn == conn || pvconn == NULL);` also triggers, that will mean there is a serious bug (sockets are bound to wrong connections at some time, and vice versa).
   If we workaround this by masking the problem w/o understanding what is happening, I suppose it may lead to unpredictable and not desirable consequences.
   BTW based on the commit you were testing ([71d3ff1](https://github.com/apache/incubator-nuttx/commit/71d3ff104553aebf4cb938ee8abc6fd7d34fd239)) have you tried to revert PR #5252 locally? Has the floating issue fully disappeared?
   Also please consider commits [f73abc](https://github.com/apache/incubator-nuttx/commit/f73abc76d521bec76982ccc05ff01dfb2c113351) and [019fc0](https://github.com/apache/incubator-nuttx/commit/019fc0ad78a160aad7bd9182fc315e211e2850a7) that influenced sim/netdev seriously (the TCP throughput of sim/netdev was increased significantly). I suppose it may expose some hidden bugs that might exist and were silent before.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-nuttx] a-lunev commented on a change in pull request #5252: net/tcp/tcp_send*: reliably obtain the TCP connection pointer in TCP event handlers

Posted by GitBox <gi...@apache.org>.
a-lunev commented on a change in pull request #5252:
URL: https://github.com/apache/incubator-nuttx/pull/5252#discussion_r803166896



##########
File path: net/tcp/tcp_send_buffered.c
##########
@@ -355,10 +355,39 @@ static uint16_t psock_send_eventhandler(FAR struct net_driver_s *dev,
                                         FAR void *pvconn, FAR void *pvpriv,
                                         uint16_t flags)
 {
-  FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+  /* FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+   *
+   * Do not use pvconn argument to get the TCP connection pointer (the above
+   * commented line) because pvconn is normally NULL for some events like
+   * NETDEV_DOWN. Instead, the TCP connection pointer can be reliably
+   * obtained from the corresponding TCP socket.
+   */
+
   FAR struct socket *psock = (FAR struct socket *)pvpriv;
+  FAR struct tcp_conn_s *conn;
   bool rexmit = false;
 
+  DEBUGASSERT(psock != NULL);
+
+  /* Get the TCP connection pointer reliably from
+   * the corresponding TCP socket.
+   */
+
+  conn = psock->s_conn;
+  DEBUGASSERT(conn != NULL);
+
+  /* The TCP socket is connected and, hence, should be bound to a device.
+   * Make sure that the polling device is the one that we are bound to.
+   */
+
+  DEBUGASSERT(conn->dev != NULL);

Review comment:
       > my app sometimes uses task_create.
   > CONFIG_FDCLONE_DISABLE is not set.
   
   @yamt And what is CONFIG_FDCLONE_STDIO option value in your CI config?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-nuttx] a-lunev commented on a change in pull request #5252: net/tcp/tcp_send*: reliably obtain the TCP connection pointer in TCP event handlers

Posted by GitBox <gi...@apache.org>.
a-lunev commented on a change in pull request #5252:
URL: https://github.com/apache/incubator-nuttx/pull/5252#discussion_r795081288



##########
File path: net/tcp/tcp_send_buffered.c
##########
@@ -355,10 +355,39 @@ static uint16_t psock_send_eventhandler(FAR struct net_driver_s *dev,
                                         FAR void *pvconn, FAR void *pvpriv,
                                         uint16_t flags)
 {
-  FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+  /* FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+   *
+   * Do not use pvconn argument to get the TCP connection pointer (the above
+   * commented line) because pvconn is normally NULL for some events like
+   * NETDEV_DOWN. Instead, the TCP connection pointer can be reliably
+   * obtained from the corresponding TCP socket.
+   */
+
   FAR struct socket *psock = (FAR struct socket *)pvpriv;
+  FAR struct tcp_conn_s *conn;
   bool rexmit = false;
 
+  DEBUGASSERT(psock != NULL);
+
+  /* Get the TCP connection pointer reliably from
+   * the corresponding TCP socket.
+   */
+
+  conn = psock->s_conn;
+  DEBUGASSERT(conn != NULL);
+
+  /* The TCP socket is connected and, hence, should be bound to a device.
+   * Make sure that the polling device is the one that we are bound to.
+   */
+
+  DEBUGASSERT(conn->dev != NULL);

Review comment:
       @yamt I have created PR #5373. I have analyzed the source code more and added some more debug asserts in the suspected places.
   CONFIG_DEBUG_ASSERTIONS and CONFIG_DEBUG_NET_ERROR need to be enabled.
   
   BTW the issue is not new. It was mentioned initially in 2015 here: https://github.com/apache/incubator-nuttx/blob/63071a563a1f7646dab9a6bd718d8b4b550471c4/net/socket/net_monitor.c#L166




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-nuttx] yamt commented on a change in pull request #5252: net/tcp/tcp_send*: reliably obtain the TCP connection pointer in TCP event handlers

Posted by GitBox <gi...@apache.org>.
yamt commented on a change in pull request #5252:
URL: https://github.com/apache/incubator-nuttx/pull/5252#discussion_r791448848



##########
File path: net/tcp/tcp_send_buffered.c
##########
@@ -355,10 +355,39 @@ static uint16_t psock_send_eventhandler(FAR struct net_driver_s *dev,
                                         FAR void *pvconn, FAR void *pvpriv,
                                         uint16_t flags)
 {
-  FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+  /* FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+   *
+   * Do not use pvconn argument to get the TCP connection pointer (the above
+   * commented line) because pvconn is normally NULL for some events like
+   * NETDEV_DOWN. Instead, the TCP connection pointer can be reliably
+   * obtained from the corresponding TCP socket.
+   */
+
   FAR struct socket *psock = (FAR struct socket *)pvpriv;
+  FAR struct tcp_conn_s *conn;
   bool rexmit = false;
 
+  DEBUGASSERT(psock != NULL);
+
+  /* Get the TCP connection pointer reliably from
+   * the corresponding TCP socket.
+   */
+
+  conn = psock->s_conn;
+  DEBUGASSERT(conn != NULL);
+
+  /* The TCP socket is connected and, hence, should be bound to a device.
+   * Make sure that the polling device is the one that we are bound to.
+   */
+
+  DEBUGASSERT(conn->dev != NULL);

Review comment:
       this crashed in our private ci.
   ```
   Program received signal SIGSEGV, Segmentation fault.
   0x000000000053ff81 in psock_send_eventhandler (dev=0x59ebe0 <g_sim_dev>, pvconn=
   0x59ef00 <g_tcp_connections>, pvpriv=0x7f93505f0c70, flags=16) at tcp/tcp_send_b
   uffered.c:383
   rax            0x10                16
   rbx            0x0                 0
   rcx            0x10                16
   rdx            0x7f93505f0c70      140270685326448
   rsi            0x59ef00            5893888
   rdi            0x59ebe0            5893088
   rbp            0x0                 0x0
   rsp            0x7f93505c8560      0x7f93505c8560
   r8             0x10                16
   r9             0x27d729d3b3fa6400  2870809276706350080
   r10            0x8                 8
   r11            0x246               582
   r12            0x0                 0
   r13            0x0                 0
   r14            0x0                 0
   r15            0x0                 0
   rip            0x53ff81            0x53ff81 <psock_send_eventhandler+193>
   eflags         0x10202             [ IF RF ]
   cs             0x33                51
   ss             0x2b                43
   ds             0x0                 0
   es             0x0                 0
   fs             0x0                 0
   gs             0x0                 0
   k0             0x0                 0
   k1             0x0                 0
   k2             0x0                 0
   k3             0x0                 0
   k4             0x0                 0
   k5             0x0                 0
   k6             0x0                 0
   k7             0x0                 0
   #0  0x000000000053ff81 in psock_send_eventhandler (dev=0x59ebe0 <g_sim_dev>, pvconn=0x59ef00 <g_tcp_connections>, pvpriv=0x7f93505f0c70, flags=16) at tcp/tcp_send_buffered.c:383
   #1  0x000000000043294f in devif_conn_event (dev=0x59ebe0 <g_sim_dev>, pvconn=0x59ef00 <g_tcp_connections>, flags=16, list=0x59ff00 <g_cbprealloc+672>) at devif/devif_callback.c:510
   #2  0x000000000043b9f1 in tcp_callback (dev=0x59ebe0 <g_sim_dev>, conn=0x59ef00 <g_tcp_connections>, flags=16) at tcp/tcp_callback.c:169
   #3  0x0000000000438c3f in tcp_timer (dev=0x59ebe0 <g_sim_dev>, conn=0x59ef00 <g_tcp_connections>, hsec=0) at tcp/tcp_timer.c:518
   #4  0x0000000000432224 in devif_poll_tcp_timer (dev=0x59ebe0 <g_sim_dev>, callback=0x4156b0 <netdriver_txpoll>, hsec=0) at devif/devif_poll.c:652
   #5  0x0000000000432136 in devif_timer (dev=0x59ebe0 <g_sim_dev>, delay=0, callback=0x4156b0 <netdriver_txpoll>) at devif/devif_poll.c:862
   #6  0x00000000004157ae in netdriver_txavail_work (arg=0x59ebe0 <g_sim_dev>) at sim/up_netdriver.c:313
   #7  0x0000000000404344 in work_thread (argc=2, argv=0x7f93505b7820) at wqueue/kwork_thread.c:174
   #8  0x0000000000404054 in nxtask_start () at task/task_start.c:125
   #9  0xaaaaaaaaaaaaaaaa in ?? ()
   #10 0xaaaaaaaaaaaaaaaa in ?? ()
   #11 0x8001104000000140 in ?? ()
   #12 0x00007f93505dbaf0 in ?? ()
   #13 0x0000000000000000 in ?? ()
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-nuttx] anchao commented on a change in pull request #5252: net/tcp/tcp_send*: reliably obtain the TCP connection pointer in TCP event handlers

Posted by GitBox <gi...@apache.org>.
anchao commented on a change in pull request #5252:
URL: https://github.com/apache/incubator-nuttx/pull/5252#discussion_r797481639



##########
File path: net/tcp/tcp_send_buffered.c
##########
@@ -355,10 +355,39 @@ static uint16_t psock_send_eventhandler(FAR struct net_driver_s *dev,
                                         FAR void *pvconn, FAR void *pvpriv,
                                         uint16_t flags)
 {
-  FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+  /* FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+   *
+   * Do not use pvconn argument to get the TCP connection pointer (the above
+   * commented line) because pvconn is normally NULL for some events like
+   * NETDEV_DOWN. Instead, the TCP connection pointer can be reliably
+   * obtained from the corresponding TCP socket.
+   */
+
   FAR struct socket *psock = (FAR struct socket *)pvpriv;
+  FAR struct tcp_conn_s *conn;
   bool rexmit = false;
 
+  DEBUGASSERT(psock != NULL);
+
+  /* Get the TCP connection pointer reliably from
+   * the corresponding TCP socket.
+   */
+
+  conn = psock->s_conn;
+  DEBUGASSERT(conn != NULL);
+
+  /* The TCP socket is connected and, hence, should be bound to a device.
+   * Make sure that the polling device is the one that we are bound to.
+   */
+
+  DEBUGASSERT(conn->dev != NULL);

Review comment:
       I have faced a similar issue on internal code base, the socket handle becomes a wild pointer since the socket instance has been closed before devif callback, the root cause is multiple socket instances(duplicate by task_create/system/exec ...) referencing a tcp connect instance at the same time, if the socket referenced by the s_sndcb is closed, then crash happened.
   
   The solution is move all the socket members into connect instance to ensure that connect does not reference the duplicate resources.
   
   This issue has been solved last year, I will prepare the patch for upstream.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-nuttx] a-lunev commented on a change in pull request #5252: net/tcp/tcp_send*: reliably obtain the TCP connection pointer in TCP event handlers

Posted by GitBox <gi...@apache.org>.
a-lunev commented on a change in pull request #5252:
URL: https://github.com/apache/incubator-nuttx/pull/5252#discussion_r794077739



##########
File path: net/tcp/tcp_send_buffered.c
##########
@@ -355,10 +355,39 @@ static uint16_t psock_send_eventhandler(FAR struct net_driver_s *dev,
                                         FAR void *pvconn, FAR void *pvpriv,
                                         uint16_t flags)
 {
-  FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+  /* FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+   *
+   * Do not use pvconn argument to get the TCP connection pointer (the above
+   * commented line) because pvconn is normally NULL for some events like
+   * NETDEV_DOWN. Instead, the TCP connection pointer can be reliably
+   * obtained from the corresponding TCP socket.
+   */
+
   FAR struct socket *psock = (FAR struct socket *)pvpriv;
+  FAR struct tcp_conn_s *conn;
   bool rexmit = false;
 
+  DEBUGASSERT(psock != NULL);
+
+  /* Get the TCP connection pointer reliably from
+   * the corresponding TCP socket.
+   */
+
+  conn = psock->s_conn;
+  DEBUGASSERT(conn != NULL);
+
+  /* The TCP socket is connected and, hence, should be bound to a device.
+   * Make sure that the polling device is the one that we are bound to.
+   */
+
+  DEBUGASSERT(conn->dev != NULL);

Review comment:
       Yes, I would like to use ninfo(). I forgot to ask about CONFIG_DEBUG_NET_INFO, however now I see it is disabled.
   Please let me know can you enable CONFIG_DEBUG_NET_INFO in your config? Alternatively, I could use e.g. nerr() instead ninfo(), and put nerr() inside of if () condition. Then CONFIG_DEBUG_NET_ERROR needs to be enabled.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-nuttx] a-lunev commented on a change in pull request #5252: net/tcp/tcp_send*: reliably obtain the TCP connection pointer in TCP event handlers

Posted by GitBox <gi...@apache.org>.
a-lunev commented on a change in pull request #5252:
URL: https://github.com/apache/incubator-nuttx/pull/5252#discussion_r793127298



##########
File path: net/tcp/tcp_send_buffered.c
##########
@@ -355,10 +355,39 @@ static uint16_t psock_send_eventhandler(FAR struct net_driver_s *dev,
                                         FAR void *pvconn, FAR void *pvpriv,
                                         uint16_t flags)
 {
-  FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+  /* FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+   *
+   * Do not use pvconn argument to get the TCP connection pointer (the above
+   * commented line) because pvconn is normally NULL for some events like
+   * NETDEV_DOWN. Instead, the TCP connection pointer can be reliably
+   * obtained from the corresponding TCP socket.
+   */
+
   FAR struct socket *psock = (FAR struct socket *)pvpriv;
+  FAR struct tcp_conn_s *conn;
   bool rexmit = false;
 
+  DEBUGASSERT(psock != NULL);
+
+  /* Get the TCP connection pointer reliably from
+   * the corresponding TCP socket.
+   */
+
+  conn = psock->s_conn;
+  DEBUGASSERT(conn != NULL);
+
+  /* The TCP socket is connected and, hence, should be bound to a device.
+   * Make sure that the polling device is the one that we are bound to.
+   */
+
+  DEBUGASSERT(conn->dev != NULL);

Review comment:
       > if they are expected to be always true, why don't you just add them?
   
   I think it is reasonable to add this debug assert soon to the code (possibly more complex condition with a better debug). Let's investigate the discovered issue first.
   
   > btw, what's the motivation of this PR?
   > it seems that it's proven the old way was actually more reliable than the new way...
   
   The motivation of PR #5252 is that pvconn argument can be sometimes NULL. At least it is known now for NETDEV_DOWN event. As I understand, we can consider this behavior was done by design (not a bug). However, inside of the callback handler it is not good that we can not rely on pvconn argument w/o dependence on other conditions (it was even not documented). I think it gets complicated and error-prone during further code changes. Moreover, as it turned out, before PR #5252 NuttX still crashed in tcp send unbuffered mode in case of NETDEV_DOWN event (when network interface went down during TCP sending) because tcp_send_unbuffered.c checked conditions in the following order: (flags & TCP_ACKDATA) -> else if (flags & TCP_REXMIT) -> else if (flags & TCP_DISCONN_EVENTS). This order has never changed in tcp_send_unbuffered.c (and this original order looks the best concerning slightly better performance at high network traffic). The order was previously changed (to workaround pvconn=NULL
 ) only in tcp_send_buffered and tcp_sendfile callback functions.
   Thus I had idea that if we can not rely on pvconn argument, then alternatively the connection pointer can be obtained from the corresponding socket structure through pvpriv argument of the callback function (the connection pointer is expected to be not NULL) (PR #5252 implements the idea). And then I have recovered the original condition check order in tcp_sendfile (#5262) towards tcp_send_unbuffered vs tcp_sendfile code unification (today these two are already unified vs each other for the most).
   
   As neither me nor you currently know the root cause of your discovered case, I think we should investigate the issue and understand the root cause to make right decision.
   Preliminary to me it seems that `DEBUGASSERT(conn->dev != NULL);` has helped to catch an abnormal behavior. If `DEBUGASSERT(pvconn == conn || pvconn == NULL);` also triggers, that will mean there is a serious bug (sockets are bound to wrong connections at some time, and vice versa).
   If we workaround this by masking the problem w/o understanding what is happening, I suppose it may lead to unpredictable and not desirable consequences.
   BTW based on the commit you were testing ([71d3ff1](https://github.com/apache/incubator-nuttx/commit/71d3ff104553aebf4cb938ee8abc6fd7d34fd239)) have you tried to revert PR #5252 locally? Has the floating issue fully disappeared?
   Also please consider commits [f73abc](https://github.com/apache/incubator-nuttx/commit/f73abc76d521bec76982ccc05ff01dfb2c113351) and [019fc0](https://github.com/apache/incubator-nuttx/commit/019fc0ad78a160aad7bd9182fc315e211e2850a7) that have influenced sim/netdev seriously (the TCP throughput of sim/netdev was increased significantly). I suppose it may expose some hidden bugs that might exist and were silent before.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-nuttx] xiaoxiang781216 merged pull request #5252: net/tcp/tcp_send*: reliably obtain the TCP connection pointer in TCP event handlers

Posted by GitBox <gi...@apache.org>.
xiaoxiang781216 merged pull request #5252:
URL: https://github.com/apache/incubator-nuttx/pull/5252


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-nuttx] a-lunev commented on a change in pull request #5252: net/tcp/tcp_send*: reliably obtain the TCP connection pointer in TCP event handlers

Posted by GitBox <gi...@apache.org>.
a-lunev commented on a change in pull request #5252:
URL: https://github.com/apache/incubator-nuttx/pull/5252#discussion_r793561618



##########
File path: net/tcp/tcp_send_buffered.c
##########
@@ -355,10 +355,39 @@ static uint16_t psock_send_eventhandler(FAR struct net_driver_s *dev,
                                         FAR void *pvconn, FAR void *pvpriv,
                                         uint16_t flags)
 {
-  FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+  /* FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+   *
+   * Do not use pvconn argument to get the TCP connection pointer (the above
+   * commented line) because pvconn is normally NULL for some events like
+   * NETDEV_DOWN. Instead, the TCP connection pointer can be reliably
+   * obtained from the corresponding TCP socket.
+   */
+
   FAR struct socket *psock = (FAR struct socket *)pvpriv;
+  FAR struct tcp_conn_s *conn;
   bool rexmit = false;
 
+  DEBUGASSERT(psock != NULL);
+
+  /* Get the TCP connection pointer reliably from
+   * the corresponding TCP socket.
+   */
+
+  conn = psock->s_conn;
+  DEBUGASSERT(conn != NULL);
+
+  /* The TCP socket is connected and, hence, should be bound to a device.
+   * Make sure that the polling device is the one that we are bound to.
+   */
+
+  DEBUGASSERT(conn->dev != NULL);

Review comment:
       > it rarely happens in our CI. actually i have noticed it only once.
   
   Do you have CONFIG_DEBUG_INFO=y in your CI config? Beside the debug assert I am going to add logging some variables that may help.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-nuttx] xiaoxiang781216 commented on a change in pull request #5252: net/tcp/tcp_send*: reliably obtain the TCP connection pointer in TCP event handlers

Posted by GitBox <gi...@apache.org>.
xiaoxiang781216 commented on a change in pull request #5252:
URL: https://github.com/apache/incubator-nuttx/pull/5252#discussion_r802281137



##########
File path: net/tcp/tcp_send_buffered.c
##########
@@ -355,10 +355,39 @@ static uint16_t psock_send_eventhandler(FAR struct net_driver_s *dev,
                                         FAR void *pvconn, FAR void *pvpriv,
                                         uint16_t flags)
 {
-  FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+  /* FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+   *
+   * Do not use pvconn argument to get the TCP connection pointer (the above
+   * commented line) because pvconn is normally NULL for some events like
+   * NETDEV_DOWN. Instead, the TCP connection pointer can be reliably
+   * obtained from the corresponding TCP socket.
+   */
+
   FAR struct socket *psock = (FAR struct socket *)pvpriv;
+  FAR struct tcp_conn_s *conn;
   bool rexmit = false;
 
+  DEBUGASSERT(psock != NULL);
+
+  /* Get the TCP connection pointer reliably from
+   * the corresponding TCP socket.
+   */
+
+  conn = psock->s_conn;
+  DEBUGASSERT(conn != NULL);
+
+  /* The TCP socket is connected and, hence, should be bound to a device.
+   * Make sure that the polling device is the one that we are bound to.
+   */
+
+  DEBUGASSERT(conn->dev != NULL);

Review comment:
       If so, you should review #5434 which fix the potential crash in this case.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-nuttx] a-lunev commented on a change in pull request #5252: net/tcp/tcp_send*: reliably obtain the TCP connection pointer in TCP event handlers

Posted by GitBox <gi...@apache.org>.
a-lunev commented on a change in pull request #5252:
URL: https://github.com/apache/incubator-nuttx/pull/5252#discussion_r793127298



##########
File path: net/tcp/tcp_send_buffered.c
##########
@@ -355,10 +355,39 @@ static uint16_t psock_send_eventhandler(FAR struct net_driver_s *dev,
                                         FAR void *pvconn, FAR void *pvpriv,
                                         uint16_t flags)
 {
-  FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+  /* FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+   *
+   * Do not use pvconn argument to get the TCP connection pointer (the above
+   * commented line) because pvconn is normally NULL for some events like
+   * NETDEV_DOWN. Instead, the TCP connection pointer can be reliably
+   * obtained from the corresponding TCP socket.
+   */
+
   FAR struct socket *psock = (FAR struct socket *)pvpriv;
+  FAR struct tcp_conn_s *conn;
   bool rexmit = false;
 
+  DEBUGASSERT(psock != NULL);
+
+  /* Get the TCP connection pointer reliably from
+   * the corresponding TCP socket.
+   */
+
+  conn = psock->s_conn;
+  DEBUGASSERT(conn != NULL);
+
+  /* The TCP socket is connected and, hence, should be bound to a device.
+   * Make sure that the polling device is the one that we are bound to.
+   */
+
+  DEBUGASSERT(conn->dev != NULL);

Review comment:
       > if they are expected to be always true, why don't you just add them?
   
   I think it is reasonable to add this debug assert soon to the code (possibly more complex condition with a better debug). Let's investigate the discovered issue first.
   
   > btw, what's the motivation of this PR?
   > it seems that it's proven the old way was actually more reliable than the new way...
   
   The motivation of PR #5252 is that pvconn argument can be sometimes NULL. At least it is known now for NETDEV_DOWN event. As I understand, we can consider this behavior was done by design (not a bug). However, inside of the callback handler it is not good that we can not rely on pvconn argument w/o dependence on other conditions (it was even not documented). I think it gets complicated and error-prone during further code changes. Moreover, as it turned out, before PR #5252 NuttX still crashed in tcp send unbuffered mode in case of NETDEV_DOWN event (when network interface went down during TCP sending) because tcp_send_unbuffered.c checked conditions in the following order: (flags & TCP_ACKDATA) -> else if (flags & TCP_REXMIT) -> else if (flags & TCP_DISCONN_EVENTS). This order has never changed in tcp_send_unbuffered.c (and this original order looks the best concerning slightly better performance at high network traffic). The order was previously changed (to workaround pvconn=NULL
 ) only in tcp_send_buffered and tcp_sendfile callback functions.
   Thus I had idea that if we can not rely on pvconn argument, then alternatively the connection pointer can be obtained from the corresponding socket structure through pvpriv argument of the callback function (PR #5252 implements the idea). And then I have recovered the original condition check order in tcp_sendfile (#5262) towards tcp_send_unbuffered vs tcp_sendfile code unification (today these two are already unified vs each other for the most).
   
   As neither me nor you currently do not know the root cause of your discovered case, I think we should investigate the issue and understand the root cause to make right decision.
   Preliminary to me it seems that `DEBUGASSERT(conn->dev != NULL);` has helped to catch an abnormal behavior. If `DEBUGASSERT(pvconn == conn || pvconn == NULL);` also triggers, that will mean there is a serious bug (sockets are bound to wrong connections at some time, and vice versa).
   If we workaround this by masking the problem w/o understanding what is happening, I suppose it may lead to unpredictable and not desirable consequences.
   BTW based on the commit you were testing ([71d3ff1](https://github.com/apache/incubator-nuttx/commit/71d3ff104553aebf4cb938ee8abc6fd7d34fd239)) have you tried to revert PR #5252 locally? Has the floating issue fully disappeared?
   Also please consider commits [f73abc](https://github.com/apache/incubator-nuttx/commit/f73abc76d521bec76982ccc05ff01dfb2c113351) and [019fc0](https://github.com/apache/incubator-nuttx/commit/019fc0ad78a160aad7bd9182fc315e211e2850a7) that influenced sim/netdev seriously (the TCP throughput of sim/netdev was increased significantly). I suppose it may expose some hidden bugs that might exist and were silent before.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-nuttx] a-lunev commented on a change in pull request #5252: net/tcp/tcp_send*: reliably obtain the TCP connection pointer in TCP event handlers

Posted by GitBox <gi...@apache.org>.
a-lunev commented on a change in pull request #5252:
URL: https://github.com/apache/incubator-nuttx/pull/5252#discussion_r791664829



##########
File path: net/tcp/tcp_send_buffered.c
##########
@@ -355,10 +355,39 @@ static uint16_t psock_send_eventhandler(FAR struct net_driver_s *dev,
                                         FAR void *pvconn, FAR void *pvpriv,
                                         uint16_t flags)
 {
-  FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+  /* FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+   *
+   * Do not use pvconn argument to get the TCP connection pointer (the above
+   * commented line) because pvconn is normally NULL for some events like
+   * NETDEV_DOWN. Instead, the TCP connection pointer can be reliably
+   * obtained from the corresponding TCP socket.
+   */
+
   FAR struct socket *psock = (FAR struct socket *)pvpriv;
+  FAR struct tcp_conn_s *conn;
   bool rexmit = false;
 
+  DEBUGASSERT(psock != NULL);
+
+  /* Get the TCP connection pointer reliably from
+   * the corresponding TCP socket.
+   */
+
+  conn = psock->s_conn;
+  DEBUGASSERT(conn != NULL);
+
+  /* The TCP socket is connected and, hence, should be bound to a device.
+   * Make sure that the polling device is the one that we are bound to.
+   */
+
+  DEBUGASSERT(conn->dev != NULL);

Review comment:
       @yamt I have noticed your message. I am sorry about the issue. Please let me know if it is urgent for you or I have time until say tomorrow for the investigation.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-nuttx] a-lunev commented on a change in pull request #5252: net/tcp/tcp_send*: reliably obtain the TCP connection pointer in TCP event handlers

Posted by GitBox <gi...@apache.org>.
a-lunev commented on a change in pull request #5252:
URL: https://github.com/apache/incubator-nuttx/pull/5252#discussion_r792317926



##########
File path: net/tcp/tcp_send_buffered.c
##########
@@ -355,10 +355,39 @@ static uint16_t psock_send_eventhandler(FAR struct net_driver_s *dev,
                                         FAR void *pvconn, FAR void *pvpriv,
                                         uint16_t flags)
 {
-  FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+  /* FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+   *
+   * Do not use pvconn argument to get the TCP connection pointer (the above
+   * commented line) because pvconn is normally NULL for some events like
+   * NETDEV_DOWN. Instead, the TCP connection pointer can be reliably
+   * obtained from the corresponding TCP socket.
+   */
+
   FAR struct socket *psock = (FAR struct socket *)pvpriv;
+  FAR struct tcp_conn_s *conn;
   bool rexmit = false;
 
+  DEBUGASSERT(psock != NULL);
+
+  /* Get the TCP connection pointer reliably from
+   * the corresponding TCP socket.
+   */
+
+  conn = psock->s_conn;
+  DEBUGASSERT(conn != NULL);
+
+  /* The TCP socket is connected and, hence, should be bound to a device.
+   * Make sure that the polling device is the one that we are bound to.
+   */
+
+  DEBUGASSERT(conn->dev != NULL);

Review comment:
       @yamt thank you for the details!
   
   Yesterday I was thinking it may be because the callback is incorrectly invoked from another connection that does not match the socket, or vice versa.
   Could you also add this debug assert in your code (before line 383) just for debug?
   `DEBUGASSERT(pvconn == conn || pvconn == NULL);`
   I wonder if it catches something in your test (before `DEBUGASSERT(conn->dev != NULL);` is caught).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-nuttx] a-lunev commented on a change in pull request #5252: net/tcp/tcp_send*: reliably obtain the TCP connection pointer in TCP event handlers

Posted by GitBox <gi...@apache.org>.
a-lunev commented on a change in pull request #5252:
URL: https://github.com/apache/incubator-nuttx/pull/5252#discussion_r786398944



##########
File path: net/tcp/tcp_send_buffered.c
##########
@@ -355,10 +355,38 @@ static uint16_t psock_send_eventhandler(FAR struct net_driver_s *dev,
                                         FAR void *pvconn, FAR void *pvpriv,
                                         uint16_t flags)
 {
-  FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+  /* FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+   *
+   * Do not use pvconn argument to get the TCP connection pointer (the above
+   * commented line) because pvconn is normally NULL for some events like
+   * NETDEV_DOWN. Instead, the TCP connection pointer can be reliably
+   * obtained from the corresponding TCP socket.
+   */
+
   FAR struct socket *psock = (FAR struct socket *)pvpriv;
+  DEBUGASSERT(psock != NULL);

Review comment:
       I'm sorry. I forgot about C89 again. Fixed.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-nuttx] yamt commented on a change in pull request #5252: net/tcp/tcp_send*: reliably obtain the TCP connection pointer in TCP event handlers

Posted by GitBox <gi...@apache.org>.
yamt commented on a change in pull request #5252:
URL: https://github.com/apache/incubator-nuttx/pull/5252#discussion_r802260920



##########
File path: net/tcp/tcp_send_buffered.c
##########
@@ -355,10 +355,39 @@ static uint16_t psock_send_eventhandler(FAR struct net_driver_s *dev,
                                         FAR void *pvconn, FAR void *pvpriv,
                                         uint16_t flags)
 {
-  FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+  /* FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+   *
+   * Do not use pvconn argument to get the TCP connection pointer (the above
+   * commented line) because pvconn is normally NULL for some events like
+   * NETDEV_DOWN. Instead, the TCP connection pointer can be reliably
+   * obtained from the corresponding TCP socket.
+   */
+
   FAR struct socket *psock = (FAR struct socket *)pvpriv;
+  FAR struct tcp_conn_s *conn;
   bool rexmit = false;
 
+  DEBUGASSERT(psock != NULL);
+
+  /* Get the TCP connection pointer reliably from
+   * the corresponding TCP socket.
+   */
+
+  conn = psock->s_conn;
+  DEBUGASSERT(conn != NULL);
+
+  /* The TCP socket is connected and, hence, should be bound to a device.
+   * Make sure that the polling device is the one that we are bound to.
+   */
+
+  DEBUGASSERT(conn->dev != NULL);

Review comment:
       my app sometimes uses task_create.
   CONFIG_FDCLONE_DISABLE is not set.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-nuttx] a-lunev commented on a change in pull request #5252: net/tcp/tcp_send*: reliably obtain the TCP connection pointer in TCP event handlers

Posted by GitBox <gi...@apache.org>.
a-lunev commented on a change in pull request #5252:
URL: https://github.com/apache/incubator-nuttx/pull/5252#discussion_r802242652



##########
File path: net/tcp/tcp_send_buffered.c
##########
@@ -355,10 +355,39 @@ static uint16_t psock_send_eventhandler(FAR struct net_driver_s *dev,
                                         FAR void *pvconn, FAR void *pvpriv,
                                         uint16_t flags)
 {
-  FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+  /* FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+   *
+   * Do not use pvconn argument to get the TCP connection pointer (the above
+   * commented line) because pvconn is normally NULL for some events like
+   * NETDEV_DOWN. Instead, the TCP connection pointer can be reliably
+   * obtained from the corresponding TCP socket.
+   */
+
   FAR struct socket *psock = (FAR struct socket *)pvpriv;
+  FAR struct tcp_conn_s *conn;
   bool rexmit = false;
 
+  DEBUGASSERT(psock != NULL);
+
+  /* Get the TCP connection pointer reliably from
+   * the corresponding TCP socket.
+   */
+
+  conn = psock->s_conn;
+  DEBUGASSERT(conn != NULL);
+
+  /* The TCP socket is connected and, hence, should be bound to a device.
+   * Make sure that the polling device is the one that we are bound to.
+   */
+
+  DEBUGASSERT(conn->dev != NULL);

Review comment:
       @yamt Is your app using dup() or dup2() or vfork() or posix_spawn() or possibly some other call that may result in calling dup()/dup2()?
   What is CONFIG_FDCLONE_DISABLE option value in your CI config?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-nuttx] a-lunev commented on a change in pull request #5252: net/tcp/tcp_send*: reliably obtain the TCP connection pointer in TCP event handlers

Posted by GitBox <gi...@apache.org>.
a-lunev commented on a change in pull request #5252:
URL: https://github.com/apache/incubator-nuttx/pull/5252#discussion_r791787509



##########
File path: net/tcp/tcp_send_buffered.c
##########
@@ -355,10 +355,39 @@ static uint16_t psock_send_eventhandler(FAR struct net_driver_s *dev,
                                         FAR void *pvconn, FAR void *pvpriv,
                                         uint16_t flags)
 {
-  FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+  /* FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+   *
+   * Do not use pvconn argument to get the TCP connection pointer (the above
+   * commented line) because pvconn is normally NULL for some events like
+   * NETDEV_DOWN. Instead, the TCP connection pointer can be reliably
+   * obtained from the corresponding TCP socket.
+   */
+
   FAR struct socket *psock = (FAR struct socket *)pvpriv;
+  FAR struct tcp_conn_s *conn;
   bool rexmit = false;
 
+  DEBUGASSERT(psock != NULL);
+
+  /* Get the TCP connection pointer reliably from
+   * the corresponding TCP socket.
+   */
+
+  conn = psock->s_conn;
+  DEBUGASSERT(conn != NULL);
+
+  /* The TCP socket is connected and, hence, should be bound to a device.
+   * Make sure that the polling device is the one that we are bound to.
+   */
+
+  DEBUGASSERT(conn->dev != NULL);

Review comment:
       @yamt preliminary I have analyzed the backtrace vs the source code. So far I am trying to understand the root cause. Could you please say what namely config file you are using? What program are you running (server / client mode)? I would like to reproduce the issue on my side.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-nuttx] yamt commented on a change in pull request #5252: net/tcp/tcp_send*: reliably obtain the TCP connection pointer in TCP event handlers

Posted by GitBox <gi...@apache.org>.
yamt commented on a change in pull request #5252:
URL: https://github.com/apache/incubator-nuttx/pull/5252#discussion_r792211285



##########
File path: net/tcp/tcp_send_buffered.c
##########
@@ -355,10 +355,39 @@ static uint16_t psock_send_eventhandler(FAR struct net_driver_s *dev,
                                         FAR void *pvconn, FAR void *pvpriv,
                                         uint16_t flags)
 {
-  FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+  /* FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+   *
+   * Do not use pvconn argument to get the TCP connection pointer (the above
+   * commented line) because pvconn is normally NULL for some events like
+   * NETDEV_DOWN. Instead, the TCP connection pointer can be reliably
+   * obtained from the corresponding TCP socket.
+   */
+
   FAR struct socket *psock = (FAR struct socket *)pvpriv;
+  FAR struct tcp_conn_s *conn;
   bool rexmit = false;
 
+  DEBUGASSERT(psock != NULL);
+
+  /* Get the TCP connection pointer reliably from
+   * the corresponding TCP socket.
+   */
+
+  conn = psock->s_conn;
+  DEBUGASSERT(conn != NULL);
+
+  /* The TCP socket is connected and, hence, should be bound to a device.
+   * Make sure that the polling device is the one that we are bound to.
+   */
+
+  DEBUGASSERT(conn->dev != NULL);

Review comment:
       not that urgent for us.
   
   * it was sim on ubuntu.
   * it was with commit 71d3ff104553aebf4cb938ee8abc6fd7d34fd239
   * it was running an app which does tcp active open. (it's a MQTT client, if it matters.)
   * it seems that it doesn't always crash. i dunno a way to reproduce it reliably.
   * it crashed after the app decided to reconnect to the server.
   * before the reconnection, probably on the previous tcp connection, mbedtls complained something unusual. it might be another issue in the recent tcp changes. `mbedtls/library/ssl_msg.c:3566: |1| unknown record type 128`
   * i can't share you the whole config as it is. TCP related options are as the following. if you want to know other specific configuration, please ask.
   
   ```
   spacetanuki% grep TCP test-config
   # CONFIG_NET_TCPPROTO_OPTIONS is not set
   # TCP/IP Networking
   CONFIG_NET_TCP=y
   # CONFIG_NET_TCP_NO_STACK is not set
   # CONFIG_NET_TCP_DELAYED_ACK is not set
   # CONFIG_NET_TCP_KEEPALIVE is not set
   # CONFIG_NET_TCPURGDATA is not set
   CONFIG_NET_TCP_CONNS=8
   CONFIG_NET_TCP_NPOLLWAITERS=1
   CONFIG_NET_TCP_RTO=3
   CONFIG_NET_TCP_WAIT_TIMEOUT=120
   CONFIG_NET_TCP_FAST_RETRANSMIT_WATERMARK=3
   # CONFIG_NET_TCP_WINDOW_SCALE is not set
   CONFIG_NET_TCP_NOTIFIER=y
   CONFIG_NET_TCP_WRITE_BUFFERS=y
   CONFIG_NET_TCP_NWRBCHAINS=8
   # CONFIG_NET_TCP_WRBUFFER_DEBUG is not set
   # CONFIG_NET_TCPBACKLOG is not set
   # CONFIG_EXAMPLES_TCPBLASTER is not set
   # CONFIG_EXAMPLES_TCPECHO is not set
   spacetanuki% 
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-nuttx] a-lunev commented on a change in pull request #5252: net/tcp/tcp_send*: reliably obtain the TCP connection pointer in TCP event handlers

Posted by GitBox <gi...@apache.org>.
a-lunev commented on a change in pull request #5252:
URL: https://github.com/apache/incubator-nuttx/pull/5252#discussion_r793127298



##########
File path: net/tcp/tcp_send_buffered.c
##########
@@ -355,10 +355,39 @@ static uint16_t psock_send_eventhandler(FAR struct net_driver_s *dev,
                                         FAR void *pvconn, FAR void *pvpriv,
                                         uint16_t flags)
 {
-  FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+  /* FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+   *
+   * Do not use pvconn argument to get the TCP connection pointer (the above
+   * commented line) because pvconn is normally NULL for some events like
+   * NETDEV_DOWN. Instead, the TCP connection pointer can be reliably
+   * obtained from the corresponding TCP socket.
+   */
+
   FAR struct socket *psock = (FAR struct socket *)pvpriv;
+  FAR struct tcp_conn_s *conn;
   bool rexmit = false;
 
+  DEBUGASSERT(psock != NULL);
+
+  /* Get the TCP connection pointer reliably from
+   * the corresponding TCP socket.
+   */
+
+  conn = psock->s_conn;
+  DEBUGASSERT(conn != NULL);
+
+  /* The TCP socket is connected and, hence, should be bound to a device.
+   * Make sure that the polling device is the one that we are bound to.
+   */
+
+  DEBUGASSERT(conn->dev != NULL);

Review comment:
       > if they are expected to be always true, why don't you just add them?
   
   I think it is reasonable to add this debug assert soon to the code (possibly more complex condition with a better debug). Let's investigate the discovered issue first.
   
   > btw, what's the motivation of this PR?
   > it seems that it's proven the old way was actually more reliable than the new way...
   
   The motivation of PR #5252 is that pvconn argument can be sometimes NULL. At least it is known now for NETDEV_DOWN event. As I understand, we can consider this behavior was done by design (not a bug). However, inside of the callback handler it is not good that we can not rely on pvconn argument w/o dependence on other conditions (it was even not documented). I think it gets complicated and error-prone during further code changes. Moreover, as it turned out, before PR #5252 NuttX still crashed in tcp send unbuffered mode in case of NETDEV_DOWN event (when network interface went down during TCP sending) because tcp_send_unbuffered.c checked conditions in the following order: (flags & TCP_ACKDATA) -> else if (flags & TCP_REXMIT) -> else if (flags & TCP_DISCONN_EVENTS). This order has never changed in tcp_send_unbuffered.c (and this original order looks the best concerning slightly better performance at high network traffic). The order was previously changed (to workaround pvconn=NULL
 ) only in tcp_send_buffered and tcp_sendfile callback functions.
   Thus I had idea that if we can not rely on pvconn argument, then alternatively the connection pointer can be obtained from the corresponding socket structure through pvpriv argument of the callback function (PR #5252 implements the idea). And then I have recovered the original condition check order in tcp_sendfile (#5262) towards tcp_send_unbuffered vs tcp_sendfile code unification (today these two are already unified vs each other for the most).
   
   As neither me nor you currently do not know the root cause of your discovered case, I think we should investigate the issue and understand the root cause to make right decision.
   Preliminary to me it seems that `DEBUGASSERT(conn->dev != NULL);` has helped to catch an abnormal behavior. If `DEBUGASSERT(pvconn == conn || pvconn == NULL);` also triggers, that will mean there is a serious bug (sockets are bound to wrong connections at some time, and vice versa).
   If we workaround this by masking the problem w/o understanding what is happening, I suppose it may lead to unpredictable and not desirable consequences.
   BTW based on the commit you were testing ([71d3ff1](https://github.com/apache/incubator-nuttx/commit/71d3ff104553aebf4cb938ee8abc6fd7d34fd239)) have you tried to revert PR #5252 locally? Has the floating issue fully disappeared?
   Also please consider commits [f73abc](https://github.com/apache/incubator-nuttx/commit/f73abc76d521bec76982ccc05ff01dfb2c113351) and [019fc0](https://github.com/apache/incubator-nuttx/commit/019fc0ad78a160aad7bd9182fc315e211e2850a7) that have influenced sim/netdev seriously (the TCP throughput of sim/netdev was increased significantly). I suppose it may expose some hidden bugs that might exist and were silent before.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org