You are viewing a plain text version of this content. The canonical link for it is here.

Posted to commits@nuttx.apache.org by GitBox <gi...@apache.org> on 2021/07/10 01:23:10 UTC

[GitHub] [incubator-nuttx] Donny9 opened a new pull request #4111: syslog/intbuf: force output log when syslog interrupt buffer is full

Donny9 opened a new pull request #4111:
URL: https://github.com/apache/incubator-nuttx/pull/4111


   ## Summary
   syslog/intbuf: force output log when syslog interrupt buffer is full instead 
   of adding overrun memssage:"[truncated]\n" to the end.
   
   Change-Id: Ie27cdf0ca902d9ba203619a95162f01222e371c5
   Signed-off-by: Jiuzhu Dong <do...@xiaomi.com>
   ## Impact
   
   ## Testing
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [incubator-nuttx] fjpanag commented on pull request #4111: syslog/intbuf: force output log when syslog interrupt buffer is full

Posted by GitBox <gi...@apache.org>.

fjpanag commented on pull request #4111:
URL: https://github.com/apache/incubator-nuttx/pull/4111#issuecomment-879158019


   Why this change was made?
   
   I consider it quite bad.  
   That is because:
   
   1. The purpose of the intbuffer was to temporarily store the messages, till the system is out of an IRQ and can flush them. There was no writing in an IRQ, while now there is. This totally defeats the purpose of the intbuffer.
   
   2. This makes the system unpredictable and seriously affect the real-time behavior of the system. An IRQ may or may not need to flush the buffer. These two scenarios will have a huge difference in the timings of the IRQ. And most importantly, this is caused by code outside the scope of the IRQ itself. So, it is impossible to time the IRQ and guarantee the scheduling of the system.
   
   3. Many syslog channels do not support the force method. In this case the contents of the buffer are completely lost, without any hint of the fact. The user thinks that the IRQ never executed. At least before there was a hint that the buffer is small and that there is need to enlarge it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [incubator-nuttx] xiaoxiang781216 commented on pull request #4111: syslog/intbuf: force output log when syslog interrupt buffer is full

Posted by GitBox <gi...@apache.org>.

xiaoxiang781216 commented on pull request #4111:
URL: https://github.com/apache/incubator-nuttx/pull/4111#issuecomment-879869911


   > Also, losing data for some channels that do not support "force", is not very good.
   > 
   
   This is the fact of some log(e.g. file) implementation, but many other log driver support force.  
   
   > Not all logs are for the developer, working on a device, _on the bench_.
   > 
   > As an example. My application writes logs during its normal operation (also for non-OS purposes).
   > Logs are an important aspect of the device. Since the device is working unattended, the logs are kept in the SD card and uploaded to a server at a later time.
   > So a file channel is the only thing that I have available.
   > 
   > I am now working on a "crash reporter".
   > Whenever something bad happens, a core dump, the logs and other data are automatically uploaded to the server.
   > An "issue" is automatically opened in our GitLab server.
   > The issue is fixed by the developers and CI/CD automatically updates all devices.
   > 
   > Neglecting channels such as the file channel, will simply make us lose information.
   > More advanced development cycles will be impacted.
   > 
   > So, it's not a great assumption that development only happens on the bench with a serial logger.
   > In any way that the above are finally fixed / improved, its best to have such scenarios in mind.
   
   I suggest you don't use file channel here directly, otherwise you will lose the most important information. The reliable solution is:
   
   1. Switch to the ramlog
   2. Start log service and save/send the ramlog regularly
   3. Restore the unread ramlog in the next reboot
   
   So you never lose the log in any situation.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [incubator-nuttx] xiaoxiang781216 commented on pull request #4111: syslog/intbuf: force output log when syslog interrupt buffer is full

Posted by GitBox <gi...@apache.org>.

xiaoxiang781216 commented on pull request #4111:
URL: https://github.com/apache/incubator-nuttx/pull/4111#issuecomment-879189811


   > Why this change was made?
   > 
   
   When the panic happen, there is huge log output. The critical log will lose before this change.
   
   > I consider it quite bad.
   > That is because:
   > 
   > 1. The purpose of the intbuffer was to temporarily store the messages, till the system is out of an IRQ and can flush them. There was no writing in an IRQ, while now there is. This totally defeats the purpose of the intbuffer.
   > 2. This makes the system unpredictable and seriously affect the real-time behavior of the system. An IRQ may or may not need to flush the buffer. These two scenarios will have a huge difference in the timings of the IRQ. And most importantly, this is caused by code outside the scope of the IRQ itself. So, it is impossible to time the IRQ and guarantee the scheduling of the system.
   
   If you don't write too much log in the interrupt handler, the intbuffer still work as before(buffer and then flush out side the interrupt). The difference happen when the output in one interrupt handler is larger than the intbuffer size. Except the panic case I mentioned before, I suppose it happen most likely the developer is debugging the code and add many log and then the log content is more important than the timing.
   
   > 3. Many syslog channels do not support the force method. In this case the contents of the buffer are completely lost, without any hint of the fact. The user thinks that the IRQ never executed. At least before there was a hint that the buffer is small and that there is need to enlarge it.
   
   Good catch, the user may lose the notice. How about we bring back the notice but still output the log?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [incubator-nuttx] fjpanag commented on pull request #4111: syslog/intbuf: force output log when syslog interrupt buffer is full

Posted by GitBox <gi...@apache.org>.

fjpanag commented on pull request #4111:
URL: https://github.com/apache/incubator-nuttx/pull/4111#issuecomment-880822336


   > I suggest you don't use file channel here directly, otherwise you will lose the most important information. The reliable solution is:
   > 
   >     1. Switch to the ramlog
   > 
   >     2. Start log service and save/send the ramlog regularly
   > 
   >     3. Restore the unread ramlog in the next reboot
   > 
   > 
   > So you never lose the log in any situation.
   
   Thank you for this idea, I will take a look on this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [incubator-nuttx] xiaoxiang781216 edited a comment on pull request #4111: syslog/intbuf: force output log when syslog interrupt buffer is full

Posted by GitBox <gi...@apache.org>.

xiaoxiang781216 edited a comment on pull request #4111:
URL: https://github.com/apache/incubator-nuttx/pull/4111#issuecomment-880952711


   > IF there is a call to syslog... If not, IRQs continue to write to intbuffer, until overflow. There is no guarantee on if and when the buffer will be flushed. At least from the IRQ's perspective.
   
   Oh, I forget this case. one solution(again) is work queue if it exist.
   
   > 
   > Thank you for this idea, I will take a look on this.
   
   This patch is important to recovery the log after reboot:
   https://github.com/apache/incubator-nuttx/pull/3808
   Please remeber to put the ram buffer out side .bss section, otherwise the initializition code will zero it before we have a chance to restore the last log.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [incubator-nuttx] fjpanag commented on pull request #4111: syslog/intbuf: force output log when syslog interrupt buffer is full

Posted by GitBox <gi...@apache.org>.

fjpanag commented on pull request #4111:
URL: https://github.com/apache/incubator-nuttx/pull/4111#issuecomment-879853848

Also, losing data for some channels that do not support "force", is not very good.

Not all logs are for the developer, working on a device, *on the bench*.

As an example. My application writes logs during its normal operation (also for non-OS purposes).
Logs are an important aspect of the device. Since the device is working unattended, the logs are kept in the SD card and uploaded to a server at a later time.
So a file channel is the only thing that I have available.

I am now working on a "crash reporter".
Whenever something bad happens, a core dump, the logs and other data are automatically uploaded to the server.
An "issue" is automatically opened in our GitLab server.
The issue is fixed by the developers and CI/CD automatically updates all devices.

Neglecting channels such as the file channel, will simply make us lose information.
More advanced development cycles will be impacted.

So, it's not a great assumption that development only happens on the bench with a serial logger.
In any way that the above are finally fixed / improved, its best to have such scenarios in mind.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [incubator-nuttx] xiaoxiang781216 edited a comment on pull request #4111: syslog/intbuf: force output log when syslog interrupt buffer is full

Posted by GitBox <gi...@apache.org>.

xiaoxiang781216 edited a comment on pull request #4111:
URL: https://github.com/apache/incubator-nuttx/pull/4111#issuecomment-880952711


   > IF there is a call to syslog... If not, IRQs continue to write to intbuffer, until overflow. There is no guarantee on if and when the buffer will be flushed. At least from the IRQ's perspective.
   
   Oh, I forget this case. one solution(again) is work queue if it exist.
   
   > 
   > Thank you for this idea, I will take a look on this.
   
   This patch is important to recovery the log after reboot: https://github.com/apache/incubator-nuttx/pull/3808
   Please remeber to put the ram buffer out side .bss section, otherwise the initializition code will zero it before we have a chance to restore the last log.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [incubator-nuttx] fjpanag commented on pull request #4111: syslog/intbuf: force output log when syslog interrupt buffer is full

Posted by GitBox <gi...@apache.org>.

fjpanag commented on pull request #4111:
URL: https://github.com/apache/incubator-nuttx/pull/4111#issuecomment-879824881

*Some random thoughts...*

I think that the `[truncated]` message should be considered error handling *only*. The error is the buffer that is too small, and the handling is the truncation of the message. I think the only proper solution is to increase the buffer size. Trying to make this a "normal way of functioning" will only bring problems, I believe.

That being said, I don't think that the old implementation was that bad.

---

You are very right that a crash produces too many logs. And we cannot afford a "large enough" buffer, just in case.
However, a system crash is a special case. Maybe it also deserves special handling?

One idea would be to ditch the syslog interface, and have the system access directly the underlying channels, specifically in the case of a system crash.

Another idea is to have a special "crash handler".
During the crash, all threads may be forcefully terminated. A new special thread is created. Execution returns to this special thread (the only one existing).
There, we can do all error handling, logging, recovery actions etc, without being within a critical section or IRQ.
The intbuffer will be able to be flushed normally, all syslog channels will work correctly etc.

Please note [this issue](https://github.com/apache/incubator-nuttx/issues/3417) too.
A "crash handling thread" may also solve such issues too.

---

What I really don't like about the intbuffer system is that it waits to be flushed during the next non-IRQ call to syslog. Which may be very late (or never happen), and the buffer maybe full by then.
What if every log to intbuffer was scheduling a flush in the near future automatically? Maybe by a watchdog, or similar facility? So we do not depend on log messages in unrelated parts of the code.

This does not eliminate the issue, but greatly improves the situation. It reduces the chances of an overflow, and makes the system somewhat more predictable.
It will also make it much easier to size the intbuffer correctly to ensure that it will never overflow (and I believe the buffer would be smaller too).

Overflowing during a crash will still be a problem, though...

---

> Good catch, the user may lose the notice. How about we bring back the notice but still output the log?

I would insist that writing to the log while in IRQ is not good.
I would definitely prefer the system to lose some logs, instead of losing its guarantees...

If your concern is a system crash, and only this, then maybe flushing should be allowed only then? And normal IRQs just truncate the logs?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [incubator-nuttx] fjpanag commented on pull request #4111: syslog/intbuf: force output log when syslog interrupt buffer is full

Posted by GitBox <gi...@apache.org>.

fjpanag commented on pull request #4111:
URL: https://github.com/apache/incubator-nuttx/pull/4111#issuecomment-880820125


   > If you consider this error should be fixed by larging buffer before production, it is enough to notice the user by bringing back '[trucated]' message.
   
   Sounds good.
   
   > It isn't a safe option because the thread may fail to create in this case.
   
   > It isn't safe too, even we create a crash thread in the initial phase, because the fatal error may make the scheduler stop working.
   
   Yeah, I was expecting it that it wouldn't be so simple.
   
   Maybe there shall be a dedicated piece of code to handle a crash? So the IRQ returns to this code specifically, instead of the "main" scheduler and all normal threads.
   
   You definitely have a better understanding about those aspects of the kernel than me. Do you have any idea on how/if this is possible?
   
   > You can't use watchdog here, because it still in the interrupt context, but you can schedule a hp/lop work.
   
   I think it is best not to have syslog depend on workers. Smaller systems may not be able to afford a worker thread.
   
   > The current implementation is fast one regarding the buffer overflow. Once OS switch to the normal thread context, the interrupt buffer will flush in the first syslog call.
   
   IF there is a call to syslog... If not, IRQs continue to write to intbuffer, until overflow. There is no guarantee on if and when the buffer will be flushed. At least from the IRQ's perspective.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [incubator-nuttx] xiaoxiang781216 commented on pull request #4111: syslog/intbuf: force output log when syslog interrupt buffer is full

Posted by GitBox <gi...@apache.org>.

xiaoxiang781216 commented on pull request #4111:
URL: https://github.com/apache/incubator-nuttx/pull/4111#issuecomment-880952711


   > IF there is a call to syslog... If not, IRQs continue to write to intbuffer, until overflow. There is no guarantee on if and when the buffer will be flushed. At least from the IRQ's perspective.
   
   Oh, I forget this case. one solution(again) is work queue if it exist.
   
   For ramlog solution, this patch is important to recovery the log after reboot:
   https://github.com/apache/incubator-nuttx/pull/3808
   Please remeber to put the ram buffer out side .bss section, otherwise the initializition code will zero it before we restore the last log.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [incubator-nuttx] xiaoxiang781216 merged pull request #4111: syslog/intbuf: force output log when syslog interrupt buffer is full

Posted by GitBox <gi...@apache.org>.

xiaoxiang781216 merged pull request #4111:
URL: https://github.com/apache/incubator-nuttx/pull/4111


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [incubator-nuttx] xiaoxiang781216 commented on pull request #4111: syslog/intbuf: force output log when syslog interrupt buffer is full

Posted by GitBox <gi...@apache.org>.

xiaoxiang781216 commented on pull request #4111:
URL: https://github.com/apache/incubator-nuttx/pull/4111#issuecomment-879863483


   > _Some random thoughts..._
   > 
   > I think that the `[truncated]` message should be considered error handling _only_. The error is the buffer that is too small, and the handling is the truncation of the message. I think the only proper solution is to increase the buffer size. Trying to make this a "normal way of functioning" will only bring problems, I believe.
   
   If you consider this error should be fixed by larging buffer before production, it is enough to notice the user by bringing back '[trucated]' message.
   
   > 
   > That being said, I don't think that the old implementation was that bad.
   > 
   > You are very right that a crash produces too many logs. And we cannot afford a "large enough" buffer, just in case.
   > However, a system crash is a special case. Maybe it also deserves special handling?
   > 
   > One idea would be to ditch the syslog interface, and have the system access directly the underlying channels, specifically in the case of a system crash.
   
   Yes, it's a good idea, but we need fix the panic log lose issue before this come to true.
   
   > 
   > Another idea is to have a special "crash handler".
   > During the crash, all threads may be forcefully terminated. A new special thread is created. Execution returns to this special thread (the only one existing).
   
   It isn't a safe option because the thread may fail to create in this case.
   
   > There, we can do all error handling, logging, recovery actions etc, without being within a critical section or IRQ.
   > The intbuffer will be able to be flushed normally, all syslog channels will work correctly etc.
   > 
   > Please note [this issue](https://github.com/apache/incubator-nuttx/issues/3417) too.
   > A "crash handling thread" may also solve such issues too.
   > 
   
   It isn't safe too, even we create a crash thread in the initial phase, because the fatal error may make the scheduler stop working.
   
   > What I really don't like about the intbuffer system is that it waits to be flushed during the next non-IRQ call to syslog. Which may be very late (or never happen), and the buffer maybe full by then.
   > What if every log to intbuffer was scheduling a flush in the near future automatically? Maybe by a watchdog, or similar facility? So we do not depend on log messages in unrelated parts of the code.
   
   You can't use watchdog here, because it still in the interrupt context, but you can schedule a hp/lop work.
   
   > 
   > This does not eliminate the issue, but greatly improves the situation. It reduces the chances of an overflow, and makes the system somewhat more predictable.
   
   The current implementation is fast one regarding the buffer overflow. Once OS switch to the normal thread context, the interrupt buffer will flush in the first syslog call.
   
   > It will also make it much easier to size the intbuffer correctly to ensure that it will never overflow (and I believe the buffer would be smaller too).
   > 
   > Overflowing during a crash will still be a problem, though...
   > 
   > > Good catch, the user may lose the notice. How about we bring back the notice but still output the log?
   > 
   > I would insist that writing to the log while in IRQ is not good.
   > I would definitely prefer the system to lose some logs, instead of losing its guarantees...
   > 
   > If your concern is a system crash, and only this, then maybe flushing should be allowed only then? And normal IRQs just truncate the logs?
   
   First, the driver writer should disable all log before production if he/she really care about the real time guarantee. Second, even he/she left some crtical log in the production, the volume should be very small and never trigger the flush operation. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org