Posted to notifications@logging.apache.org by "Ralph Goers (Jira)" <ji...@apache.org> on 2021/03/05 17:06:00 UTC

[jira] [Comment Edited] (LOG4J2-2926) Application OUTAGE due to Unable to write to stream TCP

    [ https://issues.apache.org/jira/browse/LOG4J2-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17296183#comment-17296183 ] 

Ralph Goers edited comment on LOG4J2-2926 at 3/5/21, 5:05 PM:
--------------------------------------------------------------

Actually, the BerkeleyDB change is directly related to this problem. I had a similar problem the other day when network issues prevented new connections to Logstash in our dev environment. It turned out the existing connections the app had to Logstash didn't fail, but if they had we would have hit this exact problem: any logs generated during the network outage would have been lost. Yes, you can use a Failover appender, but when things recover you still have to somehow get those logs into the ELK stack, and it makes more sense to handle that automatically. You are correct, though, that perhaps it could be done more generically, as a wrapper around any appender, so that logs are written to disk when a failure occurs and the accumulated logs are retrieved and re-sent when things recover.
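
For reference, a minimal Failover setup along those lines might look something like the sketch below (appender names, layout, and file paths are illustrative, not taken from the reporter's configuration; note that the primary appender needs ignoreExceptions="false" so that a write failure propagates to the Failover appender):

<Appenders>
  <Socket name="SOCKET" host="${sys:tcp.host}" port="${sys:tcp.port}"
          reconnectDelayMillis="30000" ignoreExceptions="false">
    <JsonLayout compact="true" eventEol="true"/>
  </Socket>
  <!-- Local fallback that accumulates events while the remote end is unreachable -->
  <RollingFile name="FALLBACK" fileName="logs/buffered-events.log"
               filePattern="logs/buffered-events-%i.log.gz">
    <JsonLayout compact="true" eventEol="true"/>
    <SizeBasedTriggeringPolicy size="100 MB"/>
  </RollingFile>
  <!-- Routes to SOCKET first and falls back to FALLBACK when SOCKET throws -->
  <Failover name="FAILOVER" primary="SOCKET" retryIntervalSeconds="60">
    <Failovers>
      <AppenderRef ref="FALLBACK"/>
    </Failovers>
  </Failover>
</Appenders>
<Loggers>
  <Root level="info">
    <AppenderRef ref="FAILOVER"/>
  </Root>
</Loggers>

That only covers the "write to disk on failure" half; retrieving the accumulated events and shipping them to the ELK stack once the connection recovers is the part that still has to be handled separately, which is what the change discussed above is about.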

I should point out that using ActiveMQ doesn't really solve the problem, as ActiveMQ doesn't buffer messages in the client; they will fail to send just as the SocketAppender did. The messages have to be buffered in the application itself to prevent loss of data.

The point is this feature is exactly what the reporter is asking for.


was (Author: ralph.goers@dslextreme.com):
Actually, the BerkeleyDB change is directly related to this problem. I had a similar problem the other day when network issues prevented new connections to Logstash in our dev environment. It turned out the existing connections the app had to Logstash didn't fail, but if they had we would have hit this exact problem: any logs generated during the network outage would have been lost. Yes, you can use a Failover appender, but when things recover you still have to somehow get those logs into the ELK stack, and it makes more sense to handle that automatically. You are correct, though, that perhaps it could be done more generically, as a wrapper around any appender, so that logs are written to disk when a failure occurs and the accumulated logs are retrieved and re-sent when things recover.

The point is this feature is exactly what the reporter is asking for.

> Application OUTAGE due to Unable to write to stream TCP
> -------------------------------------------------------
>
>                 Key: LOG4J2-2926
>                 URL: https://issues.apache.org/jira/browse/LOG4J2-2926
>             Project: Log4j 2
>          Issue Type: Bug
>          Components: Appenders
>    Affects Versions: 2.13.3
>         Environment: Mulesoft, Linux, ELK (hosted service on AWS)
>            Reporter: Kaushik Vankayala
>            Assignee: Ralph Goers
>            Priority: Major
>              Labels: SocketAppender, beginner
>             Fix For: 2.13.3
>
>
> Hi Team, we recently encountered an outage in our PRODUCTION application. We have custom logging using log4j2, and the remote logging server had run out of storage. We suspect that is what triggered the issue; the ERROR we faced is below:
>  
> 2020-08-30 22:23:04,686 Log4j2-TF-17-AsyncLoggerConfig-9 ERROR Unable to write to stream TCP:api-manager-2623b9734249246e.elb.ap-southeast-1.amazonaws.com:8500 for appender SOCKET
> org.apache.logging.log4j.core.appender.AppenderLoggingException: Error sending to TCP:api-manager-2623b9734249246e.elb.ap-southeast-1.amazonaws.com:8500 for api-manager-2623b9734249246e.elb.ap-southeast-1.amazonaws.com/52.221.23.118:8500
>     at org.apache.logging.log4j.core.net.TcpSocketManager.write(TcpSocketManager.java:231)
>     at org.apache.logging.log4j.core.appender.OutputStreamManager.write(OutputStreamManager.java:190)
>     at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.writeByteArrayToManager(AbstractOutputStreamAppender.java:206)
>     at org.apache.logging.log4j.core.appender.SocketAppender.directEncodeEvent(SocketAppender.java:459)
>     at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.tryAppend(AbstractOutputStreamAppender.java:190)
>
> "http.listener.02 SelectorRunner" #76 prio=5 os_prio=0 tid=0x00007f314c52d800 nid=0xb19 waiting for monitor entry [0x00007f314a6fc000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>     at org.apache.logging.log4j.core.async.AsyncLoggerConfigDisruptor.enqueue(AsyncLoggerConfigDisruptor.java:376)
>     - waiting to lock <0x0000000088b43a58> (a java.lang.Object)
>     at org.apache.logging.log4j.core.async.AsyncLoggerConfigDisruptor.enqueueEvent(AsyncLoggerConfigDisruptor.java:330)
>     at org.apache.logging.log4j.core.async.AsyncLoggerConfig.logInBackgroundThread(AsyncLoggerConfig.java:159)
>     at org.apache.logging.log4j.core.async.EventRoute$1.logMessage(EventRoute.java:46)
>  
>  We tried to follow this link: https://help.mulesoft.com/s/article/Mule-instance-which-implements-a-log4j2-SocketAppender-complains-with-Broken-Pipe-Error
>  
> Unlike Splunk, we have ELK in our architecture. Our Socket appender configuration looks like this:
>  
> {{<Socket name="SOCKET" host="${sys:tcp.host}" port="${sys:tcp.port}" reconnectDelayMillis="30000" immediateFail="false" bufferedIo="true" bufferSize="204800" protocol="TCP" immediateFlush="false">}}
>  
>  We have a couple of questions; could you kindly address them?
>  # With the current Socket appender, what additional tags may be needed to stream the logs independently, irrespective of the remote destination's status?
>  # Our ELK server is a hosted service. The first hop after CloudHub is a load balancer, behind which there is an EC2 server running Logstash. Do we need any keep-alive configuration at the OS level?
>  # Why should a storage issue at the remote destination cause a problem in the Socket appender and eventually bring down the running application? Logging via the Socket appender should ideally be an independent activity.
> Finally, we would ask you to recommend a solution for the case where the remote endpoint's storage is exhausted or TCP sockets are dead, and to advise how we can avoid an OUTAGE of the MuleSoft application due to a Log4j2 logging problem.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)