Posted to notifications@logging.apache.org by "Ralph Goers (Jira)" <ji...@apache.org> on 2020/09/15 04:49:00 UTC

[jira] [Commented] (LOG4J2-2926) Application OUTAGE due to Unable to write to stream TCP

    [ https://issues.apache.org/jira/browse/LOG4J2-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17195872#comment-17195872 ] 

Ralph Goers commented on LOG4J2-2926:
-------------------------------------

# If you have multiple instances of your target (Logstash?) you can use a DNS entry that specifies all their IP addresses and Log4j will fail over to another one (see the configuration sketch just after this list). In a production environment you should always have more than one instance of everything for situations such as this. However, this assumes that Log4j is connecting directly to your target. I am not familiar with Cloudhub, but a normal use case with a load balancer would be that it is sending to multiple Logstash instances. The load balancer should be checking the health of each target and stop sending it traffic when it isn't responding. Of course, one would hope that Cloudhub is also not a single point of failure.
 # There is nothing you are going to be able to do to keep requests flowing to a server that isn't accepting data.
 # If the remote server stops processing requests then the OS is not going to let any more flow, because the data is not being acknowledged. After some timeout the socket will fail, as it has here.
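
To make item 1 concrete, here is a minimal sketch of a Socket appender set up for that kind of DNS-based failover. The host name is a placeholder for a DNS entry that resolves to several addresses, and the layout and timeout values are illustrative only:

{code:xml}
<!-- Sketch only: "logstash.internal.example.com" is a placeholder DNS name that
     should resolve to all of your Logstash instances. On a write failure the
     appender reconnects (after reconnectDelayMillis) and can end up on another
     of the resolved addresses. -->
<Socket name="SOCKET"
        host="logstash.internal.example.com"
        port="8500"
        protocol="TCP"
        connectTimeoutMillis="5000"
        reconnectDelayMillis="30000"
        immediateFail="false">
  <PatternLayout pattern="%d %p %c{1.} [%t] %m%n"/>
  <!-- Available in 2.8 and later: enable OS-level TCP keep-alive on the connection. -->
  <SocketOptions keepAlive="true"/>
</Socket>
{code}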

See the answer to item 1.
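
Beyond the DNS approach, if the immediate goal is simply that the application keeps running while the remote end is down, Log4j also lets you wrap the socket appender in a Failover appender so events are diverted to a local file when the socket write fails. A rough sketch only (paths and sizes are placeholders); note that the primary appender needs ignoreExceptions="false" so the Failover appender can see its errors:

{code:xml}
<Appenders>
  <!-- ignoreExceptions="false" is required so failures propagate to the Failover appender. -->
  <Socket name="SOCKET" host="${sys:tcp.host}" port="${sys:tcp.port}" protocol="TCP"
          reconnectDelayMillis="30000" immediateFail="false" ignoreExceptions="false">
    <PatternLayout pattern="%d %p %c{1.} [%t] %m%n"/>
  </Socket>

  <!-- Placeholder local fallback; events land here whenever the socket appender fails. -->
  <RollingFile name="LOCAL_FALLBACK" fileName="logs/fallback.log"
               filePattern="logs/fallback-%i.log.gz">
    <PatternLayout pattern="%d %p %c{1.} [%t] %m%n"/>
    <SizeBasedTriggeringPolicy size="50 MB"/>
  </RollingFile>

  <Failover name="FAILOVER" primary="SOCKET" retryIntervalSeconds="60">
    <Failovers>
      <AppenderRef ref="LOCAL_FALLBACK"/>
    </Failovers>
  </Failover>
</Appenders>
{code}

Loggers would then reference FAILOVER rather than SOCKET.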

> Application OUTAGE due to Unable to write to stream TCP
> -------------------------------------------------------
>
>                 Key: LOG4J2-2926
>                 URL: https://issues.apache.org/jira/browse/LOG4J2-2926
>             Project: Log4j 2
>          Issue Type: Bug
>          Components: Appenders
>    Affects Versions: 2.13.3
>         Environment: Mulesoft, Linux, ELK (hosted service on AWS)
>            Reporter: Kaushik Vankayala
>            Priority: Major
>              Labels: SocketAppender, beginner
>             Fix For: 2.13.3
>
>
> Hi Team, we have recently encountered an outage in our PRODUCTION application. We have custom logging using log4j2, and the remote server was out of storage. We suspect this is what caused the issue; the ERROR we faced is below:
>  
> 2020-08-30 22:23:04,686 Log4j2-TF-17-AsyncLoggerConfig-9 ERROR Unable to write to stream TCP:api-manager-2623b9734249246e.elb.ap-southeast-1.amazonaws.com:8500 for appender SOCKET
> org.apache.logging.log4j.core.appender.AppenderLoggingException: Error sending to TCP:api-manager-2623b9734249246e.elb.ap-southeast-1.amazonaws.com:8500 for api-manager-2623b9734249246e.elb.ap-southeast-1.amazonaws.com/52.221.23.118:8500
>     at org.apache.logging.log4j.core.net.TcpSocketManager.write(TcpSocketManager.java:231)
>     at org.apache.logging.log4j.core.appender.OutputStreamManager.write(OutputStreamManager.java:190)
>     at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.writeByteArrayToManager(AbstractOutputStreamAppender.java:206)
>     at org.apache.logging.log4j.core.appender.SocketAppender.directEncodeEvent(SocketAppender.java:459)
>     at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.tryAppend(AbstractOutputStreamAppender.java:190)
>
> "http.listener.02 SelectorRunner" #76 prio=5 os_prio=0 tid=0x00007f314c52d800 nid=0xb19 waiting for monitor entry [0x00007f314a6fc000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>     at org.apache.logging.log4j.core.async.AsyncLoggerConfigDisruptor.enqueue(AsyncLoggerConfigDisruptor.java:376)
>     - waiting to lock <0x0000000088b43a58> (a java.lang.Object)
>     at org.apache.logging.log4j.core.async.AsyncLoggerConfigDisruptor.enqueueEvent(AsyncLoggerConfigDisruptor.java:330)
>     at org.apache.logging.log4j.core.async.AsyncLoggerConfig.logInBackgroundThread(AsyncLoggerConfig.java:159)
>     at org.apache.logging.log4j.core.async.EventRoute$1.logMessage(EventRoute.java:46)
>  
> We tried to follow the link ([https://help.mulesoft.com/s/article/Mule-instance-which-implements-a-log4j2-SocketAppender-complains-with-Broken-Pipe-Error]).
>  
> Unlike Splunk, we have ELK in our architecture. Our Socket appender looks like the one below:
>  
> {{<Socket name="SOCKET" host="${sys:tcp.host}" port="${sys:tcp.port}" reconnectDelayMillis="30000" immediateFail="false" bufferedIo="true" bufferSize="204800" protocol="TCP" immediateFlush="false">}}
>  
> We have a couple of queries below; could you kindly address them?
>  # With the current Socket appender, what additional tags may be needed to stream the logs independently, irrespective of the remote destination's status?
>  # Our ELK server is a hosted service. The first point after Cloudhub is a load balancer, after which there is an EC2 server where Logstash is running. Do we need to configure any keep-alive settings at the OS level?
>  # Why should a storage issue at a remote destination cause an issue in the socket appender and eventually bring down the running application? Logging by the socket appender should ideally be an independent activity.
> Finally, we would request that you recommend a solution for the case where the remote endpoint's storage is exhausted or TCP sockets are dead, and how we can avoid an OUTAGE of the MuleSoft application due to a logging problem in Log4j2.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)