Posted to commits@samza.apache.org by "Rayman (Jira)" <ji...@apache.org> on 2020/08/10 23:14:00 UTC

[jira] [Commented] (SAMZA-2577) Threads appending to StreamAppender block/deadlock in high tput scenarios, leading to processing stalls

    [ https://issues.apache.org/jira/browse/SAMZA-2577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175097#comment-17175097 ] 

Rayman commented on SAMZA-2577:
-------------------------------

Log4j2 fix: [https://github.com/apache/samza/pull/1411/files]

 

> Threads appending to StreamAppender block/deadlock in high tput scenarios, leading to processing stalls
> -------------------------------------------------------------------------------------------------------
>
>                 Key: SAMZA-2577
>                 URL: https://issues.apache.org/jira/browse/SAMZA-2577
>             Project: Samza
>          Issue Type: Bug
>            Reporter: Rayman
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Problem: 
>  In the StreamAppender for both log4j1 and log4j2, a blocking queue coordinates the append()-ing threads and the single thread send()-ing to Kafka.
>  This is a bounded, blocking, lock-synchronized queue.
>  To avoid deadlock scenarios (see SAMZA-1537), the append()-ing threads have a timeout of 2 seconds, after which the log message is discarded and the queue is drained. 
>  This means that during message bursts, threads calling append() may block for up to 2 seconds and may get stuck in this pattern repeatedly, leading to processing stalls and lowered throughput. 
> *Solutions for Log4j2* 
>  Solution 1. Enable async loggers, which log4j2 provides and supports: [https://logging.apache.org/log4j/2.x/manual/async.html].
>  With this capability, the blocking queue in StreamAppender is not required because the logger itself is asynchronous, so append() threads can call systemProducer.send() directly. 
>  However, if async loggers are not used, this queue-based mechanism, which gives the append()-ing threads an "async" illusion, is still required.
> Solution 2. Continue using the blocking bounded lock-based queue, but make the queue size and timeout configurable. Users can then tune this to account for message bursts.
> Solution 3. Move to a lock-less queue: use ConcurrentLinkedQueue (unbounded), implement a bounded lock-less queue, or use an open-source implementation ([https://stackoverflow.com/questions/20890554/lock-free-circular-array]).
>  append()-ing threads will no longer need to block or time out. However, because a lock-less queue is non-blocking (it relies on CAS operations), the caller may busy-wait unless it polls at a fixed rate or sleeps for a fixed time between attempts. 
>  *For log4j2, we will adopt Solution 1.*
> *Solutions for Log4j1*
>  Solution 1. Deprecate – log4j1 is not supported. 
>  Solution 2. Similar to Solution 2 above.
>  Solution 3. Similar to Solution 3 above.
>  *For log4j1, we will adopt Solution 1 – won't fix.*
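
For reference, the append path described under Problem, with the queue size and timeout made configurable as in Solution 2, might look roughly like the sketch below. It uses only java.util.concurrent and is not the actual StreamAppender code; the class name BoundedAppendQueue and its method names are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of the bounded, blocking queue pattern.
// Not the actual StreamAppender implementation.
class BoundedAppendQueue {
  private final BlockingQueue<String> queue;
  private final long timeoutMs;

  // Both the capacity and the timeout are supplied by the caller (Solution 2).
  BoundedAppendQueue(int capacity, long timeoutMs) {
    this.queue = new ArrayBlockingQueue<>(capacity);
    this.timeoutMs = timeoutMs;
  }

  // Called by append()-ing threads. Returns true if the message was enqueued.
  // If the queue stays full for the whole timeout, the message is discarded
  // and the queue is drained so that later appends do not block as well.
  boolean append(String message) {
    try {
      if (queue.offer(message, timeoutMs, TimeUnit.MILLISECONDS)) {
        return true;
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status
      return false;
    }
    queue.clear();
    return false;
  }

  // Called by the single sender thread; returns null if nothing is queued.
  String pollForSend() {
    return queue.poll();
  }

  int size() {
    return queue.size();
  }
}
```

With a timeout of 2000 ms, every thread that hits a full queue waits the full 2 seconds before giving up, which is the stall described above. A smaller timeout or a larger capacity trades dropped messages or memory for lower append latency.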

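A bounded lock-less queue along the lines of Solution 3 could be sketched as follows. This is only one possible approach, assuming a ConcurrentLinkedQueue wrapped with an AtomicInteger counter that is updated via CAS; the names LockFreeBoundedQueue and tryAppend are hypothetical and do not come from Samza or from the linked implementation.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a bounded, non-blocking queue built on CAS operations.
class LockFreeBoundedQueue<T> {
  private final ConcurrentLinkedQueue<T> queue = new ConcurrentLinkedQueue<>();
  private final AtomicInteger size = new AtomicInteger();
  private final int capacity;

  LockFreeBoundedQueue(int capacity) {
    this.capacity = capacity;
  }

  // Never blocks: returns false immediately when the queue is full,
  // leaving it to the caller to drop the message or retry later.
  boolean tryAppend(T item) {
    while (true) {
      int current = size.get();
      if (current >= capacity) {
        return false;
      }
      // Reserve a slot first; retry if another thread changed the counter.
      if (size.compareAndSet(current, current + 1)) {
        queue.offer(item);
        return true;
      }
    }
  }

  // Never blocks: returns null when no item is currently available.
  T poll() {
    T item = queue.poll();
    if (item != null) {
      size.decrementAndGet(); // release the slot only after removal
    }
    return item;
  }
}
```

Because poll() returns null instead of blocking, a sender thread that finds the queue empty would need to back off between attempts, for example with LockSupport.parkNanos, to avoid the busy-wait mentioned in Solution 3.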


--
This message was sent by Atlassian Jira
(v8.3.4#803005)