Posted to dev@logging.apache.org by Volkan Yazıcı <vo...@gmail.com> on 2020/09/24 15:00:31 UTC

Batching in appenders

If one looks closely at existing appenders, almost every one of them
implements a custom batching solution. To the best of my knowledge, this is
mainly due to the constraint in the Appender interface:

    void append(LogEvent event);

If this would have been replaced (enhanced?) with

    void append(List<LogEvent> event);

appenders wouldn't need to implement any batching at all. Further, Log4j
could also have provided a batching filter, replacing all the custom
batching code internal to the appenders.
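
To make the idea concrete, here is a rough sketch of the kind of reusable
batching wrapper such a signature would enable (the names below, e.g.
BatchAppender and BatchForwardingAppender, are invented for illustration
and are not existing Log4j API):

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.logging.log4j.core.LogEvent;

    // Hypothetical interface exposing the proposed list-based method.
    interface BatchAppender {
        void append(List<LogEvent> events);
    }

    // Hypothetical wrapper: collects events and forwards them as a single
    // batch once a size threshold is reached, so the wrapped appender never
    // needs batching logic of its own.
    final class BatchForwardingAppender {

        private final BatchAppender delegate;
        private final int maxBatchSize;
        private final List<LogEvent> buffer = new ArrayList<>();

        BatchForwardingAppender(BatchAppender delegate, int maxBatchSize) {
            this.delegate = delegate;
            this.maxBatchSize = maxBatchSize;
        }

        synchronized void append(LogEvent event) {
            buffer.add(event);
            if (buffer.size() >= maxBatchSize) {
                flush();
            }
        }

        synchronized void flush() {
            if (!buffer.isEmpty()) {
                delegate.append(new ArrayList<>(buffer));
                buffer.clear();
            }
        }
    }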

What do you think? Am I missing something obvious here?

Re: Batching in appenders

Posted by Ralph Goers <ra...@dslextreme.com>.
Yeah, I forgot about the async logger ring buffer.

Ralph

> On Sep 24, 2020, at 8:56 AM, Carter Kozak <ck...@ckozak.net> wrote:
> 
> The AsyncAppender more or less implements what you've described, except it forwards each event in the batch individually to a delegate appender and sets the end-of-batch marker on the last one so the API isn't quite as pretty. We could implement something similar that groups events up to some interval or maximum capacity before forwarding them, but we'd have to be careful to avoid creating garbage collection problems. Potentially holding otherwise short lived parameters in the background can have unintended impact on garbage collector performance :-)
> 
> The LogEvent.isEndOfBatch method is designed to be sympathetic to the ring buffer design, where we never create an intermediate collection, but instead signal when an event is the last in a group that we're aware of.
> 
> Batching comes up fairly frequently for network and database appenders, I think we've duplicated that code a few times. I'd be supportive of a canonical implementation (or reusable utility functionality) if we can extract it out.
> 
> -ck
> 
> On Thu, Sep 24, 2020, at 11:46, Ralph Goers wrote:
>> Well, Log4j only feeds events to Appenders one at a time. It has no place to aggregate them. However, one Appender could wrap others to create a batch at which point this would become useful.  I suppose some other component could be invented to create a batch but I am not sure where or how that would fit into the process -   does the batch apply only to a single appender or to multiple appenders?  What filtering takes place before it is added to the batch and what, if any, happens after?
>> 
>> Ralph
>> 
>>> On Sep 24, 2020, at 8:00 AM, Volkan Yazıcı <vo...@gmail.com> wrote:
>>> 
>>> If one would look close to existing appenders, almost everyone of them
>>> implements a custom batching solution. To the best of my knowledge, this is
>>> mainly due to the constraint in the Appender interface:
>>> 
>>>   void append(LogEvent event);
>>> 
>>> If this would have been replaced (enhanced?) with
>>> 
>>>   void append(List<LogEvent> event);
>>> 
>>> appenders wouldn't need to implement any batching at all. Further, Log4j
>>> could have also provided a batching filter replacing all the custom
>>> batching code internal to the appenders.
>>> 
>>> What do you think? Am I missing something obvious here?
>> 
>> 
>> 



Re: Batching in appenders

Posted by Carter Kozak <ck...@ckozak.net>.
We don't provide a great reusable abstraction to batch log events and hand them off to a sink (appender). If we're writing to disk (or similar) this isn't a problem because we buffer at an entirely different layer -- using a RandomAccessFile appender, after passing through the layout, my log events are batched in an output buffer until either the buffer is filled or an event with endOfBatch=true is received (note: this requires immediateFlush=false). When an appender is capable of pushing data to a more efficient format, it's overall less work and less risk for the system to take advantage of that.
For example, when I log an event, the retained heap for that LogEvent is relatively large and includes the message string as well as all parameter references, which may not be held elsewhere. By passing the LogEvent through a layout we are left with fewer total bytes to track: all parameter references are gone, only the formatted values remain (if requested), and contextData and contextStack may not have been used at all.
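
To picture that write path, a simplified sketch of the buffering described above might look like the following (illustration only, not the actual RandomAccessFileAppender code; BufferedWriteSketch is a made-up name). The key point is that the layout turns the event into bytes immediately, so only the output buffer is retained afterwards:

    import java.io.IOException;
    import java.io.OutputStream;
    import java.nio.ByteBuffer;
    import org.apache.logging.log4j.core.Layout;
    import org.apache.logging.log4j.core.LogEvent;

    // Simplified illustration of the buffered write path described above;
    // not the actual RandomAccessFileAppender implementation.
    final class BufferedWriteSketch {

        private final Layout<?> layout;
        private final OutputStream out;
        private final ByteBuffer buffer = ByteBuffer.allocate(256 * 1024);
        private final boolean immediateFlush;

        BufferedWriteSketch(Layout<?> layout, OutputStream out, boolean immediateFlush) {
            this.layout = layout;
            this.out = out;
            this.immediateFlush = immediateFlush;
        }

        synchronized void append(LogEvent event) throws IOException {
            // The layout encodes the event right away, so the LogEvent (and
            // the parameters it references) can be garbage collected; only
            // the formatted bytes are retained.
            byte[] bytes = layout.toByteArray(event);
            if (bytes.length > buffer.remaining()) {
                drain();
            }
            if (bytes.length > buffer.capacity()) {
                out.write(bytes);  // oversized event bypasses the buffer
            } else {
                buffer.put(bytes);
            }
            // Flush eagerly if configured to, or when the producer marks the
            // event as the end of the current batch.
            if (immediateFlush || event.isEndOfBatch()) {
                drain();
            }
        }

        private void drain() throws IOException {
            out.write(buffer.array(), 0, buffer.position());
            buffer.clear();
            out.flush();
        }
    }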

> Regarding your remark about "being sympathetic to the ring buffer design",
> I did not fully understand this. Would you mind elaborating on this a
> little bit more and maybe even sharing a comparison between this and the
> aforementioned explicit batching at the interface level, please?

Of course :-) We use the lmax disruptor for high performance, low latency, asynchronous logging. It takes advantage of several interesting performance optimizations (allocation avoidance, but also cache line optimization to avoid contention between producers and consumers in most cases) that allow substantially greater throughput than a BlockingQueue model. This difference is best illustrated by our benchmarks between fully async loggers (AsyncLoggerContextSelector) compared to AsyncAppender (ArrayBlockingQueue with a consumer thread).
In order to avoid creating intermediate objects (collections of events), the disruptor uses an end-of-batch flag to inform consumers that there are no more events in the queue:
https://lmax-exchange.github.io/disruptor/docs/com/lmax/disruptor/EventHandler.html
Our LogEvent endOfBatch marker is based on that; it does not allow us to enforce an upper bound, but it does let us take advantage of moments when load has receded to opportunistically flush appenders.
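
For reference, the consumer side looks roughly like this minimal sketch against the EventHandler interface linked above (FlushableSink is a made-up stand-in for whatever appender or stream sits downstream):

    import com.lmax.disruptor.EventHandler;
    import org.apache.logging.log4j.core.LogEvent;

    // Minimal sketch: the consumer handles events one by one and only
    // flushes the downstream sink when the ring buffer reports that no
    // further events are currently available (endOfBatch == true).
    final class FlushOnEndOfBatchHandler implements EventHandler<LogEvent> {

        // Hypothetical stand-in for the downstream appender/stream.
        interface FlushableSink {
            void write(LogEvent event);
            void flush();
        }

        private final FlushableSink sink;

        FlushOnEndOfBatchHandler(FlushableSink sink) {
            this.sink = sink;
        }

        @Override
        public void onEvent(LogEvent event, long sequence, boolean endOfBatch) {
            sink.write(event);
            if (endOfBatch) {
                sink.flush();  // load has receded; flush opportunistically
            }
        }
    }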

Written quickly, hopefully there aren't terribly many typos! I'm still settling after moving and things are still a little hectic; sorry for the recent lack of engagement.

TLDR: I think the idea that I've danced around is that batching based on the number of events may not give us enough data. One large event may be too large for a batch, and a thousand tiny events may be too small; sometimes the appender knows best. Sometimes it doesn't ;-)

-ck

On Thu, Sep 24, 2020, at 14:48, Volkan Yazıcı wrote:
> I think endOfBatch flag is still falling short of addressing the main
> issue: the appender is oblivious to the batching. Put another way, it
> cannot have an assumption whether endOfBatch is employed or not. Consider
> appender A honoring endOfBatch flag, though the preceding filter F never
> takes advantage of it, hence A will keep on piling events until an OOM
> exception. This is my main motivation for introducing batching explicitly
> at the appender interface. For instance,
> 
>     interface Appender {
>         default void append(LogEvent logEvent) {
> append(Batch.of(logEvent)); }    // for backward-compatibility
>         void append(Batch<LogEvent> batch);
>     }
> 
>     interface Batch<E> {
>         void forEach(Consumer<E> item);
>     }
> 
> This way we can introduce reusable (and hence, mutable) Batch<E>
> implementations, while the appender is free to perform its own batching
> magic tuned for the underlying sink, e.g., JDBC connection, network socket,
> etc.
> 
> Regarding your remark about "being sympathetic to the ring buffer design",
> I did not fully understand this. Would you mind elaborating on this a
> little bit more and maybe even sharing a comparison between this and the
> aforementioned explicit batching at the interface level, please?
> 
> In conclusion, IMHO, making batching/aggregation explicit at the interface
> level will help us to avoid quite some code repetition and make the life
> easier for future appenders.
> 
> On Thu, Sep 24, 2020 at 5:56 PM Carter Kozak <ck...@ckozak.net> wrote:
> 
> > The AsyncAppender more or less implements what you've described, except it
> > forwards each event in the batch individually to a delegate appender and
> > sets the end-of-batch marker on the last one so the API isn't quite as
> > pretty. We could implement something similar that groups events up to some
> > interval or maximum capacity before forwarding them, but we'd have to be
> > careful to avoid creating garbage collection problems. Potentially holding
> > otherwise short lived parameters in the background can have unintended
> > impact on garbage collector performance :-)
> >
> > The LogEvent.isEndOfBatch method is designed to be sympathetic to the ring
> > buffer design, where we never create an intermediate collection, but
> > instead signal when an event is the last in a group that we're aware of.
> >
> > Batching comes up fairly frequently for network and database appenders, I
> > think we've duplicated that code a few times. I'd be supportive of a
> > canonical implementation (or reusable utility functionality) if we can
> > extract it out.
> >
> > -ck
> >
> > On Thu, Sep 24, 2020, at 11:46, Ralph Goers wrote:
> > > Well, Log4j only feeds events to Appenders one at a time. It has no
> > place to aggregate them. However, one Appender could wrap others to create
> > a batch at which point this would become useful.  I suppose some other
> > component could be invented to create a batch but I am not sure where or
> > how that would fit into the process -   does the batch apply only to a
> > single appender or to multiple appenders?  What filtering takes place
> > before it is added to the batch and what, if any, happens after?
> > >
> > > Ralph
> > >
> > > > On Sep 24, 2020, at 8:00 AM, Volkan Yazıcı <vo...@gmail.com>
> > wrote:
> > > >
> > > > If one would look close to existing appenders, almost everyone of them
> > > > implements a custom batching solution. To the best of my knowledge,
> > this is
> > > > mainly due to the constraint in the Appender interface:
> > > >
> > > >    void append(LogEvent event);
> > > >
> > > > If this would have been replaced (enhanced?) with
> > > >
> > > >    void append(List<LogEvent> event);
> > > >
> > > > appenders wouldn't need to implement any batching at all. Further,
> > Log4j
> > > > could have also provided a batching filter replacing all the custom
> > > > batching code internal to the appenders.
> > > >
> > > > What do you think? Am I missing something obvious here?
> > >
> > >
> > >
> >
> 

Re: Batching in appenders

Posted by Ralph Goers <ra...@dslextreme.com>.
The more I think about this the more questions I have. As far as I can recall, the Appenders that support batching do so because there is a distinct performance benefit to it and the underlying transport mechanism supports it. But almost every transport handles batching differently. For example, the Flume Appender creates a batch of Flume Events, the database Appenders create a batch of rows to insert, and the Socket Appender just uses a buffer that needs to be full before any I/O takes place. In other words, what they are creating a batch of is events that have already had their Layout applied. So I don’t see how batching events before the Appender can help in any meaningful way.

Ralph

Re: Batching in appenders

Posted by Ralph Goers <ra...@dslextreme.com>.

> On Sep 24, 2020, at 11:48 AM, Volkan Yazıcı <vo...@gmail.com> wrote:
> 
> I think endOfBatch flag is still falling short of addressing the main
> issue: the appender is oblivious to the batching. Put another way, it
> cannot have an assumption whether endOfBatch is employed or not. Consider
> appender A honoring endOfBatch flag, though the preceding filter F never
> takes advantage of it, hence A will keep on piling events until an OOM
> exception. This is my main motivation for introducing batching explicitly
> at the appender interface. For instance,

I’m not following this.  A Filter analyzes a single event, much as a firewall analyzes a single packet. If it rejects the event the Appender should discard it so that wouldn’t generate an OOM.  OTOH, batching events before they hit Appenders could waste memory but that would primarily be because you are trying to create a batch of events for ALL Appenders to process. One might reject some of the events and a different Appender might reject a different set.



> 
>    interface Appender {
>        default void append(LogEvent logEvent) {
> append(Batch.of(logEvent)); }    // for backward-compatibility
>        void append(Batch<LogEvent> batch);
>    }
> 
>    interface Batch<E> {
>        void forEach(Consumer<E> item);
>    }
> 
> This way we can introduce reusable (and hence, mutable) Batch<E>
> implementations, while the appender is free to perform its own batching
> magic tuned for the underlying sink, e.g., JDBC connection, network socket,
> etc.
> 
> Regarding your remark about "being sympathetic to the ring buffer design",
> I did not fully understand this. Would you mind elaborating on this a
> little bit more and maybe even sharing a comparison between this and the
> aforementioned explicit batching at the interface level, please?
> 
> In conclusion, IMHO, making batching/aggregation explicit at the interface
> level will help us to avoid quite some code repetition and make the life
> easier for future appenders.


Synchronous loggers deliver events one by one. It makes no sense to “batch” them because they would then become asynchronous, which is what the async Loggers do. That is why Ring Buffers were mentioned. But all the ring buffer really does is act as a FIFO queue to process the events on a different thread. It still follows (more or less) the same flow that synchronous events do. Processing them in batches would make things a bit more complicated, as each event needs to be evaluated by Filters on the LoggerConfig and Appender-Ref before being passed to Appenders. Does that mean Filters now have to accept batches? If so, how do they return the Filter.Result for each event?
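
To make that last question concrete, a batch-aware filter would presumably need a signature along these lines (purely hypothetical, shown only to illustrate the awkwardness of returning per-event results):

    import java.util.List;
    import org.apache.logging.log4j.core.Filter;
    import org.apache.logging.log4j.core.LogEvent;

    // Hypothetical batch-aware filter signature: a single Filter.Result no
    // longer fits, so the filter would have to return one result per event
    // (or mutate the batch in place).
    interface BatchFilter {
        List<Filter.Result> filter(List<LogEvent> batch);
    }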

Ralph

Re: Batching in appenders

Posted by Volkan Yazıcı <vo...@gmail.com>.
I think the endOfBatch flag is still falling short of addressing the main
issue: the appender is oblivious to the batching. Put another way, it
cannot assume whether endOfBatch is employed or not. Consider appender A
honoring the endOfBatch flag while the preceding filter F never takes
advantage of it; A will keep piling up events until an OOM error. This is
my main motivation for introducing batching explicitly at the appender
interface. For instance,

    interface Appender {
        // for backward-compatibility
        default void append(LogEvent logEvent) {
            append(Batch.of(logEvent));
        }
        void append(Batch<LogEvent> batch);
    }

    interface Batch<E> {
        void forEach(Consumer<E> item);
    }

This way we can introduce reusable (and hence, mutable) Batch<E>
implementations, while the appender is free to perform its own batching
magic tuned for the underlying sink, e.g., JDBC connection, network socket,
etc.
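
As an illustration of that batching "magic", a JDBC-flavoured appender could map one Batch directly onto one statement batch, roughly as in the sketch below (hypothetical code against the proposed interfaces, not an existing Log4j appender; the table and column names are made up):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import org.apache.logging.log4j.core.LogEvent;

    // Hypothetical sketch: one proposed Batch<LogEvent> (see the interface
    // above) maps onto one JDBC statement batch, so the appender needs no
    // batching bookkeeping of its own.
    final class JdbcBatchAppenderSketch {

        private final Connection connection;

        JdbcBatchAppenderSketch(Connection connection) {
            this.connection = connection;
        }

        void append(Batch<LogEvent> batch) throws SQLException {
            try (PreparedStatement statement = connection.prepareStatement(
                    "INSERT INTO log_event (message) VALUES (?)")) {
                batch.forEach(event -> {
                    try {
                        statement.setString(1, event.getMessage().getFormattedMessage());
                        statement.addBatch();
                    } catch (SQLException e) {
                        throw new IllegalStateException(e);
                    }
                });
                statement.executeBatch();
            }
        }
    }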

Regarding your remark about "being sympathetic to the ring buffer design",
I did not fully understand this. Would you mind elaborating on this a
little bit more and maybe even sharing a comparison between this and the
aforementioned explicit batching at the interface level, please?

In conclusion, IMHO, making batching/aggregation explicit at the interface
level will help us avoid quite a lot of code repetition and make life
easier for future appenders.

On Thu, Sep 24, 2020 at 5:56 PM Carter Kozak <ck...@ckozak.net> wrote:

> The AsyncAppender more or less implements what you've described, except it
> forwards each event in the batch individually to a delegate appender and
> sets the end-of-batch marker on the last one so the API isn't quite as
> pretty. We could implement something similar that groups events up to some
> interval or maximum capacity before forwarding them, but we'd have to be
> careful to avoid creating garbage collection problems. Potentially holding
> otherwise short lived parameters in the background can have unintended
> impact on garbage collector performance :-)
>
> The LogEvent.isEndOfBatch method is designed to be sympathetic to the ring
> buffer design, where we never create an intermediate collection, but
> instead signal when an event is the last in a group that we're aware of.
>
> Batching comes up fairly frequently for network and database appenders, I
> think we've duplicated that code a few times. I'd be supportive of a
> canonical implementation (or reusable utility functionality) if we can
> extract it out.
>
> -ck
>
> On Thu, Sep 24, 2020, at 11:46, Ralph Goers wrote:
> > Well, Log4j only feeds events to Appenders one at a time. It has no
> place to aggregate them. However, one Appender could wrap others to create
> a batch at which point this would become useful.  I suppose some other
> component could be invented to create a batch but I am not sure where or
> how that would fit into the process -   does the batch apply only to a
> single appender or to multiple appenders?  What filtering takes place
> before it is added to the batch and what, if any, happens after?
> >
> > Ralph
> >
> > > On Sep 24, 2020, at 8:00 AM, Volkan Yazıcı <vo...@gmail.com>
> wrote:
> > >
> > > If one would look close to existing appenders, almost everyone of them
> > > implements a custom batching solution. To the best of my knowledge,
> this is
> > > mainly due to the constraint in the Appender interface:
> > >
> > >    void append(LogEvent event);
> > >
> > > If this would have been replaced (enhanced?) with
> > >
> > >    void append(List<LogEvent> event);
> > >
> > > appenders wouldn't need to implement any batching at all. Further,
> Log4j
> > > could have also provided a batching filter replacing all the custom
> > > batching code internal to the appenders.
> > >
> > > What do you think? Am I missing something obvious here?
> >
> >
> >
>

Re: Batching in appenders

Posted by Carter Kozak <ck...@ckozak.net>.
The AsyncAppender more or less implements what you've described, except it forwards each event in the batch individually to a delegate appender and sets the end-of-batch marker on the last one so the API isn't quite as pretty. We could implement something similar that groups events up to some interval or maximum capacity before forwarding them, but we'd have to be careful to avoid creating garbage collection problems. Potentially holding otherwise short lived parameters in the background can have unintended impact on garbage collector performance :-)

The LogEvent.isEndOfBatch method is designed to be sympathetic to the ring buffer design, where we never create an intermediate collection, but instead signal when an event is the last in a group that we're aware of.

Batching comes up fairly frequently for network and database appenders, I think we've duplicated that code a few times. I'd be supportive of a canonical implementation (or reusable utility functionality) if we can extract it out.

-ck

On Thu, Sep 24, 2020, at 11:46, Ralph Goers wrote:
> Well, Log4j only feeds events to Appenders one at a time. It has no place to aggregate them. However, one Appender could wrap others to create a batch at which point this would become useful.  I suppose some other component could be invented to create a batch but I am not sure where or how that would fit into the process -   does the batch apply only to a single appender or to multiple appenders?  What filtering takes place before it is added to the batch and what, if any, happens after?
> 
> Ralph
> 
> > On Sep 24, 2020, at 8:00 AM, Volkan Yazıcı <vo...@gmail.com> wrote:
> > 
> > If one would look close to existing appenders, almost everyone of them
> > implements a custom batching solution. To the best of my knowledge, this is
> > mainly due to the constraint in the Appender interface:
> > 
> >    void append(LogEvent event);
> > 
> > If this would have been replaced (enhanced?) with
> > 
> >    void append(List<LogEvent> event);
> > 
> > appenders wouldn't need to implement any batching at all. Further, Log4j
> > could have also provided a batching filter replacing all the custom
> > batching code internal to the appenders.
> > 
> > What do you think? Am I missing something obvious here?
> 
> 
> 

Re: Batching in appenders

Posted by Ralph Goers <ra...@dslextreme.com>.
Well, Log4j only feeds events to Appenders one at a time. It has no place to aggregate them. However, one Appender could wrap others to create a batch at which point this would become useful.  I suppose some other component could be invented to create a batch but I am not sure where or how that would fit into the process -   does the batch apply only to a single appender or to multiple appenders?  What filtering takes place before it is added to the batch and what, if any, happens after?

Ralph

> On Sep 24, 2020, at 8:00 AM, Volkan Yazıcı <vo...@gmail.com> wrote:
> 
> If one would look close to existing appenders, almost everyone of them
> implements a custom batching solution. To the best of my knowledge, this is
> mainly due to the constraint in the Appender interface:
> 
>    void append(LogEvent event);
> 
> If this would have been replaced (enhanced?) with
> 
>    void append(List<LogEvent> event);
> 
> appenders wouldn't need to implement any batching at all. Further, Log4j
> could have also provided a batching filter replacing all the custom
> batching code internal to the appenders.
> 
> What do you think? Am I missing something obvious here?