You are viewing a plain text version of this content. The canonical link for it is here.
Posted to log4j-dev@logging.apache.org by "Remko Popma (JIRA)" <ji...@apache.org> on 2014/09/19 17:20:35 UTC

[jira] [Commented] (LOG4J2-506) Binary logging interface

    [ https://issues.apache.org/jira/browse/LOG4J2-506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140728#comment-14140728 ] 

Remko Popma commented on LOG4J2-506:
------------------------------------

Some more thoughts on the BinaryMessage API...

One of my design goals is to make it possible for binary logging to be garbage-free, and ideally copy-free. That is why I am not considering an API like this:
{code}
public interface BinaryMessage extends Message {
    byte[] getData(); // forces the implementation to create a new byte array when invoked
}
{code}

I like this better:  
*Option 1*
{code}
// it may be possible to avoid creating new objects and avoid copying data
// if the Appender (perhaps via the Layout) passes its underlying buffer directly to the {@code #writeInto} method.
public interface BinaryMessage extends Message {
    int getLength(); // return byte count
    void writeTo(ByteBuffer buffer); // write length bytes from the current buffer position
}
{code}

The caller can query the message for the message length. If the caller's buffer has enough remaining capacity, it can just pass this buffer to the message to be written to. Otherwise the caller can increase its buffer capacity. For MemoryMappedFile this would mean remapping to a new region. For a RandomAccessFile appender this would mean either flush()ing if emptying the buffer is sufficient, and otherwise, if the message is very large, the appender could discard its current buffer and create a new one large enough to hold this message.

This API is convenient for the caller (Log4J Appenders and Layouts), but puts more responsibility on the implementor:
If the binary format changes, both the writeTo() method and the getLength() method must be updated. Also, getLength() may exceed the actual length, but must not return too small a value: if the BinaryMessage implementation class writes more bytes to the buffer than its length indicates, an IndexOutOfBounds exception is thrown.

*Option 2*
An alternative is to pass a wrapper object to the message instead of a raw ByteBuffer. This wrapper would take care of boundary checking and any action necessary when crossing a buffer boundary. We could define a custom wrapper interface, or we could use java.io.DataOutput:
{code}
public interface BinaryMessage extends Message {
    void writeTo(DataOutput sink);
}
{code}

This is more convenient for the implementor (only one method to maintain), and safer: underestimating the length is no longer a problem. Not exposing the underlying ByteBuffer directly to the Message is also perhaps the safer thing to do. TBD is there any performance impact?

Some implications of using DataOutput:
* The contract of DataOutput is much more strictly defined than ByteBuffer. DataOutput writes binary data in BigEndian byte order. This is not necessarily a problem, but may cause some slight performance loss on Intel chips since they are LittleEndian. 
* For floats and doubles, the DataOutput javadoc says it writes the data as Float.floatToIntBits and Double.doubleToLongBits, but ByteBuffer will write as Float.floatToRawIntBits and Double.doubleToRawLongBits, preserving NaN values. So a ByteBuffer implementation of DataOutput would not follow the contract exactly. I doubt this will be a problem.


> Binary logging interface
> ------------------------
>
>                 Key: LOG4J2-506
>                 URL: https://issues.apache.org/jira/browse/LOG4J2-506
>             Project: Log4j 2
>          Issue Type: New Feature
>          Components: Core, Layouts
>            Reporter: Gregg Donovan
>            Assignee: Remko Popma
>            Priority: Minor
>              Labels: binary
>
> From [discussion|https://groups.google.com/forum/#!topic/mechanical-sympathy/7HdRCacJCDk] with Remko Pompa on the mechanical-sympathy list:
> {quote}We -- Etsy.com -- have a request logging system that we've begun to outgrow due to increased log volume and increased size of the messages we need to log. The messages are Thrift-encoded search requests from 1-100K in size. We use the logs to feed our homegrown traffic replay tools for offline performance tests and to reproduce issues found in production (GC, perf, etc.).
> For our next gen request logging system we would like the following:
> Low contention -- Minimal lock contention despite log writes from many (10s to 100s) application threads.
> Async writes -- File I/O should be outside of the actual request path.
> Binary logging -- Our current system base64 encodes the payload as a String and logs it via log4j 1.2. ;-) This creates more garbage than we would like and makes the data written twice as large. It was good enough for a while, but now we need something more efficient...
> Compression -- Either per-message or per block of messages would be great. Incurring large periodic CPU penalties from logrotate + gzip, as we do now, is less than ideal.
> Rotation -- Ideally based on file size, but time-based would also do.
> {quote}
> Remko:
> {quote}
> You can do this with log4j-2.0. The current version (log4j-2.0-beta-9) out of the box gives you 3 out of 5: Async Loggers use the Disruptor under the hood -> low contention & async writes, and RollingRandomAccessFileAppender (the fastest appender) can do rotation. This appender can also optionally do compression in a separate thread, but this happens on roll-over only, not continuously. (Still, I would guess that this is more CPU-efficient than using logrotate + gzip in a separate process...)
> The binary logging can be achieved with a bit of custom code.
> First, your objects would need to provide some way to turn themselves into bytes. I doubt that you want to use java standard serialization :-), so how about this as an interface to log4j that your objects would implement:
> {quote}
> {code:java}
> public interface BinaryMessage extends org.apache.logging.log4j.message.Message {
>     int getLength(); // return byte count
>     void writeInto(ByteBuffer buffer); // write length bytes from the current buffer position
> }
> {code}
> {quote}
> Log4j2 Loggers already have methods that take a Message argument (instead of a String), so in your application code you could just write 
> {quote}
> {code:java}
> Logger logger = LogManager.getLogger(MyClass.class); // usually cached in static field
> logger.info(myObject); // where myObject implements BinaryMessage
> {code}
> {quote}
> Second, in the log4j2 framework, the Layout is responsible for converting a LogEvent into bytes, so you need a BinaryLayout that recognizes BinaryMessages. The layout could optionally output the timestamp (long) and log level (byte), followed by the message size (int) and the message bytes (byte[]).
> Using the log4j2 plugin model, an implementation could look something like below.
> {quote}
> {code:java}
> @Plugin(name = "BinaryLayout", category = "Core", elementType = "layout", printObject = true)
> public final class BinaryLayout extends AbstractLayout<LogEvent> {
>     private static enum TimestampStrategy {
>         WRITE {
>             public void write(LogEvent event, ByteBuffer buffer) {
>                 buffer.putLong(event.getMillis());
>             }
>         },
>         DONT {
>             public void write(LogEvent event, ByteBuffer buffer) {
>             }
>         };
>         abstract void write(LogEvent event, ByteBuffer buffer);
>     }
>     private static enum LevelStrategy {
>         WRITE {
>             public void write(LogEvent event, ByteBuffer buffer) {
>                 buffer.put((byte) event.getLevel().intLevel());
>             }
>         },
>         DONT {
>             public void write(LogEvent event, ByteBuffer buffer) {
>             }
>         };
>         abstract void write(LogEvent event, ByteBuffer buffer);
>     }
>     private final TimestampStrategy timestamp;
>     private final LevelStrategy level;
>     private final int prefixLength;
>     public BinaryLayout(boolean timestamp, boolean level) {
>         this.timestamp = timestamp ? TimestampStrategy.WRITE : TimestampStrategy.DONT;
>         this.level = level ? LevelStrategy.WRITE : LevelStrategy.DONT;
>         final int stampBytes = timestamp ? 8 : 0;
>         final int levelBytes = level ? 1 : 0;
>         prefixLength = 4 + stampBytes + levelBytes;
>     }
>     @Override
>     public byte[] toByteArray(LogEvent event) {
>         final BinaryMessage msg = (BinaryMessage) event.getMessage();
>         final byte[] result = new byte[msg.getLength() + prefixLength];
>         final ByteBuffer buffer = ByteBuffer.wrap(result);
>         timestamp.write(event, buffer);
>         level.write(event, buffer);
>         buffer.putInt(msg.getLength());
>         msg.writeInto(buffer);
>         return result;
>     }
>     @Override
>     public LogEvent toSerializable(LogEvent event) {
>         return event;
>     }
>     @Override
>     public String getContentType() {
>         return "application/octet-stream";
>     }
>     @Override
>     public Map<String, String> getContentFormat() {
>         return new HashMap<String, String>();
>     }
>     @PluginFactory
>     public static BinaryLayout createLayout(
>             @PluginAttribute("writeTimestamp") final String writeTimestamp,
>             @PluginAttribute("writeLevel") final String writeLevel) {
>         final boolean timestamp = Booleans.parseBoolean(writeTimestamp, false);
>         final boolean level = Booleans.parseBoolean(writeLevel, false);
>         return new BinaryLayout(timestamp, level);
>     }
> }
> {code}
> {quote}
> Unfortunately the log4j API does not allow us to be completely garbage-free, you still need to allocate the byte[] array and ByteBuffer for each event...
> (The LogEvents are reused Disruptor ring buffer slots if you make all loggers AsyncLoggers.)
> Still, this should be a lot better (garbage-wise) than base64-encoding a String, and then turning that String into bytes again...
> Here is the log4j2.xml configuration that would put it all together (modify the "packages" attribute at the top to match your package):
> (Don't forget to set system property -DLog4jContextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector to make all loggers AsyncLoggers.)
> {quote}
> {code:xml}
> <?xml version="1.0" encoding="UTF-8"?>
> <Configuration status="WARN" packages="package.of.binary.layout.class">
>   <Appenders>
>     <RollingRandomAccessFile name="rraf" fileName="binary.log"
>                  filePattern="logs/$${date:yyyy-MM}/app-%d{MM-dd-yyyy}-%i.log.gz"
>                  append="false" immediateFlush="false">
>       <BinaryLayout writeTimestamp="true" writeLevel="true" />
>       <Policies>
>         <TimeBasedTriggeringPolicy />
>         <SizeBasedTriggeringPolicy size="250 MB"/>
>       </Policies>
>     </RollingRandomAccessFile>
>   </Appenders>
>   
>   <Loggers>
>     <Root level="trace" includeLocation="false">
>       <AppenderRef ref="rraf"/>
>     </Root>
>   </Loggers>
> </Configuration>
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: log4j-dev-unsubscribe@logging.apache.org
For additional commands, e-mail: log4j-dev-help@logging.apache.org