Posted to log4j-dev@logging.apache.org by "Gregg Donovan (JIRA)" <ji...@apache.org> on 2014/01/21 17:39:19 UTC

[jira] [Updated] (LOG4J2-506) Binary logging interface

     [ https://issues.apache.org/jira/browse/LOG4J2-506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gregg Donovan updated LOG4J2-506:
---------------------------------

    Description: 
From [discussion|https://groups.google.com/forum/#!topic/mechanical-sympathy/7HdRCacJCDk] with Remko Popma on the mechanical-sympathy list:

{quote}We -- Etsy.com -- have a request logging system that we've begun to outgrow due to increased log volume and increased size of the messages we need to log. The messages are Thrift-encoded search requests from 1-100K in size. We use the logs to feed our homegrown traffic replay tools for offline performance tests and to reproduce issues found in production (GC, perf, etc.).

For our next gen request logging system we would like the following:

Low contention -- Minimal lock contention despite log writes from many (10s to 100s) application threads.
Async writes -- File I/O should be outside of the actual request path.
Binary logging -- Our current system base64 encodes the payload as a String and logs it via log4j 1.2. ;-) This creates more garbage than we would like and makes the data written twice as large. It was good enough for a while, but now we need something more efficient...
Compression -- Either per-message or per block of messages would be great. Incurring large periodic CPU penalties from logrotate + gzip, as we do now, is less than ideal.
Rotation -- Ideally based on file size, but time-based would also do.
{quote}

Remko:
{quote}
You can do this with log4j-2.0. The current version (log4j-2.0-beta-9) out of the box gives you 3 out of 5: Async Loggers use the Disruptor under the hood -> low contention & async writes, and RollingRandomAccessFileAppender (the fastest appender) can do rotation. This appender can also optionally do compression in a separate thread, but this happens on roll-over only, not continuously. (Still, I would guess that this is more CPU-efficient than using logrotate + gzip in a separate process...)

The binary logging can be achieved with a bit of custom code.

First, your objects would need to provide some way to turn themselves into bytes. I doubt that you want to use Java standard serialization :-), so how about this as an interface to log4j that your objects would implement:
{quote}

{code:java}
import java.nio.ByteBuffer;

public interface BinaryMessage extends org.apache.logging.log4j.message.Message {
    int getLength(); // return byte count
    void writeInto(ByteBuffer buffer); // write length bytes from the current buffer position
}
{code}

{quote}
Log4j2 Loggers already have methods that take a Message argument (instead of a String), so in your application code you could just write 
{quote}
{code:java}
Logger logger = LogManager.getLogger(MyClass.class); // usually cached in static field
logger.info(myObject); // where myObject implements BinaryMessage
{code}
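
For illustration (this sketch is not from the original thread): a minimal BinaryMessage implementation might simply wrap an already-serialized payload, e.g. a Thrift-encoded request that has been turned into a byte[]. The class and field names below are hypothetical; everything beyond getLength() and writeInto() is inherited from org.apache.logging.log4j.message.Message.

{code:java}
import java.nio.ByteBuffer;

// Hypothetical example class: any object that can expose its serialized form
// as a byte[] (e.g. a Thrift-encoded request) could implement BinaryMessage.
public final class ThriftRequestMessage implements BinaryMessage {
    private final byte[] payload;

    public ThriftRequestMessage(final byte[] payload) {
        this.payload = payload; // already-encoded request bytes
    }

    @Override
    public int getLength() {
        return payload.length; // byte count that writeInto will produce
    }

    @Override
    public void writeInto(final ByteBuffer buffer) {
        buffer.put(payload); // writes exactly getLength() bytes
    }

    // The methods below come from org.apache.logging.log4j.message.Message.
    // They are only consulted by text-oriented layouts, so simple fallbacks suffice.
    @Override
    public String getFormattedMessage() {
        return "[binary payload, " + payload.length + " bytes]";
    }

    @Override
    public String getFormat() {
        return getFormattedMessage();
    }

    @Override
    public Object[] getParameters() {
        return null;
    }

    @Override
    public Throwable getThrowable() {
        return null;
    }
}
{code}

Application code would then call logger.info(new ThriftRequestMessage(encodedBytes)), matching the usage snippet above.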

{quote}
Second, in the log4j2 framework, the Layout is responsible for converting a LogEvent into bytes, so you need a BinaryLayout that recognizes BinaryMessages. The layout could optionally output the timestamp (long) and log level (byte), followed by the message size (int) and the message bytes (byte[]).
Using the log4j2 plugin model, an implementation could look something like below.
{quote}

{code:java}
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

import org.apache.logging.log4j.core.LogEvent;
import org.apache.logging.log4j.core.config.plugins.Plugin;
import org.apache.logging.log4j.core.config.plugins.PluginAttribute;
import org.apache.logging.log4j.core.config.plugins.PluginFactory;
import org.apache.logging.log4j.core.helpers.Booleans; // org.apache.logging.log4j.core.util.Booleans in later 2.x releases
import org.apache.logging.log4j.core.layout.AbstractLayout;

@Plugin(name = "BinaryLayout", category = "Core", elementType = "layout", printObject = true)
public final class BinaryLayout extends AbstractLayout<LogEvent> {
    private static enum TimestampStrategy {
        WRITE {
            public void write(LogEvent event, ByteBuffer buffer) {
                buffer.putLong(event.getMillis());
            }
        },
        DONT {
            public void write(LogEvent event, ByteBuffer buffer) {
            }
        };
        abstract void write(LogEvent event, ByteBuffer buffer);
    }
    private static enum LevelStrategy {
        WRITE {
            public void write(LogEvent event, ByteBuffer buffer) {
                // note: intLevel() values exceed a byte (e.g. TRACE is 600), so this cast truncates
                buffer.put((byte) event.getLevel().intLevel());
            }
        },
        DONT {
            public void write(LogEvent event, ByteBuffer buffer) {
            }
        };
        abstract void write(LogEvent event, ByteBuffer buffer);
    }
    private final TimestampStrategy timestamp;
    private final LevelStrategy level;
    private final int prefixLength;

    public BinaryLayout(boolean timestamp, boolean level) {
        this.timestamp = timestamp ? TimestampStrategy.WRITE : TimestampStrategy.DONT;
        this.level = level ? LevelStrategy.WRITE : LevelStrategy.DONT;
        final int stampBytes = timestamp ? 8 : 0;
        final int levelBytes = level ? 1 : 0;
        prefixLength = 4 + stampBytes + levelBytes;
    }

    @Override
    public byte[] toByteArray(LogEvent event) {
        final BinaryMessage msg = (BinaryMessage) event.getMessage();
        final byte[] result = new byte[msg.getLength() + prefixLength];
        final ByteBuffer buffer = ByteBuffer.wrap(result);
        timestamp.write(event, buffer);
        level.write(event, buffer);
        buffer.putInt(msg.getLength());
        msg.writeInto(buffer);
        return result;
    }
    @Override
    public LogEvent toSerializable(LogEvent event) {
        return event;
    }
    @Override
    public String getContentType() {
        return "application/octet-stream";
    }
    @Override
    public Map<String, String> getContentFormat() {
        return new HashMap<String, String>();
    }
    @PluginFactory
    public static BinaryLayout createLayout(
            @PluginAttribute("writeTimestamp") final String writeTimestamp,
            @PluginAttribute("writeLevel") final String writeLevel) {
        final boolean timestamp = Booleans.parseBoolean(writeTimestamp, false);
        final boolean level = Booleans.parseBoolean(writeLevel, false);
        return new BinaryLayout(timestamp, level);
    }
}
{code}

{quote}
Unfortunately the log4j API does not allow us to be completely garbage-free: you still need to allocate the byte[] array and ByteBuffer for each event...
(The LogEvents are reused Disruptor ring buffer slots if you make all loggers AsyncLoggers.)
Still, this should be a lot better (garbage-wise) than base64-encoding a String, and then turning that String into bytes again...

Here is the log4j2.xml configuration that would put it all together (modify the "packages" attribute at the top to match your package):
(Don't forget to set system property -DLog4jContextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector to make all loggers AsyncLoggers.)
{quote}

{code:xml}
<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="WARN" packages="package.of.binary.layout.class">
  <Appenders>
    <RollingRandomAccessFile name="rraf" fileName="binary.log"
                 filePattern="logs/$${date:yyyy-MM}/app-%d{MM-dd-yyyy}-%i.log.gz"
                 append="false" immediateFlush="false">
      <BinaryLayout writeTimestamp="true" writeLevel="true" />
      <Policies>
        <TimeBasedTriggeringPolicy />
        <SizeBasedTriggeringPolicy size="250 MB"/>
      </Policies>
    </RollingRandomAccessFile>
  </Appenders>
  
  <Loggers>
    <Root level="trace" includeLocation="false">
      <AppenderRef ref="rraf"/>
    </Root>
  </Loggers>
</Configuration>
{code}
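
To round this out (again an illustrative sketch, not something discussed in the thread): a replay tool could decode the framing that the BinaryLayout above writes when writeTimestamp="true" and writeLevel="true": an 8-byte timestamp, a 1-byte level, a 4-byte length, then the payload bytes. ByteBuffer writes big-endian by default, so DataInputStream can read the fields back directly. The file name "binary.log" is the one from the configuration; rolled-over files end in .gz and would need a GZIPInputStream first.

{code:java}
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;

// Hypothetical reader for the active (uncompressed) binary.log file.
public final class BinaryLogReader {

    public static void main(final String[] args) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream("binary.log")))) {
            while (true) {
                final long timestamp;
                try {
                    timestamp = in.readLong(); // 8-byte timestamp (writeTimestamp="true")
                } catch (final EOFException end) {
                    break; // no more records
                }
                final int level = in.readUnsignedByte(); // 1-byte level as written by BinaryLayout
                final int length = in.readInt();         // 4-byte payload size
                final byte[] payload = new byte[length];
                in.readFully(payload);                   // the BinaryMessage bytes
                // hand payload to the Thrift deserializer / replay tool here
                System.out.printf("ts=%d level=%d bytes=%d%n", timestamp, level, length);
            }
        }
    }
}
{code}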

  was:From discussion with 


> Binary logging interface
> ------------------------
>
>                 Key: LOG4J2-506
>                 URL: https://issues.apache.org/jira/browse/LOG4J2-506
>             Project: Log4j 2
>          Issue Type: New Feature
>          Components: Core, Layouts
>            Reporter: Gregg Donovan
>            Priority: Minor
>



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
