Posted to dev@flume.apache.org by "Mike Percy (Issue Comment Edited) (JIRA)" <ji...@apache.org> on 2012/02/22 09:26:48 UTC

[jira] [Issue Comment Edited] (FLUME-828) LoggerSink representation of the event's body isn't too useful

    [ https://issues.apache.org/jira/browse/FLUME-828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213447#comment-13213447 ] 

Mike Percy edited comment on FLUME-828 at 2/22/12 8:25 AM:
-----------------------------------------------------------

Hi Brock,
I wonder if toString() should be overridden in SimpleEvent at all. I'm thinking it might make sense to factor the hex dumping code out into a static method of some utility class that takes an Event as a parameter. That way it would work with any Event implementation, not just SimpleEvent.
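A minimal sketch of that idea follows. It assumes a stand-in Event interface that has only the getHeaders()/getBody() accessors of org.apache.flume.Event, so it compiles on its own. The EventDumper class and dumpEvent method are hypothetical names, not Flume API.

```java
import java.util.Map;

// Stand-in for org.apache.flume.Event, reduced to the two accessors this
// sketch needs (the real interface also has setHeaders/setBody).
interface Event {
  Map<String, String> getHeaders();
  byte[] getBody();
}

// Hypothetical utility class; not part of Flume's API.
final class EventDumper {
  private EventDumper() {}

  // Formats the headers plus the first maxBytes of the body as hex,
  // appending "..." when the body was truncated. Works for any Event.
  static String dumpEvent(Event event, int maxBytes) {
    byte[] body = event.getBody() == null ? new byte[0] : event.getBody();
    StringBuilder sb = new StringBuilder();
    sb.append("{ headers:").append(event.getHeaders()).append(" body:");
    int n = Math.min(body.length, maxBytes);
    for (int i = 0; i < n; i++) {
      sb.append(String.format(" %02X", body[i] & 0xFF));
    }
    if (body.length > maxBytes) {
      sb.append(" ...");
    }
    return sb.append(" }").toString();
  }
}
```

The method relies only on the Event accessors, so the same code would format any Event implementation rather than just SimpleEvent.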

Regarding the LoggerSink: based on the example of hooking up a Sequence Generator Source to a LoggerSink, I think the intention was simply to stringify the bytes, assuming they were UTF-8 encoded, and print them in human-readable form. That is, we would assume the body is the result of String.getBytes(Charsets.UTF_8) and decode it via new String(event.getBody(), Charsets.UTF_8). Unfortunately, that isn't very helpful in the general case, where the body may be arbitrary binary data, so I can see the utility of the hex dump.
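For comparison, here is a sketch of the UTF-8 approach. It uses the JDK's StandardCharsets.UTF_8, which is equivalent to Guava's Charsets.UTF_8, and decodeBody is a hypothetical helper name. It round-trips text produced by getBytes(UTF_8). For binary input, however, the String constructor silently substitutes U+FFFD for malformed bytes, so the original data cannot be recovered.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating the "assume UTF-8" decoding.
final class Utf8Body {
  private Utf8Body() {}

  // Assumes the body was produced by String.getBytes(UTF_8). Invalid
  // sequences are not rejected: this String constructor replaces them
  // with U+FFFD, so binary bodies decode lossily.
  static String decodeBody(byte[] body) {
    return new String(body, StandardCharsets.UTF_8);
  }
}
```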

To be honest, I think this bug indicates that we may be missing some important type information in the system, which one could use to determine how to decode a given Event. So regardless of how we fix this bug, the fix ends up being kind of a band-aid. :) What do you think?
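One way to picture the missing type information is a header that tells the sink how to decode the body. The sketch below assumes a hypothetical "body.type" header that a source would set to "text/utf-8" for UTF-8 text. Flume defines no such header, so both the name and its values are illustrative only. Without the hint, the renderer falls back to hex, which is lossless for arbitrary bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical renderer: chooses a decoding based on a type hint in the
// event headers. Neither the header name nor its values exist in Flume.
final class TypedBodyRenderer {
  private TypedBodyRenderer() {}

  static final String BODY_TYPE_HEADER = "body.type"; // hypothetical
  static final String UTF8_TEXT = "text/utf-8";        // hypothetical

  static String render(Map<String, String> headers, byte[] body) {
    if (UTF8_TEXT.equals(headers.get(BODY_TYPE_HEADER))) {
      return new String(body, StandardCharsets.UTF_8);
    }
    // No (or unknown) type information: hex is lossless for any bytes.
    StringBuilder sb = new StringBuilder();
    for (byte b : body) {
      if (sb.length() > 0) {
        sb.append(' ');
      }
      sb.append(String.format("%02X", b & 0xFF));
    }
    return sb.toString();
  }
}
```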
                
> LoggerSink representation of the event's body isn't too useful
> --------------------------------------------------------------
>
>                 Key: FLUME-828
>                 URL: https://issues.apache.org/jira/browse/FLUME-828
>             Project: Flume
>          Issue Type: Improvement
>          Components: Sinks+Sources
>    Affects Versions: NG alpha 1
>            Reporter: Will McQueen
>            Assignee: Brock Noland
>             Fix For: v1.1.0
>
>         Attachments: FLUME-828-0.patch, FLUME-828-1.patch
>
>
> LoggerSink logs entries to the console that look like this:
>      Event: { headers:{} body:[B@5c1ae90c }
> ...where the body is just getClass().getName() + "@" + Integer.toHexString(hashCode()), i.e. the default Object.toString() of an array. For a byte array, getClass().getName() always resolves to "[B".
> The issue seems to be how we can represent a SimpleEvent's payload as a String when the payload is an arbitrary byte array: the array's bytes could represent encoded ASCII chars, encoded UTF-8 chars, or binary data such as an encrypted payload. If we default to ASCII translation for everything, the resulting String won't be useful for binary payloads, since only 95 of the 256 possible byte values (0x20 through 0x7E) are printable ASCII characters. Here's one idea:
> For each event body, we can print up to the first 16 bytes in hex format. If there are more than 16 bytes, print a "..." suffix at the end. The output would look similar to what you get with Unix "hexdump -C". Here's what sample output from LoggerSink would look like:
>      Event: { headers:{} body: 00000000 54 68 65 20 71 75 69 63 6B 20 62 72 6F 77 6E 20 |The quick brown | ... }
> ...where both the hex and the ASCII are displayed for the first 16 bytes.
> Is it the most useful representation of the body? Probably not. Is it at least more useful than printing "[B@" + Integer.toHexString(hashCode())? I think so.
> The Commons IO library has a useful HexDump.dump method we can leverage.
> Thoughts?
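A sketch of the proposed format, kept dependency-free rather than calling Commons IO. It follows the sample line in the description (uppercase hex, single spaces), which differs slightly from real hexdump -C output. BodyHexPreview is a hypothetical name.

```java
// Hypothetical helper sketching the FLUME-828 proposal: offset, up to 16
// bytes in hex, a printable-ASCII column, and " ..." when truncated.
final class BodyHexPreview {
  private BodyHexPreview() {}

  static final int MAX_BYTES = 16;

  static String preview(byte[] body) {
    int n = Math.min(body.length, MAX_BYTES);
    StringBuilder hex = new StringBuilder();
    StringBuilder ascii = new StringBuilder();
    for (int i = 0; i < n; i++) {
      int b = body[i] & 0xFF;
      hex.append(String.format("%02X ", b));
      // Printable ASCII is 0x20..0x7E; anything else is shown as '.'.
      ascii.append(b >= 0x20 && b < 0x7F ? (char) b : '.');
    }
    // Only the first line is printed, so the offset is always zero.
    String line = "00000000 " + hex + "|" + ascii + "|";
    return body.length > MAX_BYTES ? line + " ..." : line;
  }
}
```

Feeding it the bytes of "The quick brown fox" reproduces the sample body shown above. Calling toString() on the byte[] itself, by contrast, is what yields the [B@... output in the description.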
