Posted to log4net-user@logging.apache.org by Nicko Cadell <ni...@neoworks.com> on 2005/04/10 21:07:10 UTC

RE: chainsaw and "escaping" XML entities

Mike,

I have created an issue to track this.

http://issues.apache.org/jira/browse/LOG4NET-22

Nicko

> -----Original Message-----
> From: Mike Blake-Knox [mailto:mikebk824@gmail.com] 
> Sent: 24 March 2005 16:01
> To: Log4NET User
> Cc: Log4J Users List
> Subject: Re: chainsaw and "escaping" XML entities
> 
> To be honest, I was really more concerned by the log being 
> completely dropped. Either of your suggestions would be a 
> step forward. I'd think of substituting some visible 
> character giving an appearance like a hex dump where 
> unprintable characters are often replaced with periods.
> 
> What I've done here is to scan the text before calling log4net
> and replace most of the control characters with \xnn, where
> nn is the hex value of the offending character. I've done
> this because I can foresee using the log/trace as a source of
> test data, and a filtered one wouldn't be sufficient.
> 
> Mike
> 
> On Thu, 24 Mar 2005 09:34:39 -0000, Nicko Cadell 
> <ni...@neoworks.com> wrote:
> > If chainsaw can only parse XML 1.0 then we will need to ensure that
> > our output can be configured to be XML 1.0 compatible. Presumably
> > this means just stripping the characters that are invalid in XML.
> > 
> > The XML faq basically says that you should use XML elements to 
> > represent control characters. Presumably they had some complaints 
> > about this and therefore allowed them in XML 1.1. If the data is 
> > binary then it should be base64 escaped.
> > 
> > By the way, what do you want chainsaw to do with your control
> > characters? If we just stripped them out would that be ok? Or could
> > we replace them with spaces?
> > 
> > Nicko
> > 
> > > -----Original Message-----
> > > From: Mike Blake-Knox [mailto:mikebk824@gmail.com]
> > > Sent: 23 March 2005 14:03
> > > To: Log4NET User
> > > Subject: Re: chainsaw and "escaping" XML entities
> > >
> > > I changed the xml version to 1.1.  Unfortunately, messages "XML
> > > version "1.0" is recognized, but not "1.1"" are displayed when it
> > > tries to parse it. (In some perverse way, that's an improvement in
> > > that you get the message on the display for each received log; using
> > > xml 1.0, it just dropped the message and dumped into the console).
> > >
> > > I'm using jdk 1.4.2_06.
> > >
> > > Mike
> > >
> > > On Mon, 21 Mar 2005 14:58:54 -0000, Nicko Cadell 
> > > <ni...@neoworks.com> wrote:
> > > >
> > > > In XML 1.1 so-called restricted chars are allowed to be included
> > > > as numeric character references (e.g. &#x1E;).
> > > >
> > > > For details see
> > > > http://www.w3.org/TR/2004/REC-xml11-20040204/#charsets
> > > > and also
> > > > http://www.w3.org/International/questions/qa-controls#answer
> > > >
> > > > If you change the version header at the top of the XML file to
> > > > 1.1 can chainsaw parse the event?
> > > >
> > > > I will have a look and see if it is possible to control the
> > > > behaviour of the XmlTextWriter with respect to the target XML
> > > > version.
> > > >
> > > > Nicko
> > > >
> > >
> > 
> 
> 
> --
> Mike Blake-Knox
> 
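
For reference, the pre-escaping workaround Mike describes in the quoted message (replacing control characters with \xnn before the text reaches the logger) might look roughly like the sketch below. It is written in Python purely for illustration; the function names are hypothetical, and escaping the backslash itself is an added assumption so that the transformation can be undone unambiguously. Tab, LF and CR are left alone because XML 1.0 allows them.

```python
import re

# Control characters below 0x20 that XML 1.0 forbids (tab, LF, CR are legal).
# The backslash is escaped too so the result can be reversed unambiguously;
# that is an assumption of this sketch, not something stated in the thread.
_NEEDS_ESCAPE = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\\]")
_ESCAPED = re.compile(r"\\x([0-9a-f]{2})")


def escape_control_chars(text: str) -> str:
    """Replace each offending character with \\xnn (nn = two hex digits)."""
    return _NEEDS_ESCAPE.sub(lambda m: "\\x%02x" % ord(m.group()), text)


def unescape_control_chars(text: str) -> str:
    """Reverse escape_control_chars, e.g. when replaying a log as test data."""
    return _ESCAPED.sub(lambda m: chr(int(m.group(1), 16)), text)
```

Because every backslash in the input is itself rewritten, any \xnn sequence left in the escaped text is known to be an escape, which is what lets the log be turned back into the original data.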

Re: chainsaw and "escaping" XML entities

Posted by Mike Blake-Knox <mi...@gmail.com>.
I hadn't realized the size of the can of worms I was opening.

On Apr 10, 2005 3:07 PM, Nicko Cadell <ni...@neoworks.com> wrote:
> Mike,
> 
> I have created an issue to track this.
> 
> http://issues.apache.org/jira/browse/LOG4NET-22
> 


> Nicko
>
>> For invalid characters such as 0x1e there are 3 possible solutions:
>>
>> 1) Discard the character from the output.
>> 2) Replace the character with a numeric representation e.g. "0x1E".
>> 3) Replace the character with an XML element e.g. <char code="30"/>
>>
>> I favour option 3 above because information is not lost. In options
>> 1 and 2 information is lost. In 2 the encoding is not reversible.
>> With 3 the application reading the data requires additional smarts
>> to pick up on the values encoded in the elements, but all the
>> original information is preserved. If the app just asks for the text
>> nodes, ignoring the child elements, then they will get back the same
>> result as from 1.
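
To make the three quoted options concrete, here is a minimal sketch in Python (chosen only for brevity; log4net itself is .NET). The function names are hypothetical, and the character class covers only the C0 control characters that XML 1.0 rejects, which is a simplification of the full XML 1.0 Char rule.

```python
import re
from xml.sax.saxutils import escape

# Control characters that XML 1.0 does not allow (tab, LF and CR are legal).
_INVALID = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def discard(text: str) -> str:
    """Option 1: drop the invalid characters."""
    return escape(_INVALID.sub("", text))


def numeric(text: str) -> str:
    """Option 2: replace each invalid character with text such as 0x1E."""
    return escape(_INVALID.sub(lambda m: "0x%02X" % ord(m.group()), text))


def as_element(text: str) -> str:
    """Option 3: replace each invalid character with <char code="30"/>."""
    # Escape the ordinary markup characters first, then insert the elements,
    # so that the inserted tags are not escaped themselves.
    escaped = escape(text)
    return _INVALID.sub(lambda m: '<char code="%d"/>' % ord(m.group()), escaped)
```

With these definitions, numeric("a\x1eb") and numeric("a0x1Eb") give the same output, which is the sense in which option 2 is not reversible.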

If the application just deserializes the string, they'll end up with a
much more complex tree structure with a couple of text nodes, an
attribute node, ....
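
As a rough illustration of that tree, here is what a reader sees for option 3 when using Python's standard ElementTree parser. The wrapper element name "message" and the helper names are hypothetical, chosen only for this sketch.

```python
import xml.etree.ElementTree as ET


def text_only(message: ET.Element) -> str:
    """Concatenate the text nodes and ignore the <char/> children."""
    return "".join(message.itertext())


def restore(message: ET.Element) -> str:
    """Rebuild the original string, mapping <char code="N"/> back to chr(N)."""
    parts = [message.text or ""]
    for child in message:
        if child.tag == "char":
            parts.append(chr(int(child.get("code"))))
        # Text that follows a child element is stored as that child's tail.
        parts.append(child.tail or "")
    return "".join(parts)


# One text node, one element carrying an attribute, then another text node.
doc = ET.fromstring('<message>abc<char code="30"/>def</message>')
```

Here text_only(doc) gives the same string option 1 would have produced, while restore(doc) needs the extra logic to put the 0x1E character back.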

I don't see that the transport of binary data is a key purpose for
log4net. Much as I dislike option proliferation, I wonder if it would
be reasonable to have 3 as an optional behavior but 1 or 2 as a
default?  What does log4j do in this situation?
-- 
Mike Blake-Knox