Posted to j-users@xerces.apache.org by Aleksandr Kravets <ak...@gmail.com> on 2009/02/27 00:17:55 UTC

carriage return in attribute

Hello,

I am loading an XML document that contains carriage returns in attributes.
How can I preserve these or convert them to &#xD;? Is it possible? I found
this thread, but I'm not exactly sure how to use a character entity reference.

thanks,
Alex

Re: carriage return in attribute

Posted by Michael Glavassevich <mr...@ca.ibm.com>.
The &#xD; character reference should have been written by the
producer/serializer of the XML document if the carriage return is actually
part of the document's content.

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org
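A minimal Java sketch of what "written by the producer" can look like, assuming the producer builds attribute values by hand. The `AttrEscape` class and its `escapeAttr` helper are hypothetical, not part of Xerces: the helper writes CR, LF and tab as character references so they survive attribute value normalization, and the round trip goes through the JDK's built-in DOM parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class AttrEscape {
    // Hypothetical helper: escapes a string for use inside a double-quoted
    // attribute value. CR, LF and tab are written as character references
    // so that attribute value normalization does not turn them into spaces.
    static String escapeAttr(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\r': sb.append("&#xD;");  break;
                case '\n': sb.append("&#xA;");  break;
                case '\t': sb.append("&#x9;");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Writes the value into a tiny document, parses it back with the JDK's
    // built-in (Xerces-derived) DOM parser and returns attribute "a".
    static String roundTrip(String value) throws Exception {
        String xml = "<e a=\"" + escapeAttr(value) + "\"/>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getDocumentElement().getAttribute("a");
    }

    public static void main(String[] args) throws Exception {
        String original = "line one\r\nline two";
        System.out.println(escapeAttr(original)); // prints line one&#xD;&#xA;line two
        System.out.println(original.equals(roundTrip(original))); // prints true
    }
}
```

If the producer uses a DOM serializer rather than string concatenation, check whether it already escapes these characters before adding a helper like this.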


Re: carriage return in attribute

Posted by Paul Gearon <ge...@ieee.org>.
Well, it's legal to have these character references in your attributes,
whereas a literal carriage return in an attribute gets normalized away, so
if you can replace one with the other, then great.

The problem with doing a pre-filter like this is replacing ONLY the carriage
returns that are inside attributes. You'll need some sort of parser for that,
and that parser will need to know a fair amount of XML. Do you see where this
is going?  :-)
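To make that concrete, here is a rough Java sketch of such a pre-filter (the class and method names are hypothetical). It tracks whether it is inside a tag and inside a quoted attribute value, and only there rewrites literal CR and LF as &#xD; and &#xA;; comments, CDATA sections and processing instructions are copied through untouched. It assumes the document has already been decoded into a String and does not handle DOCTYPE internal subsets, so treat it as an illustration of the bookkeeping involved rather than a robust fix. Simply replacing every CRLF in the file would not work, because a character reference is not allowed where whitespace between attributes is expected.

```java
public class AttrNewlineFilter {
    // Hypothetical pre-filter: rewrites literal CR/LF characters that occur
    // inside quoted attribute values as &#xD; / &#xA; character references.
    // Sketch only: comments, CDATA sections and PIs are copied verbatim, but
    // DOCTYPE internal subsets are NOT handled.
    static String escapeNewlinesInAttributes(String xml) {
        StringBuilder out = new StringBuilder(xml.length());
        boolean inTag = false; // inside <...> of a start, end or empty tag
        char quote = 0;        // the opening quote while inside a value, else 0
        int i = 0;
        while (i < xml.length()) {
            char c = xml.charAt(i);
            if (!inTag) {
                // Constructs whose content must be copied through untouched.
                String end = xml.startsWith("<!--", i) ? "-->"
                        : xml.startsWith("<![CDATA[", i) ? "]]>"
                        : xml.startsWith("<?", i) ? "?>"
                        : null;
                if (end != null) {
                    int stop = xml.indexOf(end, i);
                    stop = (stop < 0) ? xml.length() : stop + end.length();
                    out.append(xml, i, stop);
                    i = stop;
                    continue;
                }
                if (c == '<') inTag = true;
                out.append(c);
            } else if (quote != 0) {
                // Inside an attribute value: escape line-break characters.
                if (c == '\r') out.append("&#xD;");
                else if (c == '\n') out.append("&#xA;");
                else out.append(c);
                if (c == quote) quote = 0;
            } else {
                // Inside a tag but outside any attribute value.
                if (c == '"' || c == '\'') quote = c;
                else if (c == '>') inTag = false;
                out.append(c);
            }
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "<e a=\"one\r\ntwo\">text\r\n</e>";
        // The attribute value gets character references; the element
        // content keeps its literal CR LF.
        System.out.println(escapeNewlinesInAttributes(raw));
    }
}
```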

Really, anyone generating faulty XML like this needs to be instructed
in the error of their ways. I mean, what are they creating the XML
for? Is there some parser out there that is currently handling these
faulty documents for them?

Paul

---------------------------------------------------------------------
To unsubscribe, e-mail: j-users-unsubscribe@xerces.apache.org
For additional commands, e-mail: j-users-help@xerces.apache.org


Re: carriage return in attribute

Posted by Aleksandr Kravets <ak...@gmail.com>.
So I would need to replace each carriage return with that reference manually?


Re: carriage return in attribute

Posted by Aleksandr Kravets <ak...@gmail.com>.
Thanks, guys. I guess the only thing left for me to do is talk to the XML
originator with "XML for Dummies" in one hand and a baseball bat in the
other :)


Re: carriage return in attribute

Posted by Michael Glavassevich <mr...@ca.ibm.com>.
keshlam@us.ibm.com wrote on 03/02/2009 01:49:31 PM:

> Actually, &#xD; or &#13; are technically "numeric character
> references", not entity references. Check the spec, but if I'm
> remembering the whitespace rules correctly, these may get converted
> early enough not to help in this case. You may need an actual &CR;
> entity defined in the DTD.

The character reference is sufficient to get past end-of-line handling and
attribute value normalization.
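A small Java check of this, as an illustration using the JDK's built-in (Xerces-derived) DOM parser; the class name is hypothetical. A literal CR LF in an attribute comes back as a single space, while the character references come back as the original characters. It also shows that &#13;, &#xD; and &#x0D; all denote the same character, U+000D.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

public class AttrNormalizationDemo {
    // Parses a one-element document and returns attribute "a" as the
    // JDK's built-in DOM parser reports it (after normalization).
    static String attr(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement().getAttribute("a");
    }

    // Makes CR, LF and tab visible when printing.
    static String show(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n").replace("\t", "\\t");
    }

    public static void main(String[] args) throws Exception {
        // Literal CR LF: end-of-line handling turns it into LF, then
        // attribute value normalization turns that LF into one space.
        System.out.println(show(attr("<e a=\"one\r\ntwo\"/>")));        // prints one two
        // Character references survive both steps.
        System.out.println(show(attr("<e a=\"one&#xD;&#xA;two\"/>")));  // prints one\r\ntwo
        // Decimal and hex forms denote the same character.
        System.out.println(show(attr("<e a=\"&#13;&#xD;&#x0D;\"/>")));  // prints \r\r\r
    }
}
```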


Re: carriage return in attribute

Posted by ke...@us.ibm.com.
Actually, &#xD; or &#13; are technically "numeric character references", 
not entity references. Check the spec, but if I'm remembering the 
whitespace rules correctly, these may get converted early enough not to 
help in this case. You may need an actual &CR; entity defined in the DTD.

______________________________________
"... Three things see no end: A loop with exit code done wrong,
A semaphore untested, And the change that comes along. ..."
  -- "Threes" Rev 1.1 - Duane Elms / Leslie Fish (
http://www.ovff.org/pegasus/songs/threes-rev-11.html)



Re: carriage return in attribute

Posted by Paul Gearon <ge...@ieee.org>.
I'm not saying that this is the answer to your problem, but the entity
referred to here is:
  &#x0D;

Paul


Re: carriage return in attribute

Posted by Aleksandr Kravets <ak...@gmail.com>.
Ok, I think I found an issue similar to mine; it is in this thread:
http://www.stylusstudio.com/xsllist/200404/post40600.html

The particular line of interest to me is this:

"BTW, if you want your attribute to have a carriage return, you can use an
entity to express the carriage return, then it doesn't get normalized."

So can someone explain what this means and how I describe these entities?
Maybe I can insert them into the XML before importing it and let the parser
do its work?

thanks,
Alex



Re: carriage return in attribute

Posted by Aleksandr Kravets <ak...@gmail.com>.
Totally agree, but even if the originating XML is corrected, there are clients
with XML written the wrong way who will use my application to import it, and in
such a case there is little I can do. So, is there a way to correct this
problem during the import?

Thanks for your help,
Alex


Re: carriage return in attribute

Posted by ke...@us.ibm.com.
The purpose of an XML parser is to read correct XML. Get whoever's 
generating that file to produce XML that expresses their intent correctly, 
or throw in a filtering stage that corrects their error.  Personally, I 
would apply a clue-by-four to the author of whatever's generating that 
document rather than trying to tolerate it, since they're just going to 
get themselves in deeper trouble later... but I understand that this isn't 
always possible.

"The customer isn't always right. Unfortunately, the customer is always 
the one with the money."


Re: carriage return in attribute

Posted by Aleksandr Kravets <ak...@gmail.com>.
Thanks, Michael,

I understand the XML rules for processing carriage returns. I am dealing
with an XML document that is being imported into my application. I am not
sure whether it has been serialized correctly, but if I read through this
document byte by byte I see carriage return (13) and newline (10) characters
as line terminators inside a string attribute. I know it's probably wrong to
put these characters in an attribute, and this should have been the value of
a child element inside a CDATA section, but this is the document that I need
to work with.
So once I parse this document, all CRLFs are converted to LFs and then to
spaces, which changes how this attribute is displayed: the string is shown on
one line instead of with visible line breaks.
Now, I guess I can read through the document before it is imported (without a
parser) and replace all CRLFs with &#xA; to make it correct. However, this
would be ugly, and I was wondering if there is an easier way to deal with
this.

Hope I am being clear about what I am trying to achieve.

thanks,
Alex


Re: carriage return in attribute

Posted by Michael Glavassevich <mr...@ca.ibm.com>.
I'm not sure what you're asking for. Attribute value normalization [1] is
part of the parsing process. It occurs before the data is presented to an
application through any of the standard APIs.

[1] http://www.w3.org/TR/2006/REC-xml-20060816/#AVNormalize


Re: carriage return in attribute

Posted by Aleksandr Kravets <ak...@gmail.com>.
Thanks.
Are there utilities in Xerces that make carriage-return normalization easier
than, say, parsing the whole document and doing it manually?


Re: carriage return in attribute

Posted by ke...@us.ibm.com.
Carriage return is ASCII 13, so &#13; or &#xD; will represent that
character.

However, be sure you understand XML's rules for whitespace normalization 
in attribute values. Depending on what you're trying to do, you may want 
to replace that attribute with a child element... or replace the offending 
character with some notation that your application, rather than XML, will 
process appropriately.
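A minimal Java sketch of the child-element alternative, using hypothetical `item` and `note` element names and the JDK's built-in DOM parser. In element content only end-of-line handling applies, so CR LF becomes LF but the line break itself survives; writing &#xD;&#xA; in the content preserves the CR as well.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class ChildElementDemo {
    // Returns the text content of the first <note> element in the document.
    static String noteText(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getElementsByTagName("note").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Element content: CR LF is normalized to LF, not to a space.
        String text = noteText("<item><note>line one\r\nline two</note></item>");
        System.out.println(text.equals("line one\nline two")); // prints true
    }
}
```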


Re: carriage return in attribute

Posted by Aleksandr Kravets <ak...@gmail.com>.
Sorry, I forgot to include the actual thread:
http://markmail.org/message/lmgntmy6difut76l#query:xerces%20preserve%20carriage%20return+page:2+mid:md2otcqsm2pllxx7+state:results

thanks,
Alex
